Closed — herleeyandi closed this issue 5 years ago
That dataset is the same one I used. What hyperparameters are you using? I got 48.7 mIoU in 300 epochs with the input arguments found here.
Hello, sorry for the late reply. I now get roughly the same result for validation. Here is the validation result:
sky: 0.9261
building: 0.8089
pole: 0.0865
road: 0.9575
pavement: 0.8334
tree: 0.8479
sign_symbol: 0.2888
fence: 0.5943
car: 0.5909
pedestrian: 0.2836
bicyclist: 0.5117
unlabeled: nan
BEST VALIDATION
Epoch: 200
Mean IoU: 0.6181304868089039
However, after testing I got this:
Testing...
>>>> Running test dataset
>>>> Avg. loss: 1.2392 | Mean IoU: 0.4967
sky: 0.8901
building: 0.6573
pole: 0.1985
road: 0.8905
pavement: 0.6671
tree: 0.6169
sign_symbol: 0.1632
fence: 0.1574
car: 0.6452
pedestrian: 0.2406
bicyclist: 0.3372
unlabeled: nan
It's a little lower than the paper. I tried retraining for more epochs but could not reach the accuracy reported in the paper. I have several questions:
1) How can I boost the accuracy? Based on your experience, what works best for boosting segmentation accuracy?
2) Why is the RoadMarking class ignored? Both your code and the SegNet code ignore the RoadMarkings class. I tried it before and the accuracy was very poor; did you get good accuracy on the RoadMarkings class?
3) Why is the unlabeled class NaN? Is it because we ignore the unlabeled class, so that we effectively have 11 classes in total instead of 12?
4) What about a binary setup? Suppose I want to detect only the RoadMarkings class: can I ignore all the other classes so that I end up with 1 class instead of 2, as in the unlabeled case?
Sorry for asking so many things; I am a beginner in this topic. Any help, suggestions, or shared experience is appreciated. Thank you so much for your help.
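Regarding question 4, a binary setup still needs two labels (foreground and background) rather than literally one class. A minimal sketch of the remapping, using a hypothetical label id for road markings (the actual id depends on how the dataset is encoded):

```python
import numpy as np

# Hypothetical: assume road_marking pixels carry id 3 in the full annotation;
# the real id depends on the dataset encoding.
ROAD_MARKING_ID = 3

def to_binary(label_map, fg_id=ROAD_MARKING_ID):
    """Collapse a multi-class label map to {0: background, 1: foreground}."""
    return (np.asarray(label_map) == fg_id).astype(np.int64)

labels = np.array([[0, 3, 3],
                   [2, 3, 5]])
print(to_binary(labels))
# [[0 1 1]
#  [0 1 0]]
```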
That's a pretty good score, the closest I've seen to the paper with this implementation. Regarding the questions: the unlabeled class is ignored by default; it can be included by passing `--with-unlabeled` as a command-line argument.

1) @davidtvs Thank you so much for your explanation. Actually, I am trying to boost ENet's accuracy, and now I know why I got a bad result the first time. The main problem is that your code never calls `model.train(True)` in the training phase and `model.train(False)` in the testing phase. After adding these and training for 300 epochs, I got 0.51 mean IoU, almost the same as the paper (0.513). However, if you use `model.eval()` at inference time, the accuracy is very bad. I just realized that the model uses `nn.Dropout2d`, which in my experience is risky, but it is used in the paper. I don't know whether this is a PyTorch bug or something else, since after applying `model.train(False)` in the testing phase, `model.eval()` should behave the same.
2) Another question still on my mind: which part of your code converts RGB pixels to class indices? I would expect anything outside the `color_encoding` to be included in a void class, but in your case unlabeled is [0, 0, 0]. Why only [0, 0, 0]? And when we ignore RoadMarking, which class do those pixels belong to?
3) I have tried comparing the result with the ground truth. Here is the result. Do you think this is normal for the CamVid dataset, given its small amount of training data? Also, what do the metrics mean: is each value the per-class IoU or the pixel accuracy?
4) Comparing with the ENet paper, I get this comparison. However, here we can't see the black color; maybe it belongs to the void class, which is why I asked question 2 about whether we can group the void classes. Or maybe [0, 0, 0] is the background class and everything else is ignored.
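The train/eval issue described in point 1 comes down to `nn.Dropout2d`: in training mode it zeroes whole feature-map channels at random (and rescales the rest), while in eval mode it is the identity. A minimal sketch, not code from this repository:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.Dropout2d(p=0.5))
x = torch.randn(1, 3, 16, 16)

model.train()        # equivalent to model.train(True): dropout is active
y1 = model(x)
y2 = model(x)        # stochastic: two passes almost always differ

model.eval()         # equivalent to model.train(False): dropout is a no-op
z1 = model(x)
z2 = model(x)
print(torch.equal(z1, z2))  # True: eval-mode passes are deterministic
```

This is why a network must be switched to eval mode before testing: otherwise dropout keeps randomly erasing channels at inference time.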
1) You're right, the code should be calling `model.eval()` in the testing class and `model.train()` in the training class. I was already thinking of migrating the code to PyTorch 1.0 in the next few days, so I'll do some more testing on this at that time.

2) Predictions are converted back to color images by the `LongTensorToRGBPIL` transformation using the `color_encoding` dictionary. The 11-class dataset from the SegNet repository merges the road markings with the road. The `road_marking` class key is removed from the dictionary here. The reason the dictionary starts with the `road_marking` key is that I was initially testing with a 12-class version of the dataset that included `road_marking`.
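The dictionary described above presumably looks something like the following sketch (class names and RGB triplets follow the SegNet CamVid convention; the repository's actual `color_encoding` may differ):

```python
from collections import OrderedDict

# SegNet-style CamVid class colors; the repository's dictionary may differ.
color_encoding = OrderedDict([
    ('sky', (128, 128, 128)),
    ('building', (128, 0, 0)),
    ('pole', (192, 192, 128)),
    ('road_marking', (255, 69, 0)),
    ('road', (128, 64, 128)),
    ('pavement', (60, 40, 222)),
    ('tree', (128, 128, 0)),
    ('sign_symbol', (192, 128, 128)),
    ('fence', (64, 64, 128)),
    ('car', (64, 0, 128)),
    ('pedestrian', (64, 64, 0)),
    ('bicyclist', (0, 128, 192)),
    ('unlabeled', (0, 0, 0)),
])

# 11-class setup: drop road markings so those pixels fall back to road
del color_encoding['road_marking']
print(len(color_encoding))  # 12 keys left: 11 classes plus unlabeled
```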
Hi @davidtvs, I have tried migrating it. I just changed `loss.data[0]` to `loss.item()`, and I added `model.train(True)` and `model.eval()` like this:
In `train.py`:

```python
def run_epoch(self, iteration_loss=False):
    epoch_loss = 0.0
    self.metric.reset()
    self.model.train(True)
    for step, batch_data in enumerate(self.data_loader):
```
In `test.py`:

```python
def run_epoch(self, iteration_loss=False):
    epoch_loss = 0.0
    self.metric.reset()
    self.model.eval()
    for step, batch_data in enumerate(self.data_loader):
```
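For what it's worth, evaluation loops are commonly also wrapped in `torch.no_grad()` so that autograd does not build a graph during testing. A sketch of that pattern (not the repository's exact code, and with a toy model in place of ENet):

```python
import torch
import torch.nn as nn

def evaluate(model, data_loader, criterion):
    """Run one evaluation epoch: eval mode plus no-grad forward passes."""
    model.eval()                  # disable dropout, use batchnorm running stats
    epoch_loss = 0.0
    with torch.no_grad():         # skip gradient bookkeeping during testing
        for inputs, labels in data_loader:
            outputs = model(inputs)
            epoch_loss += criterion(outputs, labels).item()
    return epoch_loss / max(len(data_loader), 1)

# Toy usage: a linear "model" and a list standing in for a DataLoader
model = nn.Linear(4, 3)
loader = [(torch.randn(2, 4), torch.tensor([0, 2])) for _ in range(3)]
print(evaluate(model, loader, nn.CrossEntropyLoss()))
```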
However, the result is kind of weird. Here is the inference output when using `model.eval()`:

Here is the result without `model.eval()`:

It's very weird! I trained with `model.eval()` in the test class, but it still can't give a better result at inference. I think this is because of the dropout layers.

One more question: in this kind of report, is each value the class IoU or the class pixel accuracy?
sky: 0.9373
building: 0.8372
pole: 0.0744
road: 0.9620
pavement: 0.8400
tree: 0.8902
sign_symbol: 0.3059
fence: 0.6632
car: 0.6930
pedestrian: 0.3095
bicyclist: 0.5919
unlabeled: nan
That's how I would do it, but the results are indeed odd. As soon as I have some time I'll try to make the migration and see if I get something different. If you end up figuring it out let me know or submit a PR.
That's the IoU of each class.
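To make the distinction concrete, per-class IoU is usually computed from a confusion matrix as TP / (TP + FP + FN), whereas per-class pixel accuracy would be TP / (TP + FN). A small sketch (not the repository's metric code):

```python
import numpy as np

def per_class_iou(conf):
    """Per-class IoU from a confusion matrix (rows: ground truth, cols: prediction)."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    # Classes absent from both prediction and ground truth divide 0/0 and
    # come out as NaN, which is what happens to the ignored unlabeled class.
    with np.errstate(invalid='ignore', divide='ignore'):
        return tp / (tp + fp + fn)

conf = np.array([[8, 2],
                 [1, 9]])
iou = per_class_iou(conf)
print(iou)                 # [8/11, 9/12] = [0.7272..., 0.75]
print(np.nanmean(iou))     # mean IoU, ignoring NaN classes
```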
The migration to PyTorch 1.0 is done. I also fixed some bugs, including the missing `.train()` and `.eval()` calls, thanks for the heads up.

I trained on CamVid and Cityscapes from scratch and got improved results (around 4% better mean IoU). I also did not see the weird results during inference with `.eval()`, as you can see below:
image | ground-truth | prediction
Hello @davidtvs, thank you for your work. I have a question about CamVid training. I trained from scratch on the CamVid dataset, following this split for training and testing. Evaluating on the validation data, I only got about 31% mIoU at epoch 1000, using the same input size you mention in your README. Did you run into the same problem? With your implementation I used the following settings:
1) 11 classes; unlabeled is mapped to 0, and every class outside those 11 classes is also mapped to the background class 0.
2) The road_marking class is not used; I checked other implementations and they don't use it either.
3) I used the ENet initialization.
So in your experiments, did you get around 31% accuracy on the validation data at epochs beyond 500?
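One thing worth checking when mIoU stalls on imbalanced data is the class weighting. The ENet paper weights each class by w_class = 1 / ln(c + p_class) with c = 1.02, where p_class is the class's pixel frequency, so rare classes contribute more to the loss. A sketch with made-up pixel counts:

```python
import numpy as np

def enet_class_weights(pixel_counts, c=1.02):
    """Class weights from the ENet paper: w = 1 / ln(c + p_class)."""
    freqs = np.asarray(pixel_counts, dtype=float)
    freqs /= freqs.sum()              # p_class: per-class pixel frequency
    return 1.0 / np.log(c + freqs)

# Illustrative pixel counts only: one dominant class, two rare ones
weights = enet_class_weights([900_000, 80_000, 20_000])
print(weights)  # the rarer the class, the larger its weight
```

These weights would typically be passed to the loss, e.g. `nn.CrossEntropyLoss(weight=...)`, so that small classes like pole or sign_symbol are not drowned out by road and sky.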