davidtvs / PyTorch-ENet

PyTorch implementation of ENet
MIT License

Details in training on Camvid #12

Closed herleeyandi closed 5 years ago

herleeyandi commented 5 years ago

Hello @davidtvs thank you for your works. I have a question about camvid training. I try to train from scratch using Camvid dataset which follow this division for training and testing. I evaluate the validation data and I just got mIOU about 31% in epoch 1000. Its using the same size as you mention in your readme. Do you also got the same problem?. Using your implementation I follow this setting: 1) 11 classes, the unlabelled will belongs to 0, all of the other class outside of that 11 class will be belongs to class background or 0. 2) The class roadmarking is not used!, I have check also in another implementation they don't use it. 3) I used ENet initialization. So in your experiment, did you got 31%accuracy in validation data at epoch more than 500?

davidtvs commented 5 years ago

That dataset is the same one I used. What hyperparameters are you using? I got 48.7% mIoU in 300 epochs with the input arguments found here
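For reference, a minimal sketch of an optimizer setup with the hyperparameters reported in the ENet paper (Adam, learning rate 5e-4, weight decay 2e-4); the model stand-in and the decay schedule are illustrative assumptions, not necessarily the repository's exact defaults:

```python
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

model = nn.Conv2d(3, 12, 3, padding=1)  # stand-in for the ENet model

# Adam with lr 5e-4 and weight decay 2e-4, as reported in the ENet paper.
optimizer = optim.Adam(model.parameters(), lr=5e-4, weight_decay=2e-4)
# Step decay every 100 epochs is an assumption for illustration.
scheduler = StepLR(optimizer, step_size=100, gamma=0.5)
```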

herleeyandi commented 5 years ago

Hello, I am sorry for the late reply. I now get a similar result. Here is the validation result:

sky: 0.9261
building: 0.8089
pole: 0.0865
road: 0.9575
pavement: 0.8334
tree: 0.8479
sign_symbol: 0.2888
fence: 0.5943
car: 0.5909
pedestrian: 0.2836
bicyclist: 0.5117
unlabeled: nan
BEST VALIDATION
Epoch: 200
Mean IoU: 0.6181304868089039

However after doing testing I got this:

Testing...
>>>> Running test dataset
>>>> Avg. loss: 1.2392 | Mean IoU: 0.4967
sky: 0.8901
building: 0.6573
pole: 0.1985
road: 0.8905
pavement: 0.6671
tree: 0.6169
sign_symbol: 0.1632
fence: 0.1574
car: 0.6452
pedestrian: 0.2406
bicyclist: 0.3372
unlabeled: nan

[image] It is a little lower than the paper. I tried retraining for more epochs but could not reach the accuracy reported in the paper. I have several questions:

1) How can I boost the accuracy? Based on your experience, what works best for improving segmentation accuracy?
2) Why is the road_marking class ignored? Both your code and the SegNet code ignore it. I tried including it before and the accuracy was very poor; did you get good accuracy on the road_marking class?
3) Why is the unlabelled class NaN? Is it because we ignore the unlabelled class, so we effectively have 11 classes in total instead of 12?
4) What about a binary-class setup? Suppose I want to detect only the road_marking class; can I ignore all the other classes so I end up with just 1 class, instead of 2 classes as in the unlabelled case?

Sorry for asking so many questions; I am a beginner in this topic. Any help, suggestions, and shared experience are appreciated. Thank you so much for your help.

davidtvs commented 5 years ago

That's a pretty good score, the closest I've seen to the paper with this implementation. Regarding the questions:

  1. Boost the accuracy of ENet specifically? Or are you willing to try other architectures or datasets?
  2. It became the standard way to compare models on CamVid. The full dataset actually has 32 classes, many of them rare, and the dataset itself is rather small, so the 11 most frequent classes became the standard for comparing model performance. The first mention of this 11-class CamVid split is in the CamVid paper itself. I remember running some tests with road_marking included (12 classes in total) and the results were pretty bad, but given the rarity of the class that is expected.
  3. This is a convention: unlabeled pixels are ignored when computing metrics, which is why that entry comes out as NaN (see the IoU sketch after this list). You can make the metrics take the unlabeled class into account by passing --with-unlabeled as a command-line argument.
  4. You can, but be aware that the problem changes from multi-class to binary classification, and quite a few things need to change because of that; have a look at #10.
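To make point 3 concrete, here is a minimal sketch (not the repository's metric code) of per-class IoU computed from a confusion matrix; a class that is never predicted and never present, such as an ignored unlabeled class, produces 0/0 = NaN, and the mean IoU then skips it:

```python
import numpy as np

def per_class_iou(conf_matrix):
    """Per-class IoU from a confusion matrix: TP / (TP + FP + FN)."""
    tp = np.diag(conf_matrix).astype(float)
    fp = conf_matrix.sum(axis=0) - tp
    fn = conf_matrix.sum(axis=1) - tp
    with np.errstate(divide="ignore", invalid="ignore"):
        return tp / (tp + fp + fn)  # 0 / 0 -> NaN for absent classes

# Toy 3-class example where class 0 (unlabeled) is fully ignored.
conf = np.array([[0,  0,  0],
                 [0, 50, 10],
                 [0,  5, 35]])
iou = per_class_iou(conf)   # [nan, 0.769, 0.7]
mean_iou = np.nanmean(iou)  # NaN entries excluded from the mean
```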
herleeyandi commented 5 years ago

1) @davidtvs Thank you so much for your explanation. I am indeed trying to boost the accuracy of ENet. Now I know why I got the bad result at first: the main problem is that your code has no model.train(True) in the training phase and no model.train(False) in the testing phase. After fixing this and training for 300 epochs I got 0.51 mIoU, almost the same as the paper (0.513). However, at inference, if you use model.eval() the accuracy becomes very bad. I realized the model uses nn.Dropout2d, which in my experience is very dangerous, although it is what the paper uses. I don't know whether this is a PyTorch bug or something else, since after applying model.train(False) in the testing phase, model.eval() should be fine.
2) Another thing still on my mind: which part of your code converts RGB pixels to class indices? I believe anything outside the color_encoding should be included in the void class, but in your case unlabeled is only [0, 0, 0]. Why only [0, 0, 0]? And when we ignore road_marking, which class does it fall into?
3) I have tried comparing the result with the ground truth; here it is: [image] Do you think this is normal for CamVid, given its small amount of training data? Also, what do the metrics mean: per-class IoU or pixel accuracy?
4) Comparing with the ENet paper, I get the comparison below. [image] Here we cannot see the black color; maybe it belongs to the void class, which is why I asked question 2, about whether we can group the void class. Or maybe [0, 0, 0] is the background class and everything else is ignored.

davidtvs commented 5 years ago
  1. After looking at the code I think you are right: there should be a model.eval() in the testing class and a model.train() in the training class. I was already planning to migrate the code to PyTorch 1.0 in the next few days, so I'll do some more testing on this then.
  2. The class responsible for the conversion is the LongTensorToRGBPIL transformation, using this dictionary. The 11-class dataset from the SegNet repository merges the road markings with the road. The road_marking key is removed from the dictionary here. The reason the dictionary starts with the road_marking key is that I was initially testing with a 12-class version of the dataset that included road_marking. (A sketch of such an encoding follows this list.)
  3. Those results look normal for ENet. The metrics are the same as the ones in the paper: class average accuracy, mean IoU over all classes, and IoU for each class.
  4. The unlabeled class is never predicted because the weight given to the loss function for that class is 0. Those pixels are ignored regardless of what the network predicts (again, this is the convention used for comparing networks). The authors probably chose a weight of 0 for the unlabeled class to produce more visually appealing images; other papers (like the SegNet paper) don't do the same.
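A minimal sketch of how such a color encoding and the zero weight for the unlabeled class fit together; the RGB values follow the common CamVid/SegNet convention but should be treated as illustrative rather than copied from the repository:

```python
from collections import OrderedDict

# 11-class CamVid encoding (road markings merged into road); the colors
# follow the common CamVid convention and are illustrative here.
color_encoding = OrderedDict([
    ('sky',         (128, 128, 128)),
    ('building',    (128, 0, 0)),
    ('pole',        (192, 192, 128)),
    ('road',        (128, 64, 128)),
    ('pavement',    (60, 40, 222)),
    ('tree',        (128, 128, 0)),
    ('sign_symbol', (192, 128, 128)),
    ('fence',       (64, 64, 128)),
    ('car',         (64, 0, 128)),
    ('pedestrian',  (64, 64, 0)),
    ('bicyclist',   (0, 128, 192)),
    ('unlabeled',   (0, 0, 0)),
])

# Point 4 above: zero weight means the loss never rewards predicting
# the unlabeled class, so the network effectively never outputs it.
class_weights = [1.0] * len(color_encoding)
class_weights[list(color_encoding).index('unlabeled')] = 0.0
```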
herleeyandi commented 5 years ago

Hi @davidtvs, I have tried migrating it: I just changed loss.data[0] to loss.item(). I have also tried putting model.train(True) and model.eval() like this. In train.py:

```python
def run_epoch(self, iteration_loss=False):
    epoch_loss = 0.0
    self.metric.reset()
    self.model.train(True)  # training mode: dropout and batch norm active
    for step, batch_data in enumerate(self.data_loader):
```

In test.py:

```python
def run_epoch(self, iteration_loss=False):
    epoch_loss = 0.0
    self.metric.reset()
    self.model.eval()  # evaluation mode: dropout disabled
    for step, batch_data in enumerate(self.data_loader):
```
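Inside the loop, the loss accumulation changes along these lines; a self-contained sketch of the PyTorch >= 0.4 idiom (the loss computation here is a stand-in, not the repository's code):

```python
import torch

# Sketch of the PyTorch >= 0.4 idiom: loss.item() replaces loss.data[0].
loss = torch.nn.functional.mse_loss(torch.randn(4), torch.randn(4))
epoch_loss = 0.0
epoch_loss += loss.item()  # extract the Python float from the 0-dim tensor
```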

However, the result is kind of weird. Here is the output when using model.eval() at inference: [image]

Here is the result without model.eval(): [image]

It is very strange: I trained with model.eval() in the test class, but it does not give better results at inference. I think this is because of the dropout layers; see the sketch below for how nn.Dropout2d behaves in each mode.
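The dropout suspicion is easy to check in isolation; a minimal sketch showing how nn.Dropout2d changes behavior between train() and eval() modes:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout2d(p=0.5)
x = torch.ones(1, 4, 2, 2)  # one sample, four feature channels

drop.train()                # training mode: whole channels zeroed at random,
print(drop(x)[0, :, 0, 0])  # survivors scaled by 1 / (1 - p) = 2

drop.eval()                 # evaluation mode: identity, no scaling
print(drop(x)[0, :, 0, 0])  # tensor([1., 1., 1., 1.])
```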

Once again, for this kind of output, is it the per-class IoU or the per-class pixel accuracy?

sky: 0.9373
building: 0.8372
pole: 0.0744
road: 0.9620
pavement: 0.8400
tree: 0.8902
sign_symbol: 0.3059
fence: 0.6632
car: 0.6930
pedestrian: 0.3095
bicyclist: 0.5919
unlabeled: nan
davidtvs commented 5 years ago

That's how I would do it, but the results are indeed odd. As soon as I have some time I'll try the migration myself and see if I get something different. If you figure it out in the meantime, let me know or submit a PR.

That's the IoU of each class.

davidtvs commented 5 years ago

The migration to PyTorch 1.0 is done. I also fixed some bugs, including the missing .train() and .eval(); thanks for the heads up.

I trained on CamVid and Cityscapes from scratch and got improved results (around 4% better mean IoU). I also no longer see the weird results during inference with .eval(), as you can see below:

[image camvid_new: ground-truth | prediction]