fvisin / reseg

A Recurrent Neural Network for Object Segmentation
GNU General Public License v3.0
55% mean IoU on CamVid #2

Closed qianguih closed 8 years ago

qianguih commented 8 years ago

Hi there, thanks for sharing the code! I tried to reproduce the experimental results in your paper, but I only get roughly 55% mean IoU when running your code without changing anything. Is there anything specific I need to take care of to reproduce the 58.8% mean IoU? Did you use any tricks such as data augmentation? Thanks!

fvisin commented 8 years ago

Hi Qiangui,

it's been a while since I last ran this code, but IIRC it shouldn't require any modification to reproduce the results in the paper, and we didn't use any kind of data augmentation. Is it possible that you didn't resize the images correctly? When you resize the masks, make sure to use nearest-neighbour interpolation, or you might end up introducing artifacts (interpolated label values that belong to no class, or to the wrong one).
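To illustrate the point about masks, here is a minimal sketch of nearest-neighbour resizing written with plain NumPy. The function name `resize_mask_nearest` and the toy mask are made up for this example and are not part of the repo. The idea is that every output pixel copies one input pixel, so no new label values can appear, whereas an averaging interpolation between labels 0 and 10 could produce 5, which is a different class.

```python
import numpy as np


def resize_mask_nearest(mask, out_h, out_w):
    """Resize a 2D label mask by copying source pixels (no blending).

    Hypothetical helper for illustration; not taken from the reseg code.
    """
    in_h, in_w = mask.shape
    # For each output row/column, pick the index of the source row/column.
    rows = (np.arange(out_h) * in_h) // out_h
    cols = (np.arange(out_w) * in_w) // out_w
    return mask[rows[:, None], cols[None, :]]


# Toy mask with three class ids: 0, 3 and 10.
mask = np.array([
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [3, 3, 10, 10],
    [3, 3, 10, 10],
])

small = resize_mask_nearest(mask, 2, 2)

# Every label in the resized mask already existed in the original one.
assert set(np.unique(small)) <= set(np.unique(mask))
```

If you resize with an image library instead, the same rule applies: pick its nearest-neighbour mode for the masks, and keep the smoother interpolation modes for the RGB images only.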

Not really a convincing explanation, but off the top of my head you might want to run the code twice and compare the results. Convolutions are not entirely deterministic in Theano (unless you force them to be, which makes them much slower), so some variation between runs is to be expected. A drop of 3.8 points is far too large to be explained by this alone, though. Are you training in float32? On which GPU? In some cases I have noticed different performance on different architectures. Again, a drop of 3.8 points seems too large to be explained by something like this, but it might be a mix of factors.

I noticed that the CamVid website has changed since I last opened it. Is it possible they updated the dataset by adding new data? IIRC the whole dataset should be around 700 images.

I am sorry, but I don't have much time to investigate, as many deadlines are coming up and I don't have a spare GPU to test the code again. I hope what I wrote helps, though.

qianguih commented 8 years ago

Hi Francesco, Thanks for your reply!

Actually, I didn't resize the images myself. I downloaded them from the SegNet repo, where they have already been resized. I have run the code on a K20 and a Titan X, and both produced similar results. Do the Theano and Lasagne versions matter here? I am using Theano 0.9.0 and Lasagne 0.2.0. Which versions did you use?

BTW, I am a little confused about the void class. In your scripts, you set a flag has_void_class for the CamVid dataset, but I noticed that the void class is included during training and excluded during testing. Would it help to mask out the void class during training as well? And in the experimental setting, the last class is treated as the void class in CamVid, right?
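To make the void-class question concrete, here is a rough sketch of how mean IoU changes when void pixels are excluded from the evaluation. It assumes, as asked above, that void is the last class index. The function `mean_iou` and the toy labels are made up for this example; this is not the evaluation code from the repo.

```python
import numpy as np


def mean_iou(y_true, y_pred, n_classes, void_label=None):
    """Mean IoU from a confusion matrix, optionally ignoring void pixels.

    Hypothetical helper: n_classes counts every label id, void included.
    """
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    if void_label is not None:
        # Drop pixels whose ground truth is void before counting anything.
        keep = y_true != void_label
        y_true, y_pred = y_true[keep], y_pred[keep]
    # cm[i, j] = number of pixels with true class i predicted as class j.
    cm = np.bincount(n_classes * y_true + y_pred,
                     minlength=n_classes ** 2).reshape(n_classes, n_classes)
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0
    if void_label is not None:
        # The void class itself never contributes to the average.
        valid[void_label] = False
    return float(np.mean(tp[valid] / union[valid]))


# Toy example: classes 0 and 1 are real, class 2 (the last one) is void.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 0, 1]

iou_with_void = mean_iou(y_true, y_pred, n_classes=3)
iou_without_void = mean_iou(y_true, y_pred, n_classes=3, void_label=2)
```

In this toy case the score is higher once void pixels are ignored, because predictions on void pixels no longer count as errors. Whether masking void during training would also help is a separate question that this sketch does not answer.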

Thank you very much for your time. :)

qianguih commented 8 years ago

BTW, I noticed that Figure 2 in your paper shows three groups of ReNet layers, but only two are used in your code. Could this be a reason for the lower performance?

qianguih commented 8 years ago

Hi Francesco, I figured it out. The performance drop is due to the choice of solver. The default in your code is adadelta; once it is changed to adam, a mean IoU near 58% can be achieved.
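For readers wondering how the two solvers differ, below is a small sketch of the textbook Adadelta and Adam update rules for a single scalar parameter. It is written from the published formulas with commonly used default hyperparameters. It is not the repo's training code or the Lasagne implementation, and the function names and state dictionaries are made up for this example.

```python
import math


def adadelta_step(x, grad, state, rho=0.95, eps=1e-6):
    """One Adadelta update: the step size comes from running averages only."""
    state["g2"] = rho * state["g2"] + (1 - rho) * grad ** 2
    delta = -math.sqrt(state["dx2"] + eps) / math.sqrt(state["g2"] + eps) * grad
    state["dx2"] = rho * state["dx2"] + (1 - rho) * delta ** 2
    return x + delta


def adam_step(x, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: bias-corrected moment estimates scaled by lr."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return x - lr * m_hat / (math.sqrt(v_hat) + eps)


# First step on f(x) = x**2 starting from x = 1.0, where the gradient is 2.0.
x_adadelta = adadelta_step(1.0, 2.0, {"g2": 0.0, "dx2": 0.0})
x_adam = adam_step(1.0, 2.0, {"t": 0, "m": 0.0, "v": 0.0})
```

The two rules adapt the step size in different ways, so the same network can end up at a different final score depending on which one is used. This sketch only shows the mechanics and does not explain why adam did better in this particular case.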

fvisin commented 8 years ago

Hi Qiangui,

I am happy you reproduced the results of the paper!

Just for clarity: I remember re-running the eval files before pushing them to the repo to make sure they reproduced the results of the paper, so they should not need any modification. I suggest you double-check that the images of the dataset are resized as specified in the paper. Anyhow, the important thing is that the problem is solved! I am closing the issue, but feel free to comment further if you need to add something.