Tramac / awesome-semantic-segmentation-pytorch

Semantic Segmentation on PyTorch (include FCN, PSPNet, Deeplabv3, Deeplabv3+, DANet, DenseASPP, BiSeNet, EncNet, DUNet, ICNet, ENet, OCNet, CCNet, PSANet, CGNet, ESPNet, LEDNet, DFANet)
Apache License 2.0

Trend of mIoU #90

Open RAYRAYRAYRita opened 4 years ago

RAYRAYRAYRita commented 4 years ago

Hello~ Thanks for your work! There is something that puzzles me. I have trained ICNet on Cityscapes many times with various configs and visualized the val mIoU after each epoch during training (using a small plotting helper, sketched below). The mIoU keeps trending upward, and even after raising the number of epochs from 50 to 120 the curve still looks like it is about to rise further. Does this mean more epochs are needed?

[screenshot from 2019-10-23: val mIoU curve over training epochs]
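For context, the logging/plotting I use is just a small helper along these lines (my own script with illustrative names, not part of this repo):

```python
# Small helper to plot the per-epoch val mIoU curve.
import matplotlib.pyplot as plt

val_miou_history = []  # one value appended after each epoch's validation run


def plot_miou(history, out_path="val_miou.png"):
    """Save a simple epoch-vs-mIoU line plot."""
    epochs = range(1, len(history) + 1)
    plt.figure()
    plt.plot(epochs, history, marker="o")
    plt.xlabel("epoch")
    plt.ylabel("val mIoU")
    plt.title("ICNet on Cityscapes: val mIoU per epoch")
    plt.savefig(out_path)
```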

In addition, I don't know why, at the start of training or evaluation, there is about a 50% chance that my computer crashes. Could you please help me figure it out? Thanks in advance!

sainatarajan commented 4 years ago
  1. Try running the Cityscapes dataset on a different model, such as BiSeNet, DeepLab, or PSPNet. I am sure you will get good results.

  2. There are several possible reasons why a PC crashes. It can be because you don't have enough GPU memory, or because there is a version mismatch between CUDA/cuDNN and PyTorch. What is your GPU? One fix is to run nvidia-smi in a shell and kill the processes that are holding GPU memory before starting your training (be careful with this: you might accidentally kill system-related processes). If you are running the model from a Jupyter notebook, try restarting the kernel before every run; another option is to reduce the batch_size before starting the training (a memory-logging sketch follows below).
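To narrow down where the spike happens, a memory-logging sketch along these lines can help (generic PyTorch, not code from this repo; the helper name `log_gpu_memory` is just illustrative):

```python
# Log allocated/reserved CUDA memory around the suspect steps so you can see
# exactly when the spike occurs (e.g. data loading, first forward pass, eval).
import torch


def log_gpu_memory(tag, device=0):
    """Print currently allocated and reserved CUDA memory in MiB."""
    allocated = torch.cuda.memory_allocated(device) / 1024 ** 2
    reserved = torch.cuda.memory_reserved(device) / 1024 ** 2
    print(f"[{tag}] allocated: {allocated:.0f} MiB | reserved: {reserved:.0f} MiB")


# Example placement in a standard training loop (pseudo-usage):
#   log_gpu_memory("before forward")
#   outputs = model(images)
#   loss = criterion(outputs, targets)
#   log_gpu_memory("after forward")
#   loss.backward()
#   optimizer.step()
#   log_gpu_memory("after step")
```

If the spike only shows up on the very first iteration, it may be cuDNN autotuning (torch.backends.cudnn.benchmark) or the initial CUDA context allocation rather than the model itself.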

RAYRAYRAYRita commented 4 years ago

@sainatarajan Thanks for your reply!

  1. Yes, I think you are right. I need real-time segmentation, so my next step will probably be to try BiSeNet.
  2. My GPU is a GeForce GTX 1080 Ti. I was actually running nvidia-smi before and during training to monitor GPU memory. I found that at the beginning of training/evaluation the memory usage spikes sharply (that is exactly when my computer has about a 50% chance of crashing; sometimes it gets through, sometimes it fails, and once it reported 'out of memory') and then drops back to a normal level. So, to keep enough memory free at the start, I had to reduce batch_size or crop_size even though memory is sufficient for the rest of training, and I'm afraid this will hurt the results (I sketched what I'm trying below). Thanks again for your reply~
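What I'm experimenting with looks roughly like this (a generic PyTorch loop with illustrative names, not this repo's exact trainer): running evaluation under torch.no_grad() to cut the spike at the start of eval, and accumulating gradients so a smaller batch_size still gives the same effective batch size:

```python
import torch


@torch.no_grad()  # no autograd graph is built during eval, so peak memory is much lower
def validate(model, val_loader, metric):
    model.eval()
    for images, targets in val_loader:
        outputs = model(images.cuda(non_blocking=True))
        metric.update(outputs.argmax(1).cpu(), targets)  # `metric` = whatever mIoU helper you use
    return metric.get()  # hypothetical metric API


def train_one_epoch(model, loader, criterion, optimizer, accum_steps=2):
    """Smaller per-step batch, but gradients accumulated over `accum_steps`
    iterations, so the effective batch size (and hopefully the mIoU) is unchanged."""
    model.train()
    optimizer.zero_grad()
    for i, (images, targets) in enumerate(loader):
        outputs = model(images.cuda(non_blocking=True))
        loss = criterion(outputs, targets.cuda(non_blocking=True)) / accum_steps
        loss.backward()
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```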