Haochen-Wang409 / U2PL

[CVPR'22 & IJCV'24] Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels & Using Unreliable Pseudo-Labels for Label-Efficient Semantic Segmentation
Apache License 2.0

Performance decreases during evaluation #145

Closed tanveer6715 closed 1 year ago

tanveer6715 commented 1 year ago

I trained U2PL on a custom dataset that has only 2 classes. However, when I evaluate the model with eval.sh using the best and latest saved checkpoints, the mIoU is always 2-3% lower than the evaluation reported during training. I evaluated with both the sliding-window and the simple method; with sliding-window evaluation the mIoU decreases even further. Could you guide me on what the possible reason might be?

Haochen-Wang409 commented 1 year ago

The evaluation configuration is not exactly the same across these settings:

  1. During training, validation samples are center-cropped.
  2. In eval.sh, validation samples are simply resized.

However, it is a little strange that the mIoU decreases with sliding-window evaluation; we did not observe this on VOC or Cityscapes.
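To make the discrepancy concrete, here is a minimal sketch of the two validation preprocessing paths, assuming torchvision-style transforms; the 513x513 size is an illustrative placeholder, not the repo's exact config value:

```python
import torch
from torchvision.transforms import functional as TF

# The two validation preprocessing paths described above.
# 513 is an illustrative placeholder, not U2PL's exact config value.
CROP_SIZE = 513

def val_preprocess_training(img: torch.Tensor) -> torch.Tensor:
    # Training-time validation: center-crop to a fixed window,
    # so only the central region of the image is scored.
    return TF.center_crop(img, [CROP_SIZE, CROP_SIZE])

def val_preprocess_eval_sh(img: torch.Tensor) -> torch.Tensor:
    # eval.sh-style validation: resize the whole image to a fixed
    # resolution; the full content is kept but the aspect ratio may change.
    return TF.resize(img, [CROP_SIZE, CROP_SIZE])
```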

tanveer6715 commented 1 year ago

Thanks for your response. For a fair evaluation, is it possible to process the whole image without resizing in eval.sh?

Haochen-Wang409 commented 1 year ago

It is relatively hard. A common practice is to first resize the image to a fixed resolution (e.g., 513x513), run inference, and then resize the predicted segmentation map back to the original size.
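In code, that practice looks roughly like the sketch below. This is a generic PyTorch illustration, not U2PL's actual eval code; the model call and the 513x513 resolution are assumptions:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def predict_full_image(model, image):
    # image: (N, 3, H, W) float tensor; model is assumed to return
    # per-pixel class logits of shape (N, C, h, w).
    _, _, h, w = image.shape
    # 1) Resize the input to a fixed resolution (513x513 is illustrative).
    x = F.interpolate(image, size=(513, 513),
                      mode="bilinear", align_corners=False)
    logits = model(x)
    # 2) Resize the logits (not the hard label map) back to the
    #    original size, then take the argmax.
    logits = F.interpolate(logits, size=(h, w),
                           mode="bilinear", align_corners=False)
    return logits.argmax(dim=1)  # (N, H, W) predicted label map
```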

tanveer6715 commented 1 year ago

Okay. As you said in other issues, you did not use eval.sh for the PASCAL dataset and instead reported the mIoU from the training-time evaluation in the paper. So the mIoU on PASCAL was obtained with center-cropping evaluation after each epoch during training, right?

Haochen-Wang409 commented 1 year ago

Yes, this is because some previous works do so (e.g., AEL).

tanveer6715 commented 1 year ago

Okay, thank you, and sorry to bother you again. For the Cityscapes dataset, which setting gave the best mIoU in your experiments?

  1. Training-time evaluation with center cropping?
  2. eval.sh with sliding-window evaluation?
  3. eval.sh with simple resizing of the image?

Haochen-Wang409 commented 1 year ago

The second one is the best. Usually, it achieves ~3% higher mIoU than the others.
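For reference, generic sliding-window evaluation looks roughly like the following sketch; the crop size, stride, and class count are illustrative placeholders, and the model is assumed to return per-pixel logits (this is not U2PL's exact eval code):

```python
import torch

@torch.no_grad()
def slide_inference(model, image, crop=769, stride=512, num_classes=19):
    # Run the model on overlapping crops and average the logits.
    # crop/stride/num_classes are illustrative, not U2PL's settings.
    n, _, h, w = image.shape
    logits = image.new_zeros((n, num_classes, h, w))
    count = image.new_zeros((n, 1, h, w))
    h_grids = max(h - crop + stride - 1, 0) // stride + 1
    w_grids = max(w - crop + stride - 1, 0) // stride + 1
    for i in range(h_grids):
        for j in range(w_grids):
            # Clamp each window so it stays inside the image.
            y1 = min(i * stride, max(h - crop, 0))
            x1 = min(j * stride, max(w - crop, 0))
            y2, x2 = min(y1 + crop, h), min(x1 + crop, w)
            patch = image[:, :, y1:y2, x1:x2]
            out = model(patch)  # assumed (N, C, y2-y1, x2-x1) logits
            logits[:, :, y1:y2, x1:x2] += out
            count[:, :, y1:y2, x1:x2] += 1
    # Average overlapping predictions, then take the argmax.
    return (logits / count).argmax(dim=1)  # (N, H, W) label map
```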

tanveer6715 commented 1 year ago

Okay, thank you. As stated, in my case it is the opposite; it may depend on the nature of the dataset, since I obtained the best mIoU with training-time center-cropping evaluation. My dataset has only crack and background classes, which are highly imbalanced.

Haochen-Wang409 commented 1 year ago

Yes, I guess it is due to discrepancies between the datasets.

tanveer6715 commented 1 year ago

Thank you for your responses. I am closing this issue now :)

Haochen-Wang409 commented 1 year ago

Please do not hesitate to let me know if you have further questions :)

tanveer6715 commented 1 year ago

> Please do not hesitate to let me know if you have further questions :)

Yes sure I will. Thank you again :)