Closed Haochen-Wang409 closed 3 years ago
Hi Haochen, thank you for your interest in our work.
The evaluation bug was reported by the authors of GCT. Briefly, in the original GCT implementation and its reported performance, the authors applied a CenterCrop operation to the test images during inference, which should not be done. The correct practice is to predict and evaluate each test image at its original resolution.
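To make the difference concrete, here is a minimal, dependency-free sketch with toy data. It is not the GCT or ST++ evaluation code: `center_crop` and `iou` are hypothetical helpers written only for this illustration, and the 4x4 masks are made up. It shows how cropping the center before scoring can hide errors near the image border and inflate the IoU.

```python
def center_crop(mask, size):
    """Return the central size x size window of a 2D list (hypothetical helper)."""
    h, w = len(mask), len(mask[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in mask[top:top + size]]


def iou(pred, label, cls):
    """Intersection over union for one class on two same-shaped 2D lists."""
    inter = union = 0
    for p_row, l_row in zip(pred, label):
        for p, l in zip(p_row, l_row):
            inter += int(p == cls and l == cls)
            union += int(p == cls or l == cls)
    return inter / union if union else float("nan")


# Toy ground truth: class 1 covers the center and the left border column.
label = [
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]
# Toy prediction: correct in the center, but misses the border column.
pred = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

# Scoring at the original resolution counts the border mistakes.
full_iou = iou(pred, label, cls=1)

# Scoring after a center crop silently drops the border pixels.
cropped_iou = iou(center_crop(pred, 2), center_crop(label, 2), cls=1)

print(full_iou, cropped_iou)
```

In this toy case the cropped score looks perfect while the full-resolution score does not, which is why numbers obtained with and without the crop are not directly comparable.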
You may refer to the second Note at this link for more details.
Thanks for your explanation! If we must evaluate each test image at its original shape, the batch size has to be 1 during validation. Am I right?
Yes, you are right! :)
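For reference, a minimal sketch of what evaluating with batch size 1 can look like: images of different sizes are scored one at a time, and per-class intersection and union counts are accumulated across the whole set before computing mIoU. This is only an illustration, not the repository's evaluation code. The `evaluate` function, its `ignore_index` parameter (255 is assumed here as the "ignore" label), and the toy masks are all assumptions.

```python
def evaluate(pairs, num_classes, ignore_index=255):
    """Accumulate per-class intersection/union over images of any size.

    `pairs` is an iterable of (pred, label) 2D lists; each pair is one image,
    processed on its own (the equivalent of batch size 1).
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred, label in pairs:
        for p_row, l_row in zip(pred, label):
            for p, l in zip(p_row, l_row):
                if l == ignore_index:
                    continue  # assumed convention: skip unlabeled pixels
                if p == l:
                    inter[p] += 1
                    union[p] += 1
                else:
                    union[p] += 1
                    union[l] += 1
    # Average IoU over the classes that actually occur.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious)


# Two toy images with different shapes (2x3 and 1x2), so they cannot be
# stacked into a single rectangular batch without cropping or padding.
image_a = ([[0, 0, 1], [0, 1, 1]], [[0, 0, 1], [0, 0, 1]])
image_b = ([[1, 0]], [[1, 1]])

miou = evaluate([image_a, image_b], num_classes=2)
print(miou)
```

Accumulating counts over the dataset, rather than averaging per-image IoU, is one common convention; which one a given paper uses should be checked against its released code.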
Thanks a lot!
I found that CenterCrop was used in "Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision" and "Pixel Contrastive-Consistent Semi-Supervised Semantic Segmentation", which were published at CVPR 2021 and ICCV 2021, respectively. So I am quite confused about whether to use CenterCrop in evaluation.
From my perspective, it is more practical to evaluate at the original resolution in semantic segmentation.
Hi, I'm a beginner in semantic segmentation, and thanks for your great work. It seems ST++ is the only method that selects 'reliable DT mask' for unlabeled images.
I'm interested in the 'evaluation bug' mentioned in Table 1. Could you give me a brief introduction to it?