charlesCXK / TorchSemiSeg

[CVPR 2021] CPS: Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision

About OHEM loss #17

Closed jinhuan-hit closed 3 years ago

jinhuan-hit commented 3 years ago

Hi, Xiaokang, thanks for sharing such solid work! I noticed that the loss for supervised learning is the OHEM loss. Have you run experiments with CE loss, and how were the results?

charlesCXK commented 3 years ago

Hi, for supervised training, we use OHEM loss on Cityscapes and CE loss on the VOC dataset, which is a common setting in semantic segmentation. We haven't tried CE loss on Cityscapes for supervised training.

For the CPS loss, we use CE loss on both datasets.
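
For reference, below is a minimal sketch of an OHEM-style cross-entropy loss in PyTorch. The class name and the `thresh` / `min_kept` defaults are illustrative assumptions rather than the exact implementation in this repo; the idea is to average the per-pixel CE loss only over the hardest pixels (those where the predicted probability of the ground-truth class falls below a threshold), while always keeping at least a minimum number of pixels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class OhemCrossEntropy(nn.Module):
    """OHEM cross-entropy sketch: average the per-pixel CE loss only over
    'hard' pixels, i.e. those whose predicted probability for the ground-truth
    class is below a threshold, always keeping at least `min_kept` pixels.
    The defaults below are illustrative, not tuned values."""

    def __init__(self, ignore_index=255, thresh=0.7, min_kept=100000):
        super().__init__()
        self.ignore_index = ignore_index
        self.thresh = thresh      # pixels with gt-class prob above this count as "easy"
        self.min_kept = min_kept  # lower bound on the number of pixels kept per batch

    def forward(self, logits, target):
        # logits: (N, C, H, W) raw scores; target: (N, H, W) class indices
        pixel_loss = F.cross_entropy(
            logits, target, ignore_index=self.ignore_index, reduction='none'
        ).view(-1)

        valid = target.view(-1) != self.ignore_index
        if not valid.any():
            # no labeled pixels in this batch; return a zero tied to the graph
            return logits.sum() * 0.0

        with torch.no_grad():
            prob = F.softmax(logits, dim=1)
            # probability the model assigns to the ground-truth class at each pixel
            safe_target = target.clone()
            safe_target[target == self.ignore_index] = 0
            gt_prob = prob.gather(1, safe_target.unsqueeze(1)).squeeze(1).view(-1)

            # raise the threshold if necessary so at least min_kept pixels survive
            k = min(self.min_kept, int(valid.sum()))
            kth_prob = gt_prob[valid].sort()[0][k - 1]
            threshold = max(self.thresh, kth_prob.item())
            keep = valid & (gt_prob <= threshold)

        return pixel_loss[keep].mean()
```

In this setting, such a loss would be applied only to the supervised branch on Cityscapes, while the CPS term on pseudo labels stays a plain cross-entropy.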

jinhuan-hit commented 3 years ago

As far as I know, for semi-supervised training, all methods use CE loss on both datasets. The baseline of DeepLabv3+ with ResNet-101 on the 1/8 Cityscapes split is 72-73 in your paper. However, in A Simple Baseline for Semi-supervised Semantic Segmentation with Strong Data Augmentation, the result is only 68.9.

charlesCXK commented 3 years ago

Hi, some researchers in the semi-supervised segmentation area may prefer CE loss on Cityscapes (as you mentioned). However, OHEM is a common setting for supervised training on Cityscapes. Since it adds no computational cost or parameters during inference, why should we deliberately use a lower baseline (e.g., with CE loss)?

I also want to clarify two points:

  1. When comparing with SOTA on Cityscapes, we use OHEM loss (on the labeled set) for all the methods, so the comparison is fair (their supervised baseline is exactly the same one).
  2. If the baseline for semi-supervised learning is very low, the gain may be large and it may seem that the semi-supervised method has a very large impact on performance. However, is that true? I think we study semi-supervised learning in order to use unlabeled data to improve the performance of the model, not to show large gains over a low baseline.

charlesCXK commented 3 years ago

I think if the supervised baseline is not trained well enough, then we cannot tell where the gain brought by semi-supervised learning actually comes from.

jinhuan-hit commented 3 years ago

Yeah, I agree with you that semi-supervised learning should be studied on a stronger baseline. I'm sorry, I hadn't noticed that you reproduced all the SOTA methods yourself. Maybe you could point this out on the benchmark, https://paperswithcode.com/task/semi-supervised-semantic-segmentation, otherwise other people may be confused by the big margin. That's just my own opinion, and please forgive me if I have bothered you.

charlesCXK commented 3 years ago

Hi, I know what you mean. However, the benchmark website you mention is just a reference. The comparisons on it are not fair at all; for example, the methods don't even use the same data partition (i.e., the same 1/8 subset of PASCAL VOC).

jinhuan-hit commented 3 years ago

Thanks for your kind and quick reply.