Haochen-Wang409 / U2PL

[CVPR'22] Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels
Apache License 2.0
425 stars 59 forks

Questions about performance by batch size #105

Closed DeepHM closed 1 year ago

DeepHM commented 1 year ago

Hello. First of all, thank you for sharing your wonderful research!

I have some questions. I am comparing your study with CPS, which is a top-3 benchmark on semi-supervised semantic segmentation (PASCAL VOC dataset). According to your paper and code, training uses batch_size = 4 GPUs × 4 images per GPU = 16. However, according to the CPS paper and code, training uses 8 labeled and 8 unlabeled images per batch. Therefore, to ensure a fair comparison, I kept all other options unchanged and trained your code with a ResNet-50 backbone and a batch size of 8 (2 GPUs × 4 images per GPU = 8), using 80 epochs for all labeled ratios. The CPS implementation was likewise trained with 80 epochs for all labeled ratios.

Below are the results of my re-implementation (CPS vs. U2PL):

|                | 1/8   | 1/4   | 1/2   |
| -------------- | ----- | ----- | ----- |
| Epoch 80 Score | 68.52 | 71.87 | 75.29 |
| BEST Score     | 73.00 | 74.80 | 76.00 |
| BEST Epoch     | 23    | 21    | 60    |

|                | 1/8    | 1/4    | 1/2    |
| -------------- | ------ | ------ | ------ |
| Epoch 80 Score | 74.094 | 75.406 | 75.651 |
| BEST Score     | 74.937 | 75.557 | 75.786 |
| BEST Epoch     | 67     | 66     | 78     |

These results differ from those reported in your study. Also, I think batch size matters much more in semi-supervised learning than it does in supervised learning.

I have a few questions:

  1. Do the re-implementation results described above look reasonable, or is there a problem with my setup?
  2. I would like to ask your opinion on the importance of batch size in semi-supervised learning (specifically semi-supervised semantic segmentation). I would also appreciate any references you could point me to.
  3. In my re-implementation of your study (the result table above), it seems strange that the highest score appears at a low epoch (especially for the 1/8 and 1/4 ratios). This suggests that continued training does not improve performance. What do you think?
Haochen-Wang409 commented 1 year ago

It seems that you have provided reproduced PS-MT vs. CPS results (judging from the captions of your tables), rather than our U2PL.

DeepHM commented 1 year ago

> It seems that you have provided reproduced PS-MT vs. CPS results (judging from the captions of your tables), rather than our U2PL.

I'm sorry, I simply mislabeled the tables. I have renamed them in the question above; the corrected table is the U2PL result table.

Haochen-Wang409 commented 1 year ago

Did you re-implement both U2PL and CPS on the blender VOC benchmark (i.e., 10,582 images in total)? If so, your reproduced results are very strange. It looks as if you trained our U2PL on classic VOC (1,464 images in total) but CPS on blender VOC. Please check your settings again.

Haochen-Wang409 commented 1 year ago

Also, each per-GPU batch actually consists of 4 labeled images and 4 unlabeled images. Therefore, the effective batch size in our config is (4+4)×4 = 32. If you reproduce our results with a different batch size, please adjust the learning rate linearly.
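For concreteness, here is a minimal sketch of that linear scaling rule; the base learning rate and helper names below are illustrative placeholders, not values from the U2PL configs.

```python
# Hypothetical sketch of linearly scaling the learning rate with the
# effective batch size; numbers and names are illustrative only.

def scale_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Scale the learning rate in proportion to the effective batch size."""
    return base_lr * new_batch / base_batch

# Default U2PL setting: 4 GPUs x (4 labeled + 4 unlabeled) images = 32 per step.
base_batch = (4 + 4) * 4  # 32
base_lr = 0.001           # placeholder, not the repo's actual base_lr

# Reproducing on 2 GPUs: (4 + 4) * 2 = 16 images per step.
new_batch = (4 + 4) * 2   # 16
print(scale_lr(base_lr, base_batch, new_batch))  # -> 0.0005
```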

DeepHM commented 1 year ago

Thank you for your kind reply.

The data used in my question is the PASCAL VOC dataset. Also, your paper reports results with ResNet-101, but please note once again that my questions and results are based on ResNet-50.

I trained CPS and U2PL on the PASCAL VOC data provided by CPS: https://pkueducn-my.sharepoint.com/personal/pkucxk_pku_edu_cn/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Fpkucxk%5Fpku%5Fedu%5Fcn%2FDocuments%2FDATA&ga=1

Also, I changed the labeled/unlabeled data splits to the split files from CPS, as shown in the screenshot below (e.g., [ train_aug_labeled_1-8.txt , train_aug_unlabeled_1-8.txt ]; file source: https://github.com/charlesCXK/TorchSemiSeg/tree/main/DATA/pascal_voc/subset_train_aug).

Next, following your advice to check the data, I reconfirmed the data in the training log.

Therefore, since I train on two GPUs, the batch size is 4×2 = 8 for labeled data and 4×2 = 8 for unlabeled data, so the total batch size is 16, the same as the CPS setting. Following your advice on the learning rate, I re-trained after scaling the learning rate accordingly.

Even considering that my backbone is ResNet-50, the results feel strange: low performance, and no improvement after around epoch 20. Additionally, with the CPS implementation I observed the same phenomenon, where performance drops considerably when the batch size is halved.

  1. Are my U2PL test results normal?
  2. I think batch size is a much more important issue in semi-supervised learning (especially semi-supervised segmentation). The problem of inaccurate pseudo-labels seems to grow worse as the batch size decreases. I suspect this is particularly related to batch normalization during optimization (back-propagation) with false pseudo-labels. In addition, the larger share of false pseudo-labels caused by a smaller batch size may make the loss landscape more confusing. On this point, I would like to ask your opinion again: does batch size really not matter?
Haochen-Wang409 commented 1 year ago

Could you provide the config and the log from your reproduction of our work? Our U2PL is trained with 4 GPUs on PASCAL VOC with ResNet-101, and all checkpoints and logs are provided (please refer to README.md). We did not train U2PL using ResNet-50 with 2 GPUs.

If your reproduced results are valid, the influence of batch_size in semi-supervised semantic segmentation deserves further study. A similar idea has been discussed in [1], where separate BNs are used for labeled and unlabeled data.

[1] A Simple Baseline for Semi-supervised Semantic Segmentation with Strong Data Augmentation. In ICCV, 2021.
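As a rough illustration of the idea referenced in [1], the sketch below keeps separate BatchNorm statistics for labeled and unlabeled mini-batches so that unreliable pseudo-labeled data does not contaminate the labeled-branch statistics. The module and its interface are assumptions made for illustration, not code from U2PL, CPS, or the official implementation of [1].

```python
import torch
import torch.nn as nn

class SplitBNBlock(nn.Module):
    """Conv block with separate BN statistics for labeled and unlabeled data.

    Illustrative sketch of the split-BN idea; not taken from any of the
    repositories discussed in this issue.
    """

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
        self.bn_labeled = nn.BatchNorm2d(out_ch)    # stats updated by labeled batches
        self.bn_unlabeled = nn.BatchNorm2d(out_ch)  # stats updated by unlabeled batches
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor, labeled: bool = True) -> torch.Tensor:
        x = self.conv(x)
        bn = self.bn_labeled if labeled else self.bn_unlabeled
        return self.relu(bn(x))

# Usage: forward labeled and unlabeled mini-batches separately so that noisy
# pseudo-labeled statistics never mix into the labeled-branch BN.
block = SplitBNBlock(3, 16)
x_l = torch.randn(4, 3, 64, 64)   # labeled mini-batch
x_u = torch.randn(4, 3, 64, 64)   # unlabeled mini-batch
out_l = block(x_l, labeled=True)
out_u = block(x_u, labeled=False)
```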

DeepHM commented 1 year ago

Thank you for your kind reply!