TencentYoutuResearch / Classification-SemiCLS

Code for CVPR 2022 paper “Class-Aware Contrastive Semi-Supervised Learning”

Config file of CIFAR10 experiments #6

Open syorami opened 1 year ago

syorami commented 1 year ago

Thanks for your work. I've tried to modify the current fixmatch-ccssl config files for CIFAR100 but failed to reproduce the CIFAR10 results reported in your paper. Would you publish the corresponding fixmatch-ccssl config files for CIFAR10?

KaiWU5 commented 1 year ago

Added. According to issue #4, the code refactor changed the randomness of the code, so please try different seeds if needed. config
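
For anyone who wants to sweep seeds, here is a minimal sketch (not part of the repo) of looping over a few seeds with train_semi.py; the --cfg/--out/--seed/--gpu-id flags are the same ones used in the training commands later in this thread, while the config path, output directories, and seed list are placeholders.

```python
# Hypothetical seed sweep for the linked CIFAR10 config; replace CFG with the
# actual config path from the repo.
import subprocess

CFG = "configs/PATH/TO/the_cifar10_config.py"  # placeholder for the linked config

for seed in (1, 2, 5):  # example seeds, pick your own
    out_dir = f"workdirs/cifar10_ccssl_seed{seed}"
    cmd = [
        "python3", "train_semi.py",
        "--cfg", CFG,
        "--out", out_dir,
        "--seed", str(seed),
        "--gpu-id", "0",
    ]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)  # each call is one full training run
```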

syorami commented 1 year ago

> Added. According to issue #4, the code refactor changed the randomness of the code, so please try different seeds if needed. config

I've tried several different seeds and finally reproduced the results; the performance seems to fluctuate a lot. BTW, I also ran an experiment with the comatch_stl10_wres_r18_b1x64_l5.py config on STL10 and got 88% accuracy, which is much higher than your reported result. Is there something wrong?

Thanks!

KaiWU5 commented 1 year ago

I have run several tests but didn't get results much better than 80% (in @aeo123's run, epoch 40 got top-1 ACC > 84.4). Please check that we are using the same training commands and configs.

Config: configs/comatch/comatch_stl10_wres_r18_b1x64_l5.py

GPUs: 1 (single GPU, following the CoMatch paper)

Training commands:

Command 1: python3 train_semi.py --cfg configs/comatch/comatch_stl10_wres_r18_b1x64_l5.py --out YOUR/output/path --seed YOURSeed --gpu-id 0

Command 2: python3 -m torch.distributed.launch --nproc_per_node 1 train_semi.py --cfg configs/comatch/comatch_stl10_wres_r18_b1x64_l5.py --out /YOUR/output/path --use_BN True --seed YOURSeed

Data: stl10_binary.tar.gz, downloaded from the official website, 2640397119 bytes, md5sum 91f7769df0f17e558f3565bffb0c7dfb

My results:

Command 1 & seed 1: in training, current epoch 147, best top-1: 80.65
Command 2 & seed 5: in training, current epoch 262, best top-1: 80.20
Command 2 & seed 1: in training, current epoch 98, best top-1: 80.01
Command 2 & seed 5 & 2 GPUs: in training, current epoch 117, best top-1: 77.97

So my guesses are as above: maybe we are using different training commands, configs, numbers of GPUs, or data (the worst possibility). Please let me know if everything on your side matches mine and the results still differ.

syorami commented 1 year ago

Thanks for your reply! Here is what I use for training:

Training command: srun -p caif_dev --ntasks=1 --ntasks-per-node=1 --gres=gpu:1 --cpus-per-task=20 python train_semi.py --cfg configs/comatch/comatch_stl10_wres_r18_b1x64_l5.py --out workdirs/comatch_stl10_wres_r18_b1x64_l5 --seed 1 --gpu-id 0

Dataset: I verified that the md5sum (91f7769df0f17e558f3565bffb0c7dfb) of my local data file matches yours (a quick way to re-check this is sketched at the end of this comment).

GPU: I also use a single GPU for training.

Code: I carefully checked git status to make sure that not a single line related to the training process was modified.

Also, my result at the 20th epoch already exceeds 80% accuracy. However, I noticed that my torch version is 1.10 instead of the 1.6 pinned in your requirements. Although I don't think the torch version should matter much, I will give it a try to see if it does.
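
For reference, here is a minimal sketch (not from the repo) of the two checks compared in this thread: the STL10 archive md5 and the torch/CUDA/cuDNN versions. The archive path is a placeholder for wherever stl10_binary.tar.gz was downloaded.

```python
# Hypothetical helper to compare data and environment between two setups.
import hashlib

import torch

EXPECTED_MD5 = "91f7769df0f17e558f3565bffb0c7dfb"  # value quoted above

def file_md5(path, chunk_size=1 << 20):
    """Hash the file in chunks so the ~2.6 GB archive is not loaded into memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

print("torch:", torch.__version__)              # e.g. 1.6.0 vs 1.10.2
print("cuda:", torch.version.cuda)              # e.g. 10.1 vs 11.3
print("cudnn:", torch.backends.cudnn.version())
print("md5 ok:", file_md5("data/stl10_binary.tar.gz") == EXPECTED_MD5)  # placeholder path
```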

syorami commented 1 year ago

Hi @KaiWU5, here is my training log under the torch 1.6 environment, and the results seem to differ a lot from the torch 1.1x ones. I didn't complete the whole training run. Could you check whether this training curve matches your results? Thx! comatch_torch1.6.log

KaiWU5 commented 1 year ago

> Hi @KaiWU5, here is my training log under the torch 1.6 environment, and the results seem to differ a lot from the torch 1.1x ones. I didn't complete the whole training run. Could you check whether this training curve matches your results? Thx! comatch_torch1.6.log

The trend of my results is nearly the same as yours. The phenomenon is getting clearer, and it is hard to explain. I will double-check with torch 1.10 and reply later.

syorami commented 1 year ago

Yeah, I did almost nothing except change the torch version and the GPU type (I'm now using a 3090 Ti, which doesn't support torch1.6_cuda10.1, so I rented a TITAN Xp for this experiment). This really surprises me, as my other results on CIFAR10/CIFAR100 match yours; only STL10 under torch1.10.2_cuda11.3 differs.

KaiWU5 commented 1 year ago

My experiment shows the same phenomenon. Changing only torch 1.6 to torch 1.10, I got 82.21 top-1 accuracy at epoch 28, which means it would end up well above the torch 1.6 result with more training. I checked several PyTorch release notes but haven't found which part of torch is responsible.

Many thanks for investigating and trying to find the root cause together. I will keep you updated if I find any clues.
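
One thing that may be worth comparing between the torch 1.6 and torch 1.10 runs is the determinism setup. Below is a minimal sketch of the standard PyTorch switches; this is an assumption about where the gap could come from, not a confirmed cause, and it is not the repo's own seeding code.

```python
# Generic PyTorch reproducibility settings, to be applied identically in both
# environments before building the model and data loaders.
import random

import numpy as np
import torch

def set_deterministic(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True   # force deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False      # disable conv algorithm autotuning
    if hasattr(torch, "use_deterministic_algorithms"):  # available from torch 1.8
        # May raise at runtime for ops without a deterministic implementation.
        torch.use_deterministic_algorithms(True)
```

If both environments are seeded and configured this way and the gap persists, the difference is more likely due to kernels or defaults that changed between releases than to run-to-run noise.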

syorami commented 1 year ago

I will keep this issue open to see whether any conclusion can be reached. Thanks as well for the cooperation on tracking down the cause!