Config file of CIFAR10 experiments #6

syorami commented 1 year ago

Thanks for your work. I tried modifying the existing fixmatch-ccssl config files for CIFAR100 to run on CIFAR10, but failed to reproduce the CIFAR10 results reported in your paper. Could you publish the fixmatch-ccssl config files for CIFAR10?

KaiWU5 commented 1 year ago

Added: config. According to issue #4, the code refactor changed the randomness of the code, so please try different seeds if needed.
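For anyone who wants to try several seeds systematically, here is a minimal sketch that only builds the command lines for a seed sweep. It reuses the `train_semi.py` flags (`--cfg`, `--out`, `--seed`, `--gpu-id`) that are quoted later in this thread. The helper names, the per-seed output subdirectory, and the `workdirs/seed_sweep` path are assumptions made for illustration, not part of the repository.

```python
import shlex


def build_train_command(cfg, out_dir, seed, gpu_id=0):
    """Return the argv list for one single-GPU train_semi.py run.

    Only flags quoted in this thread are used. Writing each seed to its
    own subdirectory is an assumption, so runs do not overwrite each other.
    """
    return [
        "python3", "train_semi.py",
        "--cfg", cfg,
        "--out", f"{out_dir}/seed{seed}",
        "--seed", str(seed),
        "--gpu-id", str(gpu_id),
    ]


def seed_sweep(cfg, out_dir, seeds):
    """Build one command per seed without executing anything."""
    return [build_train_command(cfg, out_dir, seed) for seed in seeds]


if __name__ == "__main__":
    # Config path taken from this thread; swap in the config you are testing.
    cfg = "configs/comatch/comatch_stl10_wres_r18_b1x64_l5.py"
    for argv in seed_sweep(cfg, "workdirs/seed_sweep", seeds=[1, 2, 3, 4, 5]):
        print(shlex.join(argv))
```

The sketch prints the commands instead of launching them, so they can be pasted into a shell or a job scheduler as needed.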

syorami commented 1 year ago

I tried several different seeds and finally reproduced the results. The performance seems to fluctuate a lot across seeds. BTW, I also ran an experiment with the comatch_stl10_wres_r18_b1x64_l5.py config on the STL10 dataset and got 88% accuracy, which is much higher than your reported result. Is there something wrong?


KaiWU5 commented 1 year ago

I have run several tests but didn't get results much better than 80% (in @aeo123's run, epoch 40 reached top-1 ACC > 84.4). Please check that we are using the same training commands and configs.

Config: configs/comatch/comatch_stl10_wres_r18_b1x64_l5.py

GPUs: 1 (single GPU, following the CoMatch paper)

Training command:

Command1: python3 train_semi.py --cfg configs/comatch/comatch_stl10_wres_r18_b1x64_l5.py --out YOUR/output/path --seed YOURSeed --gpu-id 0

Command2: python3 -m torch.distributed.launch --nproc_per_node 1 train_semi.py --cfg configs/comatch/comatch_stl10_wres_r18_b1x64_l5.py --out /YOUR/output/path --use_BN True --seed YOURSeed

Data: stl10_binary.tar.gz, downloaded from the official website, 2640397119 bytes, md5sum 91f7769df0f17e558f3565bffb0c7dfb

My results (all runs still in training):

Command1 & Seed1: current epoch 147, best top-1: 80.65

Command2 & Seed5: current epoch 262, best top-1: 80.20

Command2 & Seed1: current epoch 98, best top-1: 80.01

Command2 & Seed5 & Gpu2: current epoch 117, best top-1: 77.97

So my guess is that we are using different training commands, configs, numbers of GPUs, or data (the worst case). Please let me know if all of these match and the results still differ.
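To put the numbers in this comment side by side with the 88% reported earlier in the thread, a small sketch like the following may help. The values are copied from the results listed in this comment. They are interim best top-1 figures from runs still in training at different epochs, so the summary is only indicative. The `summarize` helper and the choice to set aside the run labelled Gpu2 are assumptions made for illustration.

```python
from statistics import mean

# (label, current epoch, best top-1) as listed in this comment.
RUNS = [
    ("Command1 & Seed1", 147, 80.65),
    ("Command2 & Seed5", 262, 80.20),
    ("Command2 & Seed1", 98, 80.01),
    ("Command2 & Seed5 & Gpu2", 117, 77.97),
]

# Accuracy reported by the other participant earlier in the thread.
REPORTED_OTHER = 88.0


def summarize(runs, exclude="Gpu2"):
    """Summarize best top-1 over runs whose label does not contain `exclude`."""
    scores = [top1 for label, _, top1 in runs if exclude not in label]
    return {
        "n": len(scores),
        "mean": mean(scores),
        "min": min(scores),
        "max": max(scores),
        "spread": max(scores) - min(scores),
    }


if __name__ == "__main__":
    stats = summarize(RUNS)
    gap = REPORTED_OTHER - stats["max"]
    print(f"runs={stats['n']} mean={stats['mean']:.2f} spread={stats['spread']:.2f}")
    print(f"gap between the 88% report and the best run here: {gap:.2f}")
```

If the gap is several times larger than the spread between seeds, seed variation alone is an unlikely explanation, which is consistent with checking commands, configs, GPUs, and data next.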

syorami commented 1 year ago

Thanks for your reply! Here is what I use for training:

Training command: srun -p caif_dev --ntasks=1 --ntasks-per-node=1 --gres=gpu:1 --cpus-per-task=20 python train_semi.py --cfg configs/comatch/comatch_stl10_wres_r18_b1x64_l5.py --out workdirs/comatch_stl10_wres_r18_b1x64_l5 --seed 1 --gpu-id 0

Dataset: I verified that the md5sum (91f7769df0f17e558f3565bffb0c7dfb) of my local data file matches yours.

GPU: I also use a single GPU for training.

Code: To make sure that no line of the training code was modified, I carefully checked git status.
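The dataset check above can also be done from Python. This is a minimal sketch, assuming the archive is stored at a hypothetical local path `data/stl10_binary.tar.gz`. The expected size and md5 are the values quoted in this thread, and the function names are made up for illustration.

```python
import hashlib
import os

# Reference values quoted in this thread for stl10_binary.tar.gz.
EXPECTED_MD5 = "91f7769df0f17e558f3565bffb0c7dfb"
EXPECTED_SIZE = 2640397119


def file_md5(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reference(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True if both the file size and the md5 digest match."""
    if os.path.getsize(path) != expected_size:
        return False  # cheap check first, avoids hashing a truncated download
    return file_md5(path) == expected_md5


if __name__ == "__main__":
    archive = "data/stl10_binary.tar.gz"  # hypothetical location
    if os.path.exists(archive):
        print("dataset matches reference:", matches_reference(archive))
    else:
        print("archive not found at", archive)
```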

My result at epoch 20 already exceeds 80% accuracy. However, I noticed that my torch version is 1.10 instead of the default version (1.6) in your requirements. Although I don't think the torch version would matter much, I will give it a try to see whether it does.
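Since the torch version is one of the few remaining differences, it may help to attach an environment summary to each training log. The following is a rough sketch, not something the repository provides. The `collect_env` helper is hypothetical, it only reads standard PyTorch attributes, and it falls back gracefully when torch is not installed.

```python
import importlib.util
import platform
import sys


def collect_env():
    """Collect version information worth recording next to a training log."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "torch": None,
    }
    if importlib.util.find_spec("torch") is None:
        return info  # torch not installed, report only the Python side

    import torch

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # None on CPU-only builds
    info["cudnn"] = torch.backends.cudnn.version()
    info["cudnn_benchmark"] = torch.backends.cudnn.benchmark
    info["cudnn_deterministic"] = torch.backends.cudnn.deterministic
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Comparing two such summaries line by line makes it easier to confirm that only the intended component changed between runs.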

syorami commented 1 year ago

Hi @KaiWU5, here is my training log under the torch 1.6 environment. The results differ a lot from those under torch 1.10. I didn't complete the whole training run. Could you check whether the training process matches your results? Thanks! comatch_torch1.6.log

KaiWU5 commented 1 year ago

The trend of my results is nearly the same as yours. The phenomenon is becoming clear, though it is hard to believe. I will double-check with torch 1.10 and reply later.

syorami commented 1 year ago

Yeah, I did almost nothing but change the torch version and GPU type (I'm now using a 3090 Ti, which doesn't support torch1.6_cuda10.1, so I rented a TITAN Xp for the experiment). This really surprises me, because under torch1.10.2_cuda11.3 my other results on CIFAR10/CIFAR100 are the same as yours; only STL10 differs.

KaiWU5 commented 1 year ago

My experiment shows the same phenomenon. Changing only torch 1.6 to torch 1.10, I got a top-1 accuracy of 82.21 at epoch 28, which suggests it would end up well above the torch 1.6 result with more training. I checked several PyTorch update logs but haven't found which part of torch is responsible.

Many thanks for discovering this and working on the root cause together. I will keep you updated if I find any clues.

syorami commented 1 year ago

I will keep this issue open to see whether any conclusion can be reached. Thanks also for the cooperation in finding the cause!