Reproducibility Issue again

wjun0830 commented 3 years ago

Hi. I ran your code following the instructions in the README as well as those in #7, but I still couldn't reproduce the reported performance. Is there a version dependency behind this, or is it a random seed problem? Could you explain how you used cosine decay? Could you perhaps provide the "adjust_learning_rate" function in train.py?

Your reported error is 14.23, and it is also mentioned in #7 as below.

	error at 300 epochs	best error
try1	14.78	14.23
try2	15.44	14.50
try3	15.00	14.68
average	15.07	14.47

However, the best errors I get on CIFAR-100 with PyramidNet-200 are around 16.x and 15.x.
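To put the gap in context, the spread of the three reported runs can be checked with a few lines of Python. This is only a sketch over the numbers quoted in the table above; the variable names are illustrative, and three runs are far too few for a reliable variance estimate.

```python
from statistics import mean, stdev

# Error values copied from the table quoted from #7.
final_err = [14.78, 15.44, 15.00]  # error at 300 epochs
best_err = [14.23, 14.50, 14.68]   # best error during training

for name, runs in (("at 300 epochs", final_err), ("best", best_err)):
    # Sample standard deviation gives a rough idea of run-to-run noise.
    print(f"{name}: mean={mean(runs):.2f}, stdev={stdev(runs):.2f}, "
          f"min={min(runs):.2f}, max={max(runs):.2f}")
```

A new result is easier to judge against this range than against the single best run.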

Thanks in advance.

hellbell commented 3 years ago

@wjun0830 In the original paper and in this issue (https://github.com/clovaai/CutMix-PyTorch/issues/7#issuecomment-516710134), we ran the experiments using the step-decay learning rate schedule described in the paper, not cosine learning rate scheduling. Did you train the model with the README's setting:

python train.py \
--net_type pyramidnet \
--dataset cifar100 \
--depth 200 \
--alpha 240 \
--batch_size 64 \
--lr 0.25 \
--expname PyraNet200 \
--epochs 300 \
--beta 1.0 \
--cutmix_prob 0.5 \
--no-verbose

with two GPUs? If so, could you provide your environment information (GPU model, driver, PyTorch version, etc.)?
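For readers unfamiliar with step decay, the schedule can be sketched as below. The milestones (50% and 75% of the total epochs) and the decay factor of 0.1 are assumptions made for illustration and are not confirmed in this thread; the function name `step_decay_lr` is hypothetical. Check `adjust_learning_rate` in train.py for the values the repository actually uses.

```python
def step_decay_lr(base_lr: float, epoch: int, total_epochs: int,
                  milestones=(0.5, 0.75), gamma: float = 0.1) -> float:
    """Return the learning rate for a 0-indexed epoch under step decay.

    `milestones` are fractions of `total_epochs` (assumed values); the
    rate is multiplied by `gamma` once for every milestone reached.
    """
    passed = sum(epoch >= int(m * total_epochs) for m in milestones)
    return base_lr * (gamma ** passed)


if __name__ == "__main__":
    # With --lr 0.25 and --epochs 300 the rate would drop at epochs 150 and 225.
    for epoch in (0, 149, 150, 224, 225, 299):
        print(epoch, step_decay_lr(0.25, epoch, 300))
```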

wjun0830 commented 3 years ago

Yes, I did. After running it 3 times, I finally got an error rate similar to the reported one. Here is the environment I used for this experiment:
GPU: RTX Titan x 2
Driver version: 418.56
CUDA version: 10.1
torch: 1.2.0

However, I am still interested in cosine LR scheduling, since I saw in #7 that it works well on the CIFAR datasets. And since I want to try some experiments based on your repo, it would be great if the gaps between runs got smaller. If possible, could you provide the parameters for the cosine scheduler, or the implementation of the scheduler?

Oh, one more question, by the way: approximately how long did the ImageNet experiment on ResNet-50 take? Thanks!


-- Thank you. Sincerely, Wonjun Moon.

hellbell commented 3 years ago

@wjun0830 Okay, this code snippet will probably help you with cosine LR scheduling. Still, it is odd that you can't reproduce the results using our code.
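A minimal sketch of per-epoch cosine decay is given below. It is not necessarily the snippet referred to above: the helper `cosine_lr`, the `min_lr` parameter, and the signature of `adjust_learning_rate` are assumptions for illustration. The only thing it relies on from PyTorch is that an optimizer exposes `param_groups`, a list of dicts with an `"lr"` key.

```python
import math


def cosine_lr(base_lr: float, epoch: int, total_epochs: int,
              min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate: base_lr at epoch 0, min_lr at the end."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def adjust_learning_rate(optimizer, epoch: int, base_lr: float,
                         total_epochs: int) -> float:
    """Set the cosine-decayed rate on every parameter group (hypothetical signature)."""
    lr = cosine_lr(base_lr, epoch, total_epochs)
    for param_group in optimizer.param_groups:
        param_group["lr"] = lr
    return lr
```

Calling `adjust_learning_rate(optimizer, epoch, 0.25, 300)` at the start of each epoch would apply the schedule. PyTorch also ships `torch.optim.lr_scheduler.CosineAnnealingLR`, which implements the same curve and may be preferable to a hand-written function.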

wjun0830 commented 3 years ago

Thanks a lot. I got results similar to the reported ones after 3 runs. Maybe it was a seed problem. Could you tell me how long the ImageNet experiments on ResNet-50 took?
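If seed variance is the suspected cause, one common way to reduce it is to fix every random number generator before training. The sketch below is an assumption about how that could be done and is not taken from the repository; `seed_everything` is a hypothetical helper. Even with fixed seeds, multi-GPU training and some cuDNN kernels can remain nondeterministic.

```python
import random


def seed_everything(seed: int = 0) -> None:
    """Seed Python, NumPy and PyTorch RNGs, skipping libraries that are missing."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
        # Trade some speed for more repeatable cuDNN behaviour.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass
```

Calling `seed_everything(0)` once at the top of the training script, before the model and data loaders are built, is the usual placement.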

hellbell commented 3 years ago

> Thanks a lot. I got results similar to the reported ones after 3 runs. Maybe it was a seed problem. Could you tell me how long the ImageNet experiments on ResNet-50 took?

It takes about 1 to 2 weeks with 4 NVIDIA P40 GPUs.

wjun0830 commented 3 years ago

Thank you for your kind replies.

clovaai / CutMix-PyTorch

Reproducibility Issue again #36