ZhenZHAO / iMAS

[CVPR'23] Instance-specific and Model-adaptive Supervision for Semi-supervised Semantic Segmentation
https://arxiv.org/abs/2211.11335

The accuracy does not come close to the accuracy shown in the released log #15

Closed · masahiro173 closed this 10 months ago

masahiro173 commented 1 year ago

When I trained with config_semi.yaml from citys_semi372, my result was Best-STU: 70.40/239, Best-EMA: 70.33/239, while the author's log shows Best-STU: 76.94/236, Best-EMA: 77.43/188. That is a large discrepancy. Due to GPU constraints, I set the training batch size to 8 on a single GPU, so the training environment differs, but should that really make this much of a difference?

ZhenZHAO commented 1 year ago

It shouldn't... please provide more details.

masahiro173 commented 12 months ago

The only changes were using a single GPU and increasing the batch size to 8. The GPU is an NVIDIA A100-SXM4-80GB. What is bz? I have also attached the training log: seg_2023-09-22_09_36_47.log

ZhenZHAO commented 12 months ago

It seems bz (batch size) is the only difference. I set bz=4 per GPU and used 4 V100s, so the effective bz = 16. Here are some tips that may help you improve performance:

(It is clear from your log that the performance is still improving, albeit slowly, so better performance can be expected with more training epochs... but I don't recommend it, since your Cityscapes experiments already seem to take very long.)

If you cannot set bz=16 as I did (bz matters a lot for segmentation tasks), you may need to increase the unsupervised loss weight instead of increasing the epochs.
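For concreteness, here is a minimal sketch of what raising the unsupervised loss weight looks like in a typical semi-supervised training step. The weight name `lambda_u`, the loss functions, and the pseudo-label handling are illustrative assumptions, not iMAS's actual code:

```python
import torch.nn.functional as F

def semi_supervised_loss(model, labeled_batch, unlabeled_images,
                         pseudo_labels, lambda_u=1.0):
    """Combined loss for one step; lambda_u scales the unsupervised term.

    With a smaller effective batch size (e.g., 8 instead of 16), a larger
    lambda_u can partially compensate by strengthening the training
    signal from unlabeled data.
    """
    images, targets = labeled_batch
    sup_loss = F.cross_entropy(model(images), targets, ignore_index=255)

    # Pseudo-labels typically come from a teacher/EMA model in methods
    # like iMAS; here they are simply passed in.
    unsup_loss = F.cross_entropy(model(unlabeled_images), pseudo_labels,
                                 ignore_index=255)

    # e.g., try lambda_u > 1.0 when the effective bz is below 16.
    return sup_loss + lambda_u * unsup_loss
```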

BTW, I don't understand why the performance is so bad in the first training epoch. At epoch 0, only labeled data is used to train the model; my log shows "24.15/0", but your log only reaches "12.24/0"... Please double-check your code, this is rather odd.

masahiro173 commented 12 months ago

Thank you for answering. I'll try changing the parameters. By the way, it looks like we can secure 2 GPUs (I don't have V100s, so A100s), and we plan to run with 8 batches each (16 in total). In this case, do you think the accuracy will be equivalent to 4 GPUs with 4 batches each?

masahiro173 commented 12 months ago

I would like to ask an additional question. What could cause such a large difference in accuracy between batch sizes? And why do you think it is better to increase the unsupervised loss weight when the batch size is small?

ZhenZHAO commented 11 months ago

> Thank you for answering. I'll try changing the parameters. By the way, it looks like we can secure 2 GPUs (I don't have V100s, so A100s), and we plan to run with 8 batches each (16 in total). In this case, do you think the accuracy will be equivalent to 4 GPUs with 4 batches each?

Yes, that is the only difference I can see. Please give it a try.
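One general PyTorch consideration for this comparison (not something specific to iMAS): with plain BatchNorm, 2 GPUs × 8 and 4 GPUs × 4 normalize over different per-GPU slices. Converting the model to synchronized BatchNorm makes the two layouts equivalent with respect to normalization. A minimal sketch, assuming a DDP setup and an illustrative DeepLabV3 backbone:

```python
import torch.nn as nn
import torchvision

def build_ddp_model(local_rank: int) -> nn.Module:
    # Illustrative segmentation backbone (19 Cityscapes classes);
    # iMAS uses its own model definition.
    model = torchvision.models.segmentation.deeplabv3_resnet50(num_classes=19)
    model = model.cuda(local_rank)
    # SyncBatchNorm computes statistics over the full effective batch
    # (e.g., 16) rather than the per-GPU slice (8 or 4), so the 2x8 and
    # 4x4 layouts behave the same with respect to normalization.
    model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
    return nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
```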

ZhenZHAO commented 11 months ago

> I would like to ask an additional question. What could cause such a large difference in accuracy between batch sizes? And why do you think it is better to increase the unsupervised loss weight when the batch size is small?

It is not really about the batch size itself. It is simply because I observed from your log that your training loss is small and the performance is improving quite slowly compared to my training log. Besides, increasing the unsupervised loss weight can be a way to further improve iMAS's performance.
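As a side note not raised in the thread: when GPU memory is the constraint, gradient accumulation can emulate the original effective batch size of 16, at the cost of slower steps and with the caveat that BatchNorm statistics still only see the per-GPU batch. A minimal sketch under those assumptions:

```python
import torch.nn.functional as F

def train_epoch_with_accumulation(model, optimizer, loader, accum_steps=2):
    """Accumulate gradients over accum_steps mini-batches before stepping.

    E.g., bz=8 with accum_steps=2 approximates bz=16 for the gradient,
    though BatchNorm statistics are still computed over 8 samples.
    """
    model.train()
    optimizer.zero_grad()
    for step, (images, targets) in enumerate(loader):
        logits = model(images.cuda())
        loss = F.cross_entropy(logits, targets.cuda(), ignore_index=255)
        # Divide by accum_steps so the accumulated gradient matches the
        # average over the larger virtual batch.
        (loss / accum_steps).backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```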