Hi,
I tried to reproduce your method on ResNet18, and I set the learning rates (lr_img=100, lr_lr=1e-5, lr_teacher=0.01) to avoid exploding/vanishing gradients. However, I observed that the grand loss fluctuates around 0.9. Is that normal in this setting? Could you please share your grand loss curve for reference?
BR, Xuyang