Open RanFeng2 opened 7 months ago
Thank you for your work on this project. It has been incredibly helpful to me.

However, I've run into a problem I'd like your advice on. While training the student model, the loss becomes NaN during the first epoch, which is quite perplexing.

Thank you in advance for your time and help.

Hi~ Thanks for your interest in my work. You can troubleshoot from the following aspects:

After adjusting the learning rate, the NaN values no longer appear. However, the loss remains significantly higher than expected. Could you offer any insights or suggestions on how to reduce it further?