yangyangtiaoguo closed this issue 2 years ago
Is the loss NaN from the very first iteration?
Is this only happening when you use your own dataset?
Did your experts seem to train okay? If so, around what epoch did they seem to converge?
What model are you using? We used ConvNetD4 for 64x64 images.
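To answer the first question above, it can help to find the exact iteration where the loss stops being finite. The following is a minimal sketch using only the Python standard library. The helper name `first_non_finite` and the sample loss values are hypothetical, made up for illustration; they are not part of this repository.

```python
import math


def first_non_finite(losses):
    """Return the index of the first NaN or Inf loss, or None if all are finite."""
    for step, value in enumerate(losses):
        if not math.isfinite(value):
            return step
    return None


# Hypothetical loss values logged during distillation, for illustration only.
logged = [0.93, 0.88, float("nan"), float("nan")]
print(first_non_finite(logged))  # prints 2
```

If the result is 0, the problem is more likely in the data or the initial setup (for example, unnormalized inputs or a model that does not match the image size) than in training dynamics. A later index would instead point toward divergence during optimization, where lowering the learning rates is a reasonable first thing to try.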
Hello, author. Thank you for your work! When running distill.py, the loss is always NaN. Which parameters would you suggest adjusting? Or did I overlook something that causes the error? In addition, I am using my own dataset. The experimental settings and dataset settings are shown in the figure below.