@melody-rain we reproduced the results several times in different experiments using this code. Can you provide a more detailed bug report? Thanks in advance.
@bes-dev hi. I just trained mobile-stylegan_ffhq-512 (512 is the size of the generated images) with your code. The teacher model is the PyTorch one converted from the official TensorFlow model; nothing else was changed. During training the loss sometimes blows up to values like loss=3.18e+04.
BTW, after I added gradient_clip_val=1.0 to pl.Trainer(), the problem no longer occurs and I can successfully train mobile-stylegan_ffhq-512.
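For reference, that is roughly the only change, shown here against a toy LightningModule rather than the repo's actual distiller module (the model and data below are placeholders, not the real training code):

```python
import torch
from torch import nn
import pytorch_lightning as pl


class ToyModule(pl.LightningModule):
    """Placeholder for the repo's distiller LightningModule; only the Trainer flag matters here."""

    def __init__(self):
        super().__init__()
        self.net = nn.Linear(512, 512)

    def training_step(self, batch, batch_idx):
        loss = self.net(batch).pow(2).mean()
        self.log("loss", loss, prog_bar=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


# gradient_clip_val=1.0 makes Lightning clip the global gradient norm to 1.0
# before every optimizer step, which is what stopped the loss from exploding for me.
trainer = pl.Trainer(max_epochs=1, gradient_clip_val=1.0)
trainer.fit(ToyModule(), torch.utils.data.DataLoader(torch.randn(64, 512), batch_size=8))
```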
Many thanks for your work.
Did you reproduce the desired results?
In my training, the "kid_val" value decreases at the beginning, but it starts increasing after several iterations and the log says "kid_val was not in top True".
Many thanks. Waiting for your reply.
@melody-rain @bes-dev
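For context, I believe the "was not in top ..." line just comes from a verbose ModelCheckpoint callback monitoring kid_val, roughly like the sketch below (the exact arguments are my guess, not necessarily the repo's config), so the message itself may not be the problem, only the fact that KID stops improving:

```python
from pytorch_lightning.callbacks import ModelCheckpoint

# Sketch of a checkpoint callback monitoring the KID metric.
# In verbose mode, ModelCheckpoint prints "<monitor> was not in top <save_top_k>"
# whenever the metric does not improve on the saved best value.
checkpoint_cb = ModelCheckpoint(
    monitor="kid_val",  # validation metric logged by the training module
    mode="min",         # lower KID is better
    save_top_k=1,       # keep only the best checkpoint
    verbose=True,
)
# pass it to the trainer: pl.Trainer(callbacks=[checkpoint_cb], ...)
```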
@bes-dev Can you give an overview of your experimental environment? For example: CUDA version, PyTorch version, GCC version.
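Something like this prints the versions I mean (a small standalone helper, not part of the repo):

```python
import platform
import torch

# Standalone helper to report the environment details asked about above.
print("python:", platform.python_version())
print("torch :", torch.__version__)
print("cuda  :", torch.version.cuda)  # CUDA version this torch build was compiled against
print("cudnn :", torch.backends.cudnn.version())
print("gpu   :", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")
```

Running `python -m torch.utils.collect_env` should give an even fuller report, including the GCC version.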
When I use CUDA 10 the loss easily goes to NaN, but when I switch to CUDA 11 it is fine. @ZYzhouya
However, I used CUDA 11 + PyTorch 1.7 + GCC 5.4 + Ubuntu 16.04 with your code and config file on 4x 2080 Ti. After training for 5 days it still can't reach your reported quality. Do you have any advice?
I trained with this repo, but the loss became huge very soon, something like loss=3.18e+04.
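In case it helps with debugging: before restarting a long run, I would turn on PyTorch's anomaly detection and assert that the loss stays finite, roughly like this (a sketch, not code from this repo):

```python
import torch

# Anomaly detection raises as soon as a backward pass produces NaN gradients,
# pointing at the op responsible instead of letting the loss silently blow up
# to values like 3e+04.
torch.autograd.set_detect_anomaly(True)


def check_finite(name: str, tensor: torch.Tensor) -> None:
    """Fail fast with a readable message if a tensor contains NaN/Inf."""
    if not torch.isfinite(tensor).all():
        raise RuntimeError(f"{name} is not finite: min={tensor.min()}, max={tensor.max()}")


# e.g. call check_finite("loss", loss) inside training_step before returning it,
# and combine with gradient_clip_val as suggested earlier in this thread.
```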