TensorBoard visualization seems to have a bug when resuming interrupted training

PDillis / stylegan3-fun

Modifications of the official PyTorch implementation of StyleGAN3. Let's easily generate images and videos with StyleGAN2/2-ADA/3!


Issue #31 · Closed by YoRha19990213 1 year ago

YoRha19990213 commented 1 year ago

My previous training run ended at 140 kimg. I used the parameter "--resume-kimg=140" to continue training up to 260 kimg, but I found that the loss values in the two TensorBoard log files do not match at 140 kimg. What is the reason? Also, for my next follow-up run, should I use "--resume-kimg=264" instead of 260?

These are the parameters I used for training: [image: QQ screenshot 20230322214921]

This is the TensorBoard display panel: [image: QQ screenshot 20230322215006]

PDillis commented 1 year ago

Since the snapshot you're resuming training from is at 260kimg, you should use --resume-kimg=260. The fact that the values don't start from the exact same spot could be due to a different augment strength at those points, assuming you were using ADA.

YoRha19990213 commented 1 year ago

[image: QQ screenshot 20230324192918]

Thank you for your reply. I did find that ADA was used during training, but ADA only affects the changes in the learning rate. When I continued training using the weight file from the previous checkpoint, the initial loss changed as well (for example, at this breakpoint of kimg=260...). Is this also due to the influence of ADA? [image: QQ screenshot 20230324195755] [image: QQ screenshot 20230324195810]

I also found that the augmentation strength starts from 0 every time I resume training. Is there a parameter that sets the augmentation value when continuing training?

PDillis commented 1 year ago

I would say yes, it's due to ADA: when the previous checkpoint was saved at 260kimg, your ADA strength was not 0.0 (the value it always starts at before slowly moving towards the --target=0.6 you set). If you look at your training logs (i.e., log.txt in your training run directory), you can see which augment value you had at the 260kimg tick and set --initstrength to that value.
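For anyone who wants to automate that lookup, here is a minimal Python sketch. It assumes each tick line in log.txt contains a "kimg <number>" field followed later by an "augment <number>" field; that layout is an assumption, so check it against your own log first. The helper name augment_at_kimg is hypothetical and not part of the repo.

```python
import re

# Assumed shape of a tick line (verify against your own log.txt):
#   "tick 65  kimg 260.0  time 1h 01m 00s  ...  augment 0.231"
# The lookbehind skips fields such as "sec/kimg", so only the progress counter is read.
_TICK_LINE = re.compile(
    r"(?<!/)\bkimg\s+(?P<kimg>\d+(?:\.\d+)?)"
    r".*?\baugment\s+(?P<augment>\d+(?:\.\d+)?)"
)


def augment_at_kimg(log_lines, target_kimg):
    """Return (kimg, augment) for the logged tick closest to target_kimg.

    Returns None if no line matches the assumed format.
    """
    best = None
    for line in log_lines:
        match = _TICK_LINE.search(line)
        if match is None:
            continue  # not a tick line (or the format differs from the assumption)
        kimg = float(match.group("kimg"))
        augment = float(match.group("augment"))
        if best is None or abs(kimg - target_kimg) < abs(best[0] - target_kimg):
            best = (kimg, augment)
    return best


if __name__ == "__main__":
    import sys

    # Usage: python find_augment.py path/to/log.txt 260
    with open(sys.argv[1], encoding="utf-8") as f:
        print(augment_at_kimg(f, float(sys.argv[2])))
```

The second number it prints is the strength to pass as --initstrength when resuming, as described above. If the script prints None, the log lines do not match the assumed format and the pattern needs adjusting.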