YoRha19990213 closed 1 year ago
Since the snapshot you're resuming training from is at 260 kimg, you should use --resume-kimg=260. The fact that the values don't start from the exact same spot could be due to a different augment strength at those points, assuming you were using ADA.
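To make the point about augment strength concrete, here is a minimal sketch of how ADA nudges the strength p over time. It assumes the update rule and defaults of the StyleGAN2-ADA reference training loop (ada_interval=4, ada_kimg=500); the batch size of 32 and the constant overfitting signal of 0.9 are purely hypothetical values chosen for illustration, not taken from this run.

```python
def ada_step(p, rt, target, batch_size, ada_interval=4, ada_kimg=500):
    """One ADA update: move p by a fixed step towards keeping rt near target.

    rt is the overfitting heuristic (mean sign of D's output on real images).
    The step size and the clamp at 0 follow the reference implementation
    (an assumption about the training loop used here).
    """
    sign = (rt > target) - (rt < target)  # +1, 0 or -1
    adjust = sign * (batch_size * ada_interval) / (ada_kimg * 1000)
    return max(p + adjust, 0.0)


def simulate(kimg, rt, target=0.6, batch_size=32, ada_interval=4, p0=0.0):
    """Run ADA updates for `kimg` thousand images with a constant rt."""
    p = p0
    n_updates = int(kimg * 1000) // (batch_size * ada_interval)
    for _ in range(n_updates):
        p = ada_step(p, rt, target, batch_size, ada_interval)
    return p


# Hypothetical scenario: D overfits the whole time (rt stays above target).
p_saved = simulate(kimg=260, rt=0.9)   # strength reached by the 260 kimg tick
p_resumed = simulate(kimg=0, rt=0.9)   # a resumed run starts again from p0
print(f"strength when the snapshot was saved: {p_saved:.3f}")
print(f"strength right after resuming:        {p_resumed:.3f}")
```

Under these assumptions the saved run is training with a strength of roughly 0.5 at 260 kimg, while the resumed run begins at 0.0, so the discriminator sees differently augmented images and the logged losses need not line up.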
Thank you for your reply. I did find that ADA was used during training, but ADA only affects the changes in the learning rate. When I continued training using the weight file from the previous checkpoint, the initial loss changed as well (for example, at this breakpoint of kimg=260...). Is this also due to the influence of ADA?
At the same time, I also found that the augmentation strength starts from 0 every time training is launched. Is there a parameter that sets the augmentation strength when continuing training?
I would say yes, it's due to ADA: when the previous checkpoint was saved at 260 kimg, your ADA strength was not 0.0 (which is the value it always starts with before slowly moving towards the --target=0.6 you set). If you have your training logs (i.e., log.txt in your training run), you can see which augment value you had at the 260 kimg tick and just set --initstrength to that value.
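Looking up that value can be scripted. The sketch below assumes each tick line in log.txt contains "tick N kimg X ... augment Y", as in the StyleGAN2-ADA/StyleGAN3 training loops; the sample log lines and their numbers are made up for illustration, and the helper name augment_at_kimg is hypothetical.

```python
import re

# Assumed tick-line shape: "tick 65 kimg 260.0 time ... augment 0.412"
TICK_RE = re.compile(r"\btick\s+\d+\s+kimg\s+([0-9.]+).*?\baugment\s+([0-9.]+)")


def augment_at_kimg(log_text, target_kimg):
    """Return (kimg, augment) for the logged tick closest to target_kimg.

    Returns None if no line matches the assumed format.
    """
    best = None
    for line in log_text.splitlines():
        match = TICK_RE.search(line)
        if match is None:
            continue  # not a tick line (config dump, warnings, ...)
        kimg, augment = float(match.group(1)), float(match.group(2))
        if best is None or abs(kimg - target_kimg) < abs(best[0] - target_kimg):
            best = (kimg, augment)
    return best


# Made-up excerpt standing in for the contents of log.txt.
sample_log = """\
Training for 25000 kimg...
tick 64  kimg 256.0  time 5h 01m 10s  sec/tick 280.1  sec/kimg 70.03  augment 0.405
tick 65  kimg 260.0  time 5h 05m 50s  sec/tick 279.8  sec/kimg 69.95  augment 0.412
"""

result = augment_at_kimg(sample_log, target_kimg=260)
if result is not None:
    kimg, augment = result
    print(f"--resume-kimg={kimg:.0f} --initstrength={augment}")
```

To use it on a real run, read the file with open("log.txt").read() in place of sample_log and check that the matched tick really is the one your snapshot was saved at.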
The previous training run ended at 140 kimg. I used the parameter "--resume-kimg=140" to continue training up to 260 kimg, but I found that the losses in the two TensorBoard logs are not equal at the 140 kimg point. What is the reason? Or should I use "--resume-kimg=264" instead of 260 for my next follow-up run?
These are the parameters I used for training
This is the display panel of TensorBoard