Closed by ddongmogi 1 year ago
Hi
At the moment, the training loop has no maximum number of epochs after which it stops. If you want to cap the number of epochs, you will need to add that limit to the TrainLoop yourself.
However, intermediate results are saved anyway, at the frequency set by the flag save_interval. You can use those saved checkpoints to check when your model is doing a good job, and then stop the training.
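A minimal sketch of what such a cap could look like, in case it helps. This is a toy stand-in, not the repository's actual TrainLoop: the class name TinyTrainLoop and the max_steps argument are hypothetical, and only step, resume_step and save_interval are taken from this thread. The loop counts training steps rather than epochs, since that is what self.step + self.resume_step tracks.

```python
class TinyTrainLoop:
    """Toy stand-in for a training loop; NOT the repository's TrainLoop."""

    def __init__(self, save_interval, max_steps=None, resume_step=0):
        self.save_interval = save_interval
        self.max_steps = max_steps      # hypothetical cap; None means "run until interrupted"
        self.resume_step = resume_step  # steps already done before resuming
        self.step = 0                   # steps done in this run
        self.saved_at = []              # global steps at which a checkpoint would be written

    def run_step(self):
        pass  # placeholder for one forward/backward/optimizer step

    def save(self):
        # A real loop would write model and optimizer state to disk here.
        self.saved_at.append(self.step + self.resume_step)

    def run_loop(self):
        # Stop once the global step count reaches max_steps (if a cap is set).
        while self.max_steps is None or self.step + self.resume_step < self.max_steps:
            self.run_step()
            self.step += 1
            if (self.step + self.resume_step) % self.save_interval == 0:
                self.save()
        # Write a final checkpoint if the last step did not fall on a save interval.
        if self.saved_at[-1:] != [self.step + self.resume_step]:
            self.save()


loop = TinyTrainLoop(save_interval=100, max_steps=250)
loop.run_loop()
print(loop.saved_at)  # [100, 200, 250]
```

With max_steps left at None the loop keeps running until you stop it by hand, which matches the current behaviour described above.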
Oh, I see. I had assumed the number of epochs was fixed, so I expected to find an epoch variable somewhere. I didn't realize the idea was to pick a good checkpoint during training.
Thank you for your reply. It was very helpful.
Hello! Thank you for providing your excellent code! Before I ask my question, I should mention that I am a beginner in diffusion models and deep learning, so my understanding may be limited.
I would like to know which parts of the code need to be modified to reduce the total number of training epochs. I am using the basic training script, and self.step + self.resume_step has currently reached 240,000, but the training has still not finished.
So I want to know where in the code I can set the number of epochs to control the total training time.
Thanks.