Open Dionysus061726 opened 1 month ago
Here is the loss fig
Hi! Thanks for your work! I ran into a problem during training and would like to ask for some help:
I have been training on my dataset for about 10 days, and the training steps have reached 1.91e+06. How can I know if the model has finished training? Or does the training process continue indefinitely if I don’t stop it manually?
In other words, what are the conditions for the training to end? Where is this reflected in the code?
| grad_norm | 0.0099 |
| loss | 0.00519 |
| loss_q0 | 0.0155 |
| loss_q1 | 4.72e-05 |
| loss_q2 | 1.71e-05 |
| loss_q3 | 8.83e-06 |
| mse | 0.00519 |
| mse_q0 | 0.0155 |
| mse_q1 | 4.72e-05 |
| mse_q2 | 1.71e-05 |
| mse_q3 | 8.83e-06 |
| param_norm | 939 |
| samples | 1.15e+07 |
| step | 1.91e+06 |
Hi, the code will not stop automatically unless you stop it manually.
The corresponding code is in guided_diffusion/train_utils.py:
    while (
        not self.lr_anneal_steps
        or self.step + self.resume_step < self.lr_anneal_steps
    ):
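To make the stopping condition concrete, here is a minimal sketch that pulls the quoted condition out into a standalone function. The helper names `should_continue` and `count_steps` and the `safety_cap` argument are hypothetical and exist only for this illustration; they are not part of the repository. The sketch assumes that `lr_anneal_steps` is 0 when it has not been set, which is why the loop never exits in that case.

```python
def should_continue(step: int, resume_step: int, lr_anneal_steps: int) -> bool:
    """Same boolean expression as the quoted while-condition.

    A falsy lr_anneal_steps (0 or None) makes the first clause True,
    so the loop keeps running no matter how large step gets.
    """
    return (not lr_anneal_steps) or (step + resume_step < lr_anneal_steps)


def count_steps(lr_anneal_steps: int, resume_step: int = 0,
                safety_cap: int = 1_000_000) -> int:
    """Count loop iterations; safety_cap (hypothetical) only keeps this demo finite."""
    step = 0
    while should_continue(step, resume_step, lr_anneal_steps) and step < safety_cap:
        step += 1  # stands in for one real training step
    return step


if __name__ == "__main__":
    # With a limit set, the loop ends once step + resume_step reaches it.
    print(count_steps(50_000))
    # Resuming from step 20000 leaves only the remaining steps to run.
    print(count_steps(50_000, resume_step=20_000))
    # With the limit unset (0), only the demo's safety cap stops the loop.
    print(count_steps(0, safety_cap=1_000))
```

So a run at step 1.91e+06 with `lr_anneal_steps` unset is still inside the loop. Setting it to the intended number of iterations is what makes training end on its own.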
@szh404 Thank you! I didn't read the article carefully; the author specified that his number of iteration steps was 50000. It's my fault. I also found out later that it was a matter of the number of iterations.
I have no deep learning experience, so I was confused about this. Anyway, I'll keep learning.