Closed nicolai256 closed 2 years ago
Yes, this can be done. I will add the --initstrength option that @dvschultz has in his repo. The tick resume can be done, but it's just the number of iterations on the current run. In the end, what really matters is the kimg that the model has been trained for, which is already solved.
It should be fixed now; you can set --initstrength=34.266 or whatever you want. I think a more careful analysis should be done here w.r.t. rampup/ema, but let me know if this works for you.
Describe the bug
The storage on the remote PC was full, so training stopped. It was already pretty far into training, and I'd like to resume without it blurring the newly generated images. Maybe it's possible with the log file?
tick 1587 kimg 6348.0 time 7d 08h 07m sec/tick 396.5 sec/kimg 99.12 maintenance 0.3 cpumem 6.35 gpumem 30.74 reserved 41.21 augment 34.266
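
If it helps, here is a minimal Python sketch of how the values needed for resuming could be read back from a log line like the one above. It assumes the line is a flat sequence of "name value" pairs, as in this example; parse_tick_line is a hypothetical helper, not part of any repo. The only flag it prints is --initstrength, using the augment value, as in the reply in this thread.

```python
import re

# Log line copied from this issue. Assumption: fields are "name value" pairs.
LOG_LINE = (
    "tick 1587 kimg 6348.0 time 7d 08h 07m sec/tick 396.5 sec/kimg 99.12 "
    "maintenance 0.3 cpumem 6.35 gpumem 30.74 reserved 41.21 augment 34.266"
)


def parse_tick_line(line):
    """Hypothetical helper: return the numeric fields of one log line as floats."""
    # Match a name followed by a plain number. The "time" field is skipped
    # because its value ("7d 08h 07m") is not a plain number.
    pattern = r"\b([A-Za-z/]+)\s+(-?\d+(?:\.\d+)?)(?=\s|$)"
    return {name: float(value) for name, value in re.findall(pattern, line)}


fields = parse_tick_line(LOG_LINE)

# kimg is what matters for resuming; the tick count is only per-run bookkeeping.
kimg_per_tick = fields["kimg"] / fields["tick"]

print("kimg trained so far:", fields["kimg"])
print("kimg per tick:", kimg_per_tick)
print("suggested flag: --initstrength=" + str(fields["augment"]))
```

The kimg value tells you how far training had progressed, and the augment value is the one to pass back as the initial augmentation strength.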