autonomousvision / stylegan-xl

[SIGGRAPH'22] StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets
MIT License

How to restart training? #78

Open Askejm opened 2 years ago

Askejm commented 2 years ago

I want to train a bunch of stems. I figured I could do this with the --restart_every argument, so I set it to 58k seconds, but when it reaches that it just exits. [screenshot] How would I make it restart by itself? I want it to produce a bunch of different stems of 1k kimg each. Windows 10, RTX 3070.

woctezuma commented 2 years ago

when it reaches that it just exits

https://github.com/autonomousvision/stylegan_xl/blob/223430d0adb5eabee9f7e38e0bce73fc4b1818f6/training/training_loop.py#L133

Askejm commented 2 years ago

Could you elaborate on what you mean by that?

woctezuma commented 2 years ago

The value of --restart_every is the time interval, in seconds, after which the code exits. So this works as intended.

What is stopping you from starting the program again after it exited? Is it supposed to automatically restart?

https://github.com/autonomousvision/stylegan_xl/blob/223430d0adb5eabee9f7e38e0bce73fc4b1818f6/training/training_loop.py#L174-L177

Be careful, because it seems to resume from the last checkpoint when you manually restart the program.
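If the goal is just to have the program start again by itself, one option is a small wrapper that relaunches the training script every time it exits. Below is a minimal Python sketch, not part of stylegan-xl. It assumes the script is started from the repository root as train.py, that a planned --restart_every exit returns exit code 0, and that each relaunch resumes from the last checkpoint as described above (so this continues one run rather than producing new stems). The helper names and the max_launches and pause_s parameters are hypothetical.

```python
import subprocess
import sys
import time


def training_command(args):
    # Use the interpreter running this wrapper, so the same conda env is used.
    return [sys.executable, "train.py", *args]


def run_with_restarts(cmd, max_launches=10, pause_s=10.0):
    """Launch `cmd` repeatedly, relaunching each time it exits.

    Stops after `max_launches` launches, or as soon as the process exits
    with a non-zero code. Returns the exit code of every launch.
    """
    codes = []
    for launch in range(1, max_launches + 1):
        print(f"[launch {launch}/{max_launches}] {' '.join(cmd)}", flush=True)
        code = subprocess.run(cmd).returncode
        codes.append(code)
        if code != 0:
            # ASSUMPTION: a planned --restart_every exit returns 0, so any
            # other code is treated as a crash and ends the loop.
            print(f"exit code {code}, not relaunching", flush=True)
            break
        if launch < max_launches:
            time.sleep(pause_s)  # short pause before the next launch
    return codes


# Usage (hypothetical: paste your own training flags in place of "..."):
# run_with_restarts(training_command(["--restart_every=58000", ...]))
```

With --restart_every=58000, each launch runs for roughly 16 hours, so max_launches caps the total time at about max_launches × 58,000 seconds; set it from how long you will be away.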

Maybe just set the training duration another way (for example with --kimg) and start the training from scratch a bunch of times.
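Starting from scratch several times can also be automated with a queue that runs fresh trainings one after another, each in its own output directory. This is a Python sketch under stated assumptions: that a run only resumes from checkpoints found in its own --outdir (not verified against train.py), and that --kimg sets the training length as it does in StyleGAN3's train.py (check `python train.py --help`). The function names and the stem_XX directory layout are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def fresh_run_commands(base_args, n_runs, outdir_root):
    """Build one training command per run, each with its own --outdir.

    ASSUMPTION: a run only resumes from checkpoints in its own output
    directory, so distinct directories mean every run starts from scratch.
    """
    commands = []
    for i in range(n_runs):
        outdir = Path(outdir_root) / f"stem_{i:02d}"
        commands.append([sys.executable, "train.py", *base_args, f"--outdir={outdir}"])
    return commands


def run_queue(commands):
    """Run the commands one after another and return their exit codes."""
    codes = []
    for cmd in commands:
        print("starting:", " ".join(cmd), flush=True)
        codes.append(subprocess.run(cmd).returncode)
    return codes


# Usage (hypothetical values; --kimg is assumed from StyleGAN3's train.py):
# run_queue(fresh_run_commands(["--data=...", "--kimg=1000"], n_runs=5, outdir_root="runs"))
```

Running the stems sequentially keeps a single RTX 3070 from being shared by several jobs at once, and each new run starts as soon as the previous one finishes instead of on a fixed time offset.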

Askejm commented 2 years ago

Yeah, I noticed it resumes. But do you know a way to make it restart automatically? I might just open a bunch of anaconda prompts and have them offset by 58k secs. I can't always restart it manually because I'm on holiday.

woctezuma commented 2 years ago

I might just open a bunch of anaconda prompts and have them offset by 58k secs.

That is what I would do as well.