NVlabs / stylegan3

Official PyTorch implementation of StyleGAN3

How to recover if training is interrupted #538

Open xyt000-xjj opened 1 year ago

xyt000-xjj commented 1 year ago

PDillis commented 1 year ago

Just point to the last .pkl that was saved (the one you want to resume from) with the --resume argument in train.py. Note that the new run's kimg counter will start from 0, so account for that when setting how many thousands of images to train for with --kimg.
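As a rough illustration of the bookkeeping described above, here is a minimal sketch that finds the most recent snapshot in a run directory and works out how many kimg are still left. It assumes snapshots are named like network-snapshot-000200.pkl with the number being the kimg reached in that run; the helper names (latest_snapshot, remaining_kimg, resume_flags) are made up for this example and are not part of the repository.

```python
import re
from pathlib import Path

# Assumption: snapshot files look like network-snapshot-000200.pkl,
# where the number is the kimg count reached within that run.
SNAPSHOT_RE = re.compile(r"network-snapshot-(\d+)\.pkl$")


def latest_snapshot(run_dir):
    """Return (path, kimg) for the highest-numbered snapshot, or None."""
    best = None
    for path in Path(run_dir).glob("network-snapshot-*.pkl"):
        match = SNAPSHOT_RE.search(path.name)
        if match is None:
            continue
        kimg = int(match.group(1))
        if best is None or kimg > best[1]:
            best = (path, kimg)
    return best


def remaining_kimg(target_kimg, done_kimg):
    """kimg still to train, given that the resumed run counts from 0 again."""
    return max(target_kimg - done_kimg, 0)


def resume_flags(run_dir, target_kimg):
    """Build the --resume and --kimg flags for the follow-up run."""
    found = latest_snapshot(run_dir)
    if found is None:
        raise FileNotFoundError(f"no snapshots found in {run_dir}")
    path, done_kimg = found
    return [f"--resume={path}", f"--kimg={remaining_kimg(target_kimg, done_kimg)}"]
```

The returned flags would be appended to whatever train.py command was used for the original run. If the interrupted run was itself a resumed run, the number in the file name only covers that run, so the earlier kimg would have to be added by hand.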

therealjr commented 11 months ago

@PDillis I understand that training will resume from that point. However, when I save a snapshot at tick 0, it shows nothing but blurred images. Why are the images resetting entirely? Shouldn't it be generating images like the data it was trained on, from the point where it left off?

dookiethedog commented 8 months ago

My GAN crashed and I was extremely annoyed, as I was experiencing the exact same issue, so I decided to read into the code. You can set the initial augmentation and kimg in the training_loop.py file. This will help, but it will not actually continue the training from where it last ran; it will only give it an idea of where to start off again. The devs don't seem to care if it does crash, as there is no proper resume code. I was actually able to modify the code and create a perfect resume function. However, I will not be able to resume my first GAN, as I did not have my code added yet, so there is no way to pull the settings needed. At least for the future I will be all good and have everything stored in the pickle file.
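The idea of keeping the resume settings in the pickle file could look roughly like the sketch below. This is not the modified code mentioned in this comment, only a minimal illustration under assumptions: plain dictionaries stand in for the networks, and the resume_state key, its fields (cur_nimg, augment_p), and both function names are hypothetical.

```python
import pickle


def save_snapshot(path, networks, cur_nimg, augment_p):
    """Write the networks plus the counters needed to resume later."""
    data = dict(networks)  # e.g. {"G": ..., "D": ..., "G_ema": ...}
    # Hypothetical extra entry holding the training progress.
    data["resume_state"] = {
        "cur_nimg": int(cur_nimg),       # images seen so far
        "augment_p": float(augment_p),   # current augmentation probability
    }
    with open(path, "wb") as f:
        pickle.dump(data, f)


def load_resume_state(path):
    """Read the stored counters, falling back to a fresh start."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    state = data.get("resume_state")
    if state is None:
        # Snapshots written before the change carry no progress info.
        return {"cur_nimg": 0, "augment_p": 0.0}
    return state
```

The fallback branch mirrors the limitation described above: a snapshot saved before such a change has nothing to restore, so the counters start from scratch.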

frankthequeen commented 4 months ago

I was actually able to modify the code and create a perfect resume function

Could you share your code?