Resume training confusion

mcrisafu commented 2 months ago

Hi everyone. Thank you a lot for the amazing work!

I am able to train a model with load_from. I assume this trains the model from scratch. Is it possible to resume training from the checkpoint you provided by using resume_from? If I use something like this:

resume_from = dict(checkpoint="/models/epoch_0_step_0.pth", load_ema=False, resume_optimizer=False, resume_lr_scheduler=False)

I get an error about the DataLoader: RuntimeError: Trying to resize storage that is not resizable

(epoch_0_step_0.pth is the renamed PixArt-Sigma-XL-2-1024-MS.pth checkpoint.)

lawrence-cj commented 2 months ago

The checkpoint we provide contains no optimizer information, so there is nothing to resume from.
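To see why resuming needs more than model weights, here is a minimal sketch that reports which resume-related entries a loaded checkpoint dictionary lacks. The key names ("optimizer", "scheduler", "epoch") and the helper missing_resume_state are assumptions made for illustration, not the confirmed layout of the released file or of the training code; compare them with the keys your own checkpoint actually contains.

```python
from typing import Any, List, Mapping

# Assumed key names for resume-related state. These are illustrative only;
# check them against the keys in your own checkpoint.
RESUME_KEYS = ("optimizer", "scheduler", "epoch")


def missing_resume_state(ckpt: Mapping[str, Any]) -> List[str]:
    """Return the assumed resume-related keys that are absent from ckpt."""
    return [key for key in RESUME_KEYS if key not in ckpt]


# Stand-in dictionaries, not real checkpoints.
weights_only = {"state_dict": {}}
full = {"state_dict": {}, "optimizer": {}, "scheduler": {}, "epoch": 3}

print(missing_resume_state(weights_only))  # every assumed key is missing
print(missing_resume_state(full))          # nothing is missing
```

With a real file, the dictionary would typically come from torch.load(path, map_location="cpu"). A weights-only checkpoint like the first stand-in can initialise a model, but it gives the optimizer and scheduler nothing to restore.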

mcrisafu commented 2 months ago

Ahh. Got it. Thank you for your fast answer!

PixArt-alpha / PixArt-sigma

Resume training confusion #45