autonomousvision / stylegan-xl

[SIGGRAPH'22] StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets
MIT License

[Request] A method to resume training with a different batch size while keeping your G epoch and nkimg value. #92

Open nom57 opened 1 year ago

nom57 commented 1 year ago

On SG2 and SG3 (given you use them on a modified fork) you can resume training with a completely different batch size and still keep your tick/kimg progress by specifying it with the kwarg `--nkimg`; for example, `--nkimg=2500` resumes training with an assumed progress of 2500 kimg.

SGXL, however, resets kimg to 0 if you change the batch size.
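For illustration, here is a minimal self-contained sketch of what such a `--nkimg` option could look like. This is not SGXL's actual `train.py`; the click option pattern and the hand-off to the training loop are assumptions modelled on how the upstream StyleGAN-style codebases count `cur_nimg`:

```python
# Minimal sketch of the requested --nkimg resume option (hypothetical, not the
# actual SGXL train.py). The idea: seed the kimg counter from the CLI so that
# resuming with a different --batch does not restart progress at 0.

import click

@click.command()
@click.option('--resume', help='Network pickle to resume from', type=str, default=None)
@click.option('--batch',  help='Total batch size', type=int, required=True)
@click.option('--nkimg',  help='Assumed kimg progress when resuming', type=click.IntRange(min=0), default=0)
def main(resume, batch, nkimg):
    # Only honour --nkimg when actually resuming from a snapshot.
    resume_kimg = nkimg if resume is not None else 0
    cur_nimg = resume_kimg * 1000  # the training loop would keep counting images from here
    print(f'batch={batch}  resume={resume}  starting at cur_nimg={cur_nimg}')
    # In a real patch this value would be forwarded to the training loop
    # (e.g. as a resume_kimg argument), so that ticks, snapshots and the
    # augmentation schedule continue where the previous run left off.

if __name__ == '__main__':
    main()
```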

I found it extremely useful to start with a very low batch size, such as `--batch=2 --glr=0.0008 --dlr=0.0006`, to improve diversity, and then switch to a batch size of 32/64/128 for better FID once FID starts to bottom out at batch=2.

However, because the augmentation state, the G epoch, and kimg all reset in SGXL when doing this, I am having a really bad time.
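To make the intended workflow concrete, here is a rough driver sketch of the two-stage schedule. The `--nkimg` flag in stage 2 is the requested feature, not something SGXL supports today; the data/output paths are placeholders, and any flag not quoted in this thread should be checked against `train.py`:

```python
# Hypothetical driver for the two-stage schedule described above. The --nkimg
# flag in stage 2 is the requested feature; data/output paths are placeholders,
# and flags other than those quoted in the thread are assumptions.

import subprocess

base = ['python', 'train.py', '--outdir=training-runs', '--data=./data/mydataset.zip', '--gpus=1']

# Stage 1: very small batch with the learning rates from the thread,
# prioritising diversity and fast early convergence.
subprocess.run(base + ['--batch=2', '--glr=0.0008', '--dlr=0.0006'], check=True)

# Stage 2: once FID bottoms out, resume the last snapshot with a large batch.
# Keeping the kimg/tick/augmentation progress alive is exactly what --nkimg would do.
subprocess.run(base + ['--batch=64',
                       '--resume=training-runs/<run>/network-snapshot-002500.pkl',
                       '--nkimg=2500'],          # requested option, not in SGXL yet
               check=True)
```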

nom57 commented 1 year ago

This method of training can push recall to 0.7+ instead of 0.5 on many datasets while still reaching the same FID (after also bottoming out FID at batch size 128), so recall is tremendously better with this method. With batch=2 it also converges much faster, so the first part of training, which focuses on diversity, is very fast.

nom57 commented 1 year ago

For example, the Pokémon dataset can reach a recall of 0.787 at 64x64 with this method @xl-sr, so a `--nkimg` resume feature would be tremendously helpful.

nom57 commented 1 year ago

Update:

This seems to favor SG2-ADA way more than SGXL; SGXL can easily collapse with low batch sizes, so it is hard to tame. Still, a batch size of 2-16 for the first 144 kimg, then switching to a batch size of 64 or 128, seems beneficial on unimodal datasets for better recall and faster training.