FoundationVision / VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

How to set progressive training when training the VAR? #29

Closed: HaozheZhao closed this 5 months ago

HaozheZhao commented 5 months ago

Great job indeed!

I am currently trying to train VAR using the script and code you provided. Inspecting the training code and args list, I see three args, namely pg, pg0, and pgwp, designated for progressive training. I am curious which hyperparameters you configured for your training, and which model you chose for progressive training.

Furthermore, I am interested in understanding why the VAR class's self.prog_si in var.py is initially set to -1 and remains unchanged throughout. It seems that neither train.py nor trainer.py resets the prog_si attribute of the VAR class.

krennic999 commented 5 months ago

The prog_si of VAR is modified in train_step in trainer.py. The default pg seems to be 0 for the whole training loop, and pgwp seems to control the warm-up during training.
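
Roughly like this (a simplified sketch of my reading, not the actual trainer.py code; the class bodies are placeholders):

```python
# VAR initializes prog_si to -1, and the Trainer overwrites it each step,
# which is why train.py never needs to reset it directly.
class VAR:
    def __init__(self):
        self.prog_si = -1  # -1 = no progressive restriction (train all scales)

class Trainer:
    def __init__(self, var: VAR):
        self.var = var

    def train_step(self, prog_si: int):
        self.var.prog_si = prog_si  # set before the forward pass
        # ... forward / loss / backward would happen here ...

var = VAR()
Trainer(var).train_step(prog_si=4)
print(var.prog_si)  # 4, not the initial -1
```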

HaozheZhao commented 5 months ago

> The prog_si of VAR is modified in train_step in trainer.py. The default pg seems to be 0 for the whole training loop, and pgwp seems to control the warm-up during training.

What I mean is that three arguments, pg, pg0, and pgwp, are provided for progressive training. However, none of them is set in the provided startup script. So it appears that progressive training is disabled in that script, right?

krennic999 commented 5 months ago

Yes, I tend to think so (but I'm not very sure)

krennic999 commented 5 months ago

Hoping the author can give a clearer explanation.

JamesHujy commented 5 months ago

Interested in progressive training as well; looking forward to the explanation.

keyu-tian commented 5 months ago

Hi @HaozheZhao @krennic999 @JamesHujy, we only use this progressive training when computation resources are limited and we want to speed up training, e.g., training d36-s on 512x512 images by running with --pg=0.7 --pg0=4. For an explanation of these hyperparameters, see utils/args.py. This configuration means doing progressive training in the 0%-70% training phase, and full-scale training in the 70%-100% phase. The warm-up-epochs-per-progress pgwp is automatically set in utils/args.py via args.pgwp = args.ep * 1/300.
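
The schedule behaves roughly like this (a simplified illustration, not the exact code in trainer.py; only --pg, --pg0, and args.pgwp = args.ep / 300 come from the explanation above, while the linear ramp and num_scales=10 are assumptions for illustration):

```python
def prog_si_at(epoch: int, total_epochs: int,
               pg: float = 0.7, pg0: int = 4, num_scales: int = 10) -> int:
    """Largest scale index trained at this epoch; -1 means all scales."""
    progress = epoch / total_epochs
    if pg <= 0 or progress >= pg:
        return -1  # full-scale training in the last (1 - pg) of the run
    phase = progress / pg  # goes 0 -> 1 inside the progressive phase
    # grow from pg0 toward the last scale index as the phase advances
    return min(num_scales - 1, pg0 + int(phase * (num_scales - 1 - pg0)))

# e.g. with 300 epochs: progressive until epoch 210, full-scale afterwards
for ep in (0, 100, 200, 210, 299):
    print(ep, prog_si_at(ep, total_epochs=300))
```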