Hi~ Sorry for the confusion.
For the learning rate when training the AR models, it is 1e-4 per 256 batch size. If the batch size increases to 512, the learning rate can be scaled to 2e-4 accordingly (we did not try increasing the batch size and learning rate further).
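As a minimal sketch of this linear scaling rule (my own illustration, not code from the LlamaGen training script), the relationship looks like:

```python
# Linear learning-rate scaling with global batch size (assumed convention).
BASE_LR = 1e-4          # learning rate per 256 samples of global batch size
BASE_BATCH_SIZE = 256

def scaled_lr(global_batch_size: int) -> float:
    """Scale the base learning rate linearly with the global batch size."""
    return BASE_LR * global_batch_size / BASE_BATCH_SIZE

print(scaled_lr(256))   # 1e-4
print(scaled_lr(512))   # 2e-4
```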
For the image crop augmentation ratio, we use both 1.1 and 1.05. See https://github.com/FoundationVision/LlamaGen/blob/main/dataset/imagenet.py#L16
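For readers who don't want to open the repo, here is a minimal sketch of what a crop augmentation with a 1.05 or 1.1 ratio could look like (my interpretation: resize so the shorter side is `ratio` times the crop size, then take a random crop; refer to `dataset/imagenet.py` for the actual implementation):

```python
# Hypothetical sketch of ratio-based crop augmentation, not the repo's code.
import random
from PIL import Image

def random_crop_with_ratio(img: Image.Image, crop_size: int, ratio: float = 1.05) -> Image.Image:
    """Resize so the shorter side equals crop_size * ratio, then random-crop to crop_size."""
    scale = crop_size * ratio / min(img.size)
    new_w, new_h = round(img.width * scale), round(img.height * scale)
    img = img.resize((new_w, new_h), Image.BICUBIC)
    left = random.randint(0, new_w - crop_size)
    top = random.randint(0, new_h - crop_size)
    return img.crop((left, top, left + crop_size, top + crop_size))
```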
Thank you for your prompt response. So if I'm understanding correctly, the configuration in #8 is the one actually used? And the hyperparameters for the -B and -XXL models are indeed slightly different in batch size and learning rate? (Though they may not affect the final performance very much.)
Yes! The configuration in https://github.com/FoundationVision/LlamaGen/issues/8 is the one actually used.
Thanks!
Hi, thank you for your excellent work in open-sourcing the code. I have several questions.