FoundationVision / LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
https://arxiv.org/abs/2406.06525
MIT License

Training Details? #14

Closed · ChenDRAG closed 5 months ago

ChenDRAG commented 5 months ago

Hi, thank you for your excellent work in open-sourcing the code. I have several questions.

  1. I notice the paper says all models use the same learning rate of 1e-4 (the code agrees) and a batch size of 256. However, #8 suggests otherwise (the lr for the XXL model is 2e-4). Which one is correct?
  2. I notice there are two options for image cropping: crop ratios of 1.1 and 1.05 are both provided in the source code. Which one is used for the main experiments in the paper?
PeizeSun commented 5 months ago

Hi~ Sorry for the confusion.

For the learning rate when training AR models, it is 1e-4 per batch size of 256. If the batch size increases to 512, the lr can be scaled to 2e-4 accordingly (we did not try increasing the batch size and lr further).
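For reference, this linear scaling rule is easy to express in code. A minimal sketch (the function name and defaults below are illustrative, not from the repo):

```python
def scaled_lr(batch_size: int, base_lr: float = 1e-4, base_batch: int = 256) -> float:
    """Linear lr scaling: 1e-4 at batch size 256, scaled proportionally."""
    return base_lr * batch_size / base_batch

print(scaled_lr(256))  # 0.0001
print(scaled_lr(512))  # 0.0002
```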

For image crop augmentation, we use both 1.1 and 1.05 as crop ratios. See https://github.com/FoundationVision/LlamaGen/blob/main/dataset/imagenet.py#L16
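For illustration, a resize-then-center-crop with such a crop ratio might look like the sketch below (the helper name and exact resize/crop logic are assumptions; see dataset/imagenet.py linked above for the actual augmentation):

```python
from PIL import Image

def center_crop_with_ratio(img: Image.Image, image_size: int, crop_ratio: float) -> Image.Image:
    # Resize so the shorter side equals crop_ratio * image_size
    # (e.g. crop_ratio = 1.1 or 1.05 as discussed above)...
    target_short = round(crop_ratio * image_size)
    w, h = img.size
    scale = target_short / min(w, h)
    img = img.resize((round(w * scale), round(h * scale)), Image.BICUBIC)
    # ...then take a centered image_size x image_size crop.
    w, h = img.size
    left = (w - image_size) // 2
    top = (h - image_size) // 2
    return img.crop((left, top, left + image_size, top + image_size))
```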

ChenDRAG commented 5 months ago

Thank you for your prompt response. So if I'm understanding correctly, the configuration in #8 is the set of hyperparameters actually used? The hyperparameters for the -B and -XXL models do differ slightly in batch size and lr? (Though they may not affect the final performance much.)

PeizeSun commented 5 months ago

Yes! The configuration in https://github.com/FoundationVision/LlamaGen/issues/8 is the one actually used.

ChenDRAG commented 5 months ago

Thanks!