Implications of Classifier-Free Guidance in Auto-regressive Models

FoundationVision / VAR

[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

MIT License


Hi, thank you for the insightful work!

I have some concerns regarding the classifier-free guidance (CFG) in auto-regressive models.

CFG in this work is implemented as follows:

https://github.com/FoundationVision/VAR/blob/1ae51772d2622e2fd44a188564cf394b71f5562d/models/var.py#L191-L192
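For readers without the link open, here is a minimal sketch of how CFG is usually applied to logits in an auto-regressive sampler. It is not copied from the linked lines: the names `cfg_logits`, `softmax`, and `scale` and the toy 3-token vocabulary are assumptions for illustration, and any schedule that varies the scale across steps is left out.

```python
import math

def cfg_logits(cond, uncond, scale):
    """Blend per-token logits: (1 + scale) * cond - scale * uncond.

    `cond` and `uncond` are logits for the same position, computed with and
    without the class condition. scale = 0 returns the conditional logits.
    """
    return [(1.0 + scale) * c - scale * u for c, u in zip(cond, uncond)]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example (hypothetical values): three candidate tokens at one position.
cond = [2.0, 1.0, 0.0]     # logits given the class label
uncond = [1.0, 1.0, 1.0]   # logits with the label dropped

p_plain = softmax(cfg_logits(cond, uncond, 0.0))
p_guided = softmax(cfg_logits(cond, uncond, 1.5))
```

With a positive scale, the guided distribution puts more mass on tokens that the condition makes more likely, and the softmax renormalizes at every position independently.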

However, CFG in auto-regressive models differs fundamentally from CFG in diffusion models (as outlined in Section 4 of this blog). In essence, the theoretical justification for guidance in diffusion models does not carry over to auto-regressive models.
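One way to make the difference concrete is sketched below. This is my own formulation and not necessarily the argument the blog makes; the guidance weight $w$, the per-step normalizer $Z_i$, and the notation $x_{<i}$ for the prefix are assumptions introduced here.

```latex
% Diffusion: guidance combines noise predictions (equivalently, scores) at each noise level t
\tilde{\epsilon}_\theta(x_t, c) = (1 + w)\,\epsilon_\theta(x_t, c) - w\,\epsilon_\theta(x_t)
\quad\Longleftrightarrow\quad
\nabla_{x_t} \log \tilde{p}_t(x_t \mid c)
  = (1 + w)\,\nabla_{x_t} \log p_t(x_t \mid c) - w\,\nabla_{x_t} \log p_t(x_t)

% Auto-regressive: guidance combines logits, then renormalizes at every step i
\tilde{p}(x_i \mid x_{<i}, c)
  = \frac{p(x_i \mid x_{<i}, c)^{1 + w}\; p(x_i \mid x_{<i})^{-w}}{Z_i(x_{<i}, c)}

% Multiplying the per-step distributions over the whole sequence x
\prod_i \tilde{p}(x_i \mid x_{<i}, c)
  = \frac{p(x \mid c)^{1 + w}\; p(x)^{-w}}{\prod_i Z_i(x_{<i}, c)}
```

Because each $Z_i$ depends on the sampled prefix, the product of normalizers is not a constant, so the sequence-level distribution is in general not proportional to $p(x \mid c)^{1+w}\, p(x)^{-w}$.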

I am curious whether this difference has any notable empirical effect. Have you conducted any quantitative or qualitative studies on the impact of CFG on this auto-regressive model? I would greatly appreciate any insights or empirical findings you could share.

Implications of Classifier-Free Guidance in Auto-regressive Models #14