FoundationVision / VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
MIT License

Implications of Classifier-Free Guidance in Auto-regressive Models #14

Open fkcptlst opened 2 months ago

fkcptlst commented 2 months ago

Hi, thank you for the insightful work!

I have some concerns regarding the classifier-free guidance (CFG) in auto-regressive models.

CFG in this work is implemented as follows:

https://github.com/FoundationVision/VAR/blob/1ae51772d2622e2fd44a188564cf394b71f5562d/models/var.py#L191-L192
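
For readers without the link open, here is a rough sketch of logit-space CFG, where the conditional logits are extrapolated away from the unconditional ones before sampling. It is only an illustration, not the code in `models/var.py`: the function names, the toy logits and the guidance scale `t` are all assumptions made for this example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def guided_logits(cond_logits, uncond_logits, t):
    """Extrapolate conditional logits away from unconditional ones.

    t = 0 returns the conditional logits unchanged; larger t pushes
    the result further from the unconditional prediction.
    (Hypothetical helper, written for illustration only.)
    """
    return [(1.0 + t) * c - t * u for c, u in zip(cond_logits, uncond_logits)]

# Toy next-token logits over a 3-entry vocabulary (made-up values).
cond = [2.0, 1.0, 0.1]     # prediction given the class label
uncond = [1.0, 1.0, 1.0]   # prediction with the label dropped

probs_plain = softmax(cond)
probs_guided = softmax(guided_logits(cond, uncond, t=1.5))
print(probs_plain)
print(probs_guided)
```

In an autoregressive sampler this combination would be applied at every decoding step, right before the next token is drawn.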

However, it's important to note that CFG in auto-regressive models differs fundamentally from that in diffusion models (as outlined in Section 4 of this blog). In essence, the guidance in diffusion models is not theoretically applicable to auto-regressive models.
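
To make the autoregressive case concrete, it may help to write out what a logit-space combination does to the per-token distribution. The notation is an assumption for this note: $\ell_c$ and $\ell_u$ are the conditional and unconditional logits for token $x_i$, $t$ is the guidance scale, and $c$ is the condition. Taking a softmax of the combined logits gives

```latex
\tilde{p}(x_i \mid x_{<i}, c)
  = \operatorname{softmax}\big((1+t)\,\ell_c - t\,\ell_u\big)_{x_i}
  \;\propto\;
  \frac{p(x_i \mid x_{<i}, c)^{\,1+t}}{p(x_i \mid x_{<i})^{\,t}}
```

The proportionality holds because each softmax normalizer is constant across the vocabulary. Note that the renormalization happens separately at every step $i$, over the vocabulary only, so the guided model reweights each next-token distribution rather than the joint distribution over the whole sequence.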

I am curious if this difference yields any notable empirical results. Have you conducted any quantitative or qualitative studies on the impact of CFG on this auto-regressive model? I would greatly appreciate any insights or empirical findings you could share on this subject.

keyu-tian commented 2 months ago

@fkcptlst We tested the influence of CFG in the Ablation Study section of the paper. We simply follow the CFG formulation introduced in Google's MUSE paper. https://sander.ai/2022/05/26/guidance.html seems to be a thorough analysis of CFG. We'll check it later and maybe try some other implementations. Thank you for providing this!