FoundationVision / VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Question about the unconditional generation #31

Closed · minimini-1 closed this issue 2 months ago

minimini-1 commented 2 months ago

Hello! First of all, thanks for your great work and for sharing the code and weights!

I have a fairly simple question about unconditional generation. Referring to Figure 4, I understood that the model should be able to generate an image without any class information in the [sos] token.

Looking at the demo_sample.ipynb file (specifically the autoregressive_infer_cfg function), it seems that passing label_B as None is not unconditional generation, because a label is then randomly sampled from the 1000 classes. Is "without condition information" in the paper the same as passing label_B=None?
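In pseudocode, the behavior I am describing looks roughly like this (my paraphrase, not the exact code from the notebook):

```python
import torch

B = 4           # batch size
label_B = None  # what I pass when I want "unconditional" samples
if label_B is None:
    # a class index appears to be drawn at random for each sample,
    # so the result is conditional on a random class, not unconditional
    label_B = torch.randint(0, 1000, (B,))
```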

Also, looking at class_emb, the embedding table has 1001 entries, one more than the number of ImageNet classes, but when I generate an image using class index 1000 I get a fairly random-looking image. Is generating with this class index the same as generating without condition information?

Sincerely, Jongmin

keyu-tian commented 2 months ago

Thank you @minimini-1 for the kind words.

About the [sos] token: class_emb[1000] is the unconditional [sos] token. class_emb[999] is the token for the 1000-th class label, and class_emb[0] is the token for the first class label.
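For reference, a minimal sketch of sampling with the unconditional token based on this answer. It assumes autoregressive_infer_cfg accepts label_B as a LongTensor of class indices; the variable var stands for the VAR model loaded as in demo_sample.ipynb, and any other arguments of the function are simply omitted here:

```python
import torch

B = 4
# class_emb[1000] is the unconditional [sos] token, so pass index 1000 for every sample
uncond_label = torch.full((B,), 1000, dtype=torch.long)
imgs = var.autoregressive_infer_cfg(B=B, label_B=uncond_label)  # `var`: the loaded VAR model
```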

minimini-1 commented 2 months ago

OK, thanks for your reply!