FoundationVision / VAR

[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
MIT License
4.28k stars, 315 forks

for in/out painting #70

Status: Open. Youngwoo-git opened this issue 5 months ago.

Youngwoo-git commented 5 months ago

While going through the paper again, I got curious about teacher-forcing the ground truth (GT) outside the mask.

My understanding is that, for the parts that are not masked, tokens are obtained as in the provided code. But what about the parts that are masked? Are they supposed to be initialized to 0? I'm also not quite sure how the parts within the mask can be generated without class information.

Thanks in advance!
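The teacher-forcing scheme asked about above (keep GT tokens outside the mask, generate only the tokens inside it) can be sketched as a coarse-to-fine loop. This is a minimal NumPy sketch under stated assumptions, not VAR's actual implementation: `mask_at_scale`, `inpaint_next_scale` and `predict_fn` are hypothetical names, the real model is replaced by a stand-in that returns next-scale token probabilities, decoding is greedy, and a coarse token is treated as "to be generated" whenever its footprint overlaps the mask at all.

```python
import numpy as np


def mask_at_scale(mask: np.ndarray, s: int) -> np.ndarray:
    """Project a pixel-level mask (True = region to generate) onto an s x s token grid.

    Assumption: a coarse token is generated if its footprint overlaps the mask at all;
    it is teacher-forced only when its whole footprint lies outside the mask.
    """
    H, W = mask.shape
    out = np.zeros((s, s), dtype=bool)
    for i in range(s):
        # Row span covered by token row i (start floored, end ceiled).
        r0, r1 = (i * H) // s, ((i + 1) * H + s - 1) // s
        for j in range(s):
            c0, c1 = (j * W) // s, ((j + 1) * W + s - 1) // s
            out[i, j] = mask[r0:r1, c0:c1].any()
    return out


def inpaint_next_scale(gt_tokens, mask, predict_fn):
    """Coarse-to-fine generation that keeps ground-truth tokens outside the mask.

    gt_tokens:  list of (s, s) int arrays, one per scale, coarse to fine
                (hypothetically obtained by tokenizing the full image).
    mask:       (H, W) bool array, True where content must be generated.
    predict_fn: hypothetical stand-in for the model; maps (previous token maps, s)
                to an (s, s, V) array of next-scale token probabilities.
    """
    prefix = []
    for gt in gt_tokens:
        s = gt.shape[0]
        probs = predict_fn(prefix, s)
        pred = probs.argmax(axis=-1)        # greedy decoding, for simplicity
        m = mask_at_scale(mask, s)
        tokens = np.where(m, pred, gt)      # generate inside the mask, teacher-force GT outside
        prefix.append(tokens)               # later scales condition on the merged map
    return prefix
```

In this sketch the masked positions never need a placeholder value, because at every scale they are taken directly from the model's prediction; how the first-scale start token is conditioned when no class label is given is left to `predict_fn`, and the sketch does not settle that question.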