FoundationVision / VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
MIT License
3.78k stars, 285 forks

for in/out painting #70

Open Youngwoo-git opened 3 weeks ago

Youngwoo-git commented 3 weeks ago

While going through the paper again, I got curious about teacher-forcing the ground truth (gt) outside the mask.

My understanding is that, for the parts that are not masked, tokens are generated as in the provided code. But what about the parts that are masked? Are they supposed to be initialized to 0? I'm not quite sure how the parts within the mask can be generated without class info.

Thanks in advance!
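
To make the question concrete, the teacher-forcing step it refers to could be sketched roughly as below. This is only an illustration and not the repository's code: `merge_tokens`, the flat token lists, and the boolean mask are hypothetical, and it assumes a single token map at one scale with the mask already resized to that map.

```python
from typing import List


def merge_tokens(gt_tokens: List[int],
                 predicted_tokens: List[int],
                 mask: List[bool]) -> List[int]:
    """Keep ground-truth tokens outside the mask, use predictions inside it.

    mask[i] is True where the token should be generated (inside the mask)
    and False where the ground-truth token is teacher-forced.
    All names here are hypothetical and chosen for illustration only.
    """
    if not (len(gt_tokens) == len(predicted_tokens) == len(mask)):
        raise ValueError("gt_tokens, predicted_tokens and mask must have equal length")
    return [pred if inside else gt
            for gt, pred, inside in zip(gt_tokens, predicted_tokens, mask)]


# Toy example: positions 1 and 2 are inside the mask.
gt = [5, 7, 2, 9]
pred = [1, 3, 4, 8]
mask = [False, True, True, False]
merged = merge_tokens(gt, pred, mask)
```

In this toy example the merged map keeps the ground-truth tokens at positions 0 and 3 and takes the predicted tokens at positions 1 and 2. How the masked positions are initialized before prediction is exactly what the question asks and is not settled by this sketch.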