FoundationVision / VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
MIT License

how to use this for inpaint? #13

Open WenmuZhou opened 5 months ago

keyu-tian commented 5 months ago

hi @WenmuZhou I'm currently polishing up the training and zero-shot generalization code, and it will be ready this week.

keyu-tian commented 5 months ago

The in/out-painting inference is done by teacher-forcing the tokens we want to keep, and letting the VAR transformer generate only the other tokens. For in/out-painting, no class condition is used. Class-conditional editing is simply implemented as "class-conditional" in-painting.
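To make the keep/generate split concrete, here is a minimal toy sketch of teacher-forced autoregressive decoding. It is not from the VAR codebase (which predicts a whole token map per scale, not one token at a time), and the names `toy_model` and `generate_with_teacher_forcing` are made up for illustration; the point is only the mechanism: at kept positions the ground-truth token overrides the model's prediction, and either way that token conditions the next step.

```python
def toy_model(prefix):
    # Stand-in for the transformer: deterministically "predicts" the
    # next token as (sum of prefix + 1) mod 10.
    return (sum(prefix) + 1) % 10

def generate_with_teacher_forcing(model, length, known_tokens):
    """Autoregressively generate `length` tokens.

    known_tokens maps position -> ground-truth token. At those positions
    we teacher-force the known token instead of sampling from the model;
    the model only fills in the remaining positions, which is the essence
    of in/out-painting described above.
    """
    seq = []
    for pos in range(length):
        if pos in known_tokens:
            tok = known_tokens[pos]   # keep region: force ground truth
        else:
            tok = model(seq)          # masked region: model generates
        seq.append(tok)               # either way, it conditions later steps
    return seq

# Keep tokens at positions 0 and 2; the model fills positions 1 and 3.
out = generate_with_teacher_forcing(toy_model, 4, {0: 5, 2: 7})
```

In VAR the same idea applies per scale: the quantized tokens of the known image region are written into the token map at each scale, and only the masked region's tokens come from the transformer.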

WenmuZhou commented 5 months ago

Looking forward to the release of the inpainting demo.

kl2004 commented 5 months ago

I'm also interested in the inpaint demo. I wonder if it will be released soon.

Iceage7 commented 3 months ago

> The in/out-painting inference is done by teacher-forcing the tokens we want to keep, and letting the VAR transformer generate only the other tokens. For in/out-painting, no class condition is used. Class-conditional editing is simply implemented as "class-conditional" in-painting.

Could you please provide a more detailed explanation of how teacher-forcing the tokens is implemented in your model? Thanks.