Why give up image prior in Kandinsky 2.1 and Kandinsky 2.2?

ai-forever / Kandinsky-3

https://ai-forever.github.io/Kandinsky-3/

Apache License 2.0


Why give up image prior in Kandinsky 2.1 and Kandinsky 2.2? #6

Closed Zeqiang-Lai closed 7 months ago

Zeqiang-Lai commented 7 months ago

Thanks for sharing this great work with the open-source community.

I am a little curious about the design choices in Kandinsky 3. Could you explain why you gave up the image prior used in Kandinsky 2.1 and Kandinsky 2.2?

Does it mean that the image prior is more or less useless?

anvilarth commented 7 months ago

We were interested in getting better text understanding in the diffusion model, but the CLIP text encoder is pretty bad :) So we chose Flan-UL2, as it's the best open-source transformer encoder, and trained the model without CLIP (this was partially inspired by the Imagen work), and it works :)

Zeqiang-Lai commented 7 months ago

OK, I see. Thanks for the response :)