YuxinWenRick / hard-prompts-made-easy

MIT License
601 stars 54 forks

Algorithm 1 and the necessity of the image encoder #19

Closed ozanciga closed 1 year ago

ozanciga commented 1 year ago

Hey, thank you for your great work!

I had a few questions about adapting this algorithm to a setup that may not use CLIP (e.g., Imagen or eDiff-I).

  1. Have you experimented with transferring the prompts for image generation to other networks? Table 2 does this for SST-2, but I'm not sure whether there are any such experiments for image generation.

  2. If I wanted to take the algorithm and train against another text encoder, e.g., T5, how would I go about it? Are there proxies for a contrastive image-text encoder pair that could be used for gradient reprojection?
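
For reference, my reading of Algorithm 1 is that it touches the task model only through a pluggable, embedding-differentiable loss. A rough sketch of the loop (the function names here are mine, not the repo's; in the paper the loss is CLIP image-text similarity):

```python
import torch
import torch.nn.functional as F

def project_to_vocab(soft_embeds, embed_matrix):
    # Map each continuous embedding to its nearest vocabulary embedding
    # (cosine similarity); embed_matrix is a plain (vocab_size, dim) tensor.
    sims = F.normalize(soft_embeds, dim=-1) @ F.normalize(embed_matrix, dim=-1).T
    token_ids = sims.argmax(dim=-1)
    return token_ids, embed_matrix[token_ids]

def optimize_hard_prompt(loss_fn, embed_matrix, num_tokens=8, steps=1000, lr=0.1):
    # PEZ-style loop: evaluate the loss at the *projected* (hard) embeddings,
    # then apply the resulting gradient to the *continuous* embeddings.
    init_ids = torch.randint(0, embed_matrix.shape[0], (num_tokens,))
    soft_embeds = embed_matrix[init_ids].detach().clone().requires_grad_(True)
    opt = torch.optim.AdamW([soft_embeds], lr=lr)

    for _ in range(steps):
        _, hard_embeds = project_to_vocab(soft_embeds, embed_matrix)
        hard_embeds = hard_embeds.detach().requires_grad_(True)
        loss_fn(hard_embeds).backward()      # loss_fn: any loss differentiable
        soft_embeds.grad = hard_embeds.grad  # w.r.t. the prompt embeddings
        opt.step()
        opt.zero_grad()

    final_ids, _ = project_to_vocab(soft_embeds, embed_matrix)
    return final_ids
```

Nothing in the loop itself assumes CLIP; the catch for a T5-based generator is finding a loss_fn that can score prompt embeddings against a target image.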

Thank you!

YuxinWenRick commented 1 year ago

Hello, thank you for your interest.

  1. Unfortunately, we do not have access to models whose text encoders differ from CLIP's, such as Imagen or eDiff-I. However, we did try applying the prompts to Midjourney, whose text encoder is not publicly known, and the results were impressive.
  2. As far as I am aware, there is currently no proxy vision model available for T5.
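
To make the image encoder's role concrete: the target image is embedded once, and the optimization loss compares that fixed feature with text features computed from the candidate prompt embeddings. The sketch below assumes a hook encode_text_from_embeds that runs embeddings through CLIP's text transformer (the repo adapts CLIP's text forward to accept embeddings; the helper name here is hypothetical):

```python
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Embed the target image once; it stays fixed throughout optimization.
image = Image.open("target.png")  # placeholder path
pixels = processor(images=image, return_tensors="pt").pixel_values
with torch.no_grad():
    image_features = F.normalize(model.get_image_features(pixel_values=pixels), dim=-1)

def make_clip_loss(image_features, encode_text_from_embeds):
    # encode_text_from_embeds: hypothetical hook that runs prompt *embeddings*
    # through the text transformer and returns pooled text features. This is
    # exactly the piece T5 has no counterpart for: no paired image encoder
    # produces features in the same space as T5's text features.
    def loss_fn(prompt_embeds):
        text_features = F.normalize(encode_text_from_embeds(prompt_embeds), dim=-1)
        return 1.0 - (text_features * image_features).sum(dim=-1).mean()
    return loss_fn
```
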
ozanciga commented 1 year ago

Thank you for the answer. I've been racking my brain to come up with an elegant solution, but unfortunately every option seems to require training some costly network, be it a conversion from T5 to CLIP and/or training an image encoder.
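
To spell out what I mean by "conversion from T5 to CLIP": a small adapter trained on paired captions encoded by both models, so that CLIP's image encoder can keep supplying the loss. Purely a hypothetical sketch (dimensions and names are placeholders):

```python
import torch.nn as nn
import torch.nn.functional as F

class T5ToCLIPAdapter(nn.Module):
    # Hypothetical adapter: maps pooled T5 text features into CLIP's
    # text-feature space. Dims assume T5-large (1024) and CLIP ViT-L (768).
    def __init__(self, t5_dim=1024, clip_dim=768, hidden=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(t5_dim, hidden), nn.GELU(), nn.Linear(hidden, clip_dim)
        )

    def forward(self, t5_features):
        return F.normalize(self.net(t5_features), dim=-1)

def alignment_loss(adapter, t5_features, clip_text_features):
    # Train on the same captions encoded by both models: pull the adapted
    # T5 feature toward the corresponding CLIP text feature.
    pred = adapter(t5_features)
    target = F.normalize(clip_text_features, dim=-1)
    return 1.0 - (pred * target).sum(dim=-1).mean()
```

which is exactly the kind of extra training I was hoping to avoid.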

However, it's still nice to have this template for a solution. I believe gradient-guided automated prompting like the approach you presented is the future, so thanks for your contribution!