ozanciga closed this 1 year ago
Hello, thank you for your interest.
Thank you for the answer. I've been racking my brain to come up with an elegant solution, but unfortunately every option seems to require training some costly network, whether a T5-to-CLIP conversion model, an image encoder, or both.
However, it's still nice to have this template of a solution. I believe gradient-guided automated prompting, like the method you presented, is the future, so thanks for your contribution!
Hey, thank you for your great work!
I had a few questions about adapting this algorithm to another setup that may not use CLIP (e.g., Imagen or eDiff-I).
Have you experimented with transferring the prompts to other networks for image generation? Table 2 does this for SST-2, but I'm not sure whether there are any experiments on image generation.
If I wanted to take the algorithm and train against another text encoder, e.g., T5, how would I go about it? Are there proxies for a contrastive image-text encoder pair that could be used for gradient reprojection?
Thank you