Open runner22k opened 1 year ago
I have the same question: does the model accept an arbitrary non-generated image together with a given caption? I would like to use this model for zero-shot object localization.
In theory, yes. I'll look into implementing this when I have time.
I managed to achieve this using this plugin: link. Use img2img, set the step count to 1 and the denoising strength to 0, and you are all set!
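For anyone who wants to script the settings above rather than click through the UI, here is a minimal sketch. It assumes the plugin runs inside the AUTOMATIC1111 web UI and that the UI's HTTP API is enabled; neither is confirmed in this thread. The helper name `build_img2img_payload` is hypothetical, and whether the plugin's heat maps are actually produced for API calls is untested. The payload keys (`init_images`, `prompt`, `steps`, `denoising_strength`) are the ones that web UI's img2img endpoint uses.

```python
import base64
import json


def build_img2img_payload(image_bytes: bytes, caption: str) -> dict:
    """Build an img2img request body with the settings from the comment above.

    Assumption: the target is an AUTOMATIC1111-style img2img endpoint.
    This helper is illustrative and not part of DAAM or the plugin.
    """
    # The endpoint expects input images as base64-encoded strings.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "init_images": [encoded],
        "prompt": caption,          # the caption whose words you want localized
        "steps": 1,                 # a single sampling step
        "denoising_strength": 0.0,  # presumably keeps the input image largely unchanged
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file read with open(path, "rb").
    payload = build_img2img_payload(b"placeholder image bytes", "a dog on a sofa")
    # Print everything except the (long) encoded image.
    print(json.dumps({k: v for k, v in payload.items() if k != "init_images"}, indent=2))
```

The resulting dictionary could then be sent as JSON in a POST request to the img2img endpoint of a locally running web UI; the sketch leaves that step out because the server address and the plugin's behaviour over the API are not specified here.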
This would help us write better image descriptions when training Textual Inversion (embeddings) and LoRA. How can we use this on an arbitrary image?
Does it work the other way around too? Can we feed an image to DAAM and get a text prompt with heat maps?