How to use ImageBind to generate image or audio?

facebookresearch / ImageBind

ImageBind One Embedding Space to Bind Them All

How to use ImageBind to generate image or audio? #42

Open NateDong72 opened 1 year ago

NateDong72 commented 1 year ago

I can run the example code. But how do I run the model to generate images and audio?

SoftologyPro commented 1 year ago

Agreed. How can you guys spend all that time training the model and writing the paper and setting up the demo website and not spend a few hours giving working example scripts to show us how to use it?

echo-lalia commented 1 year ago

I don't think the model can actually generate those things; I think it just 'translates' the information from one form to another. I think it'll have to be built into an extension for SD-WebUI or something, in order to let us play with it more easily.

WilTay1 commented 1 year ago

> I don't think the model can actually generate those things; I think it just 'translates' the information from one form to another. I think it'll have to be built into an extension for SD-WebUI or something, in order to let us play with it more easily.

But the model can be downloaded and loaded in the script.

bakachan19 commented 1 year ago

I am also interested in this. Any news? Also, how can you retrieve an image based on image and audio/text? I am referring to the embedding space arithmetic examples in Figure 4 in the paper. Do you just sum the image embeddings with the audio/text embedding and perform cosine similarity with all the image embeddings and get the most similar image? Thanks!
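
For reference, the retrieval recipe described in this comment (sum the two embeddings, then rank gallery images by cosine similarity) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code: the function names, the toy 3-dimensional vectors, and the choice to L2-normalize each embedding before summing are all hypothetical, and real ImageBind embeddings would come from the model as tensors rather than Python lists.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))

def retrieve_by_composition(image_emb, other_emb, gallery):
    """Rank gallery embeddings against the sum of two query embeddings.

    Assumption: each query embedding is normalized before summing so that
    neither modality dominates. The thread does not confirm this detail.
    """
    query = [x + y for x, y in zip(l2_normalize(image_emb), l2_normalize(other_emb))]
    scores = [cosine_similarity(query, g) for g in gallery]
    ranking = sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)
    return ranking, scores

# Toy stand-ins for real embeddings (hypothetical values).
image_emb = [1.0, 0.0, 0.0]   # e.g. an image query
audio_emb = [0.0, 1.0, 0.0]   # e.g. an audio or text query
gallery = [
    [1.0, 0.0, 0.0],   # matches the image query only
    [0.0, 1.0, 0.0],   # matches the audio query only
    [1.0, 1.0, 0.0],   # matches both concepts
    [0.0, 0.0, 1.0],   # unrelated
]

ranking, scores = retrieve_by_composition(image_emb, audio_emb, gallery)
print("best match index:", ranking[0])
```

In this toy gallery the entry that combines both concepts ranks first and the unrelated entry ranks last. Whether the paper applies any extra scaling or weighting to the summed embeddings is not stated in this thread.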

ikuinen commented 1 year ago

> I am also interested in this. Any news? Also, how can you retrieve an image based on image and audio/text? I am referring to the embedding space arithmetic examples in Figure 4 in the paper. Do you just sum the image embeddings with the audio/text embedding and perform cosine similarity with all the image embeddings and get the most similar image? Thanks!

We made a quick attempt: https://github.com/sail-sg/BindDiffusion

Zeqiang-Lai commented 1 year ago

See also Anything2Image and InternGPT; the implementation uses Diffusers.

SoftologyPro commented 1 year ago

> See also Anything2Image, it is implemented with Diffusers.

This works well and has a nice Gradio GUI.

ChloeL19 commented 1 year ago

I'm rather new to diffusion, but does ImageBind provide any sort of decoder? I thought it only trained an encoder, and if that's the case, how do these diffusion methods work?

Zeqiang-Lai commented 1 year ago

> I'm rather new to diffusion, but does Imagebind provide any sort of decoder? I thought it was just training an encoder, and if that's the case how are these diffusion methods working?

Maybe this could help https://github.com/Zeqiang-Lai/Anything2Image/issues/4

celster commented 1 year ago

> > I'm rather new to diffusion, but does Imagebind provide any sort of decoder? I thought it was just training an encoder, and if that's the case how are these diffusion methods working?

> Maybe this could help Zeqiang-Lai/Anything2Image#4

This is great!! I'm also looking for "Image+Text --> Image". For example, take a photo and ask for some modification to the person in the photo (e.g. makeup).