Theoretical explanation - Githubissues

TheDenk / images_mixing

Combine images using usual diffusion models.

Apache License 2.0

Theoretical explanation #3

Closed wadie999 closed 2 weeks ago

wadie999 commented 1 month ago

Hello, very interesting project. Is there a paper or a more detailed explanation of the mixing process? Is there a way to mix two images while preserving information from one of them? For example, mixing a QR code with an image while preserving the QR code modules.

TheDenk commented 1 month ago

Hey.

There is no paper or more detailed explanation. I came up with this when I saw the code of the clip_guided_stable_diffusion_img2img pipeline.
For mixing images while preserving information (a QR code or anything else), it is better to use approaches based on ControlNet or IP-Adapter.

wadie999 commented 1 month ago

Hey Denk, thank you for the insights. I was wondering: is it possible to mix images without prompts? I see that even when you take two images as inputs, the CoCa model creates captions that serve as prompts to guide the generation. Do you think the diffusion process could rely on image embeddings in latent space instead, using a vision encoder?

TheDenk commented 1 month ago

Hi @wadie999 :) If I understand you correctly, there is a Jupyter notebook example that does not use the CoCa model, so you can pass an empty string. Also, if you are interested in mixing images, you can try the Kandinsky model for this purpose.
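
A minimal sketch of the prompt-free idea raised in this thread: blending two images through their embeddings from a vision encoder rather than through captions. This is not code from images_mixing or from Kandinsky. `mix_embeddings` is a hypothetical helper and the vectors are toy placeholders. A real setup would presumably take the embeddings from an image encoder such as CLIP and pass the mixed vector to a decoder that conditions on image embeddings.

```python
def mix_embeddings(embeddings, weights):
    """Return the weighted sum of equal-length embedding vectors.

    Hypothetical helper for illustration only: `embeddings` is a list of
    plain Python float lists standing in for vision-encoder outputs.
    """
    if not embeddings:
        raise ValueError("at least one embedding is required")
    if len(embeddings) != len(weights):
        raise ValueError("need exactly one weight per embedding")
    dim = len(embeddings[0])
    if any(len(emb) != dim for emb in embeddings):
        raise ValueError("all embeddings must have the same length")

    mixed = [0.0] * dim
    for emb, weight in zip(embeddings, weights):
        for i, value in enumerate(emb):
            mixed[i] += weight * value
    return mixed


# Toy stand-ins for the embeddings of two input images (assumed values).
emb_a = [1.0, 0.0]
emb_b = [0.0, 1.0]

# Equal weights give an even blend; shifting the weights favours one image.
mixed = mix_embeddings([emb_a, emb_b], [0.5, 0.5])
print(mixed)
```

Changing the weights, for example to `[0.8, 0.2]`, shifts the blend toward the first image. No text prompt is involved at any point.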