CompVis / stable-diffusion

A latent text-to-image diffusion model
https://ommer-lab.com/research/latent-diffusion-models/

is there a way to interpolate 2 init images? #185

Open loboere opened 2 years ago

loboere commented 2 years ago

I mean using 2 init images and a strength value to see the mix between the two images.

limesqueezy commented 2 years ago

@loboere I'm not sure you can have 2 init images. Their scripts explicitly say img2img, so they seem to take a single init image.

WeaverOfTheWeb commented 1 year ago

Midjourney has a new remix feature that merges 2 images. But it'd be sweet if we could do something similar with SD.

bhky commented 1 year ago

Just a pure guess: In a text-to-image model like Stable Diffusion, the noisy latent image is de-noised while conditioned on text embeddings. So what Midjourney did here could be using the image embeddings of the two images to condition the de-noising steps.
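
To make that guess a bit more concrete, here is a minimal sketch of the blending step only. It assumes each image has already been turned into an embedding vector by some image encoder (not shown here), and `normalize` and `blend_embeddings` are hypothetical helper names, not part of Stable Diffusion or Midjourney. Whether Midjourney does anything like this is unknown.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def blend_embeddings(emb_a, emb_b, weight):
    """Mix two embedding vectors; weight=0 gives emb_a, weight=1 gives emb_b.

    Assumption: the embeddings come from the same encoder, so they have
    the same length and live in the same space.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    a = normalize(emb_a)
    b = normalize(emb_b)
    mixed = [(1.0 - weight) * x + weight * y for x, y in zip(a, b)]
    # Renormalize so the blend has the same scale as its inputs.
    return normalize(mixed)


# Toy 2-D "embeddings" standing in for real encoder outputs.
conditioning = blend_embeddings([1.0, 0.0], [0.0, 1.0], 0.5)
```

The blended vector would then take the place of the text conditioning during de-noising, which only works if the model was trained to accept that kind of conditioning.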

rskvazh commented 1 year ago

> using the image embeddings of the two images to condition the de-noising steps

Yeah, but the released Stable Diffusion models were trained with text conditioning only :( Here is a Stable Diffusion model fine-tuned for "image variations": https://github.com/LambdaLabsML/lambda-diffusers#stable-diffusion-image-variations but it needs more training steps.

Maybe a conversion between CLIP text embeddings and image embeddings already exists...

saphtea commented 1 year ago

Seed travel
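
For context, seed travel is usually described as interpolating between the initial noise of two seeds and rendering one frame per step, often with spherical interpolation (slerp). Below is a minimal sketch under that assumption, using plain Python lists in place of latent tensors. `noise_for_seed`, `slerp`, and `seed_travel` are illustrative names, not functions from the Stable Diffusion scripts.

```python
import math
import random


def noise_for_seed(seed, size):
    """Deterministic Gaussian noise for a seed (stand-in for a latent tensor)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def slerp(t, v0, v1, eps=1e-7):
    """Spherical interpolation from v0 (t=0) to v1 (t=1)."""
    if len(v0) != len(v1):
        raise ValueError("vectors must have the same length")
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    lerp = [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]
    if norm0 == 0.0 or norm1 == 0.0:
        return lerp  # no direction to rotate between
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    sin_theta = math.sin(theta)
    if abs(sin_theta) < eps:
        return lerp  # nearly parallel vectors: fall back to linear mixing
    w0 = math.sin((1.0 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def seed_travel(seed_a, seed_b, size, steps):
    """Return `steps` noise vectors going from seed_a's noise to seed_b's."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    start = noise_for_seed(seed_a, size)
    end = noise_for_seed(seed_b, size)
    return [slerp(i / (steps - 1), start, end) for i in range(steps)]


frames = seed_travel(seed_a=1, seed_b=2, size=8, steps=5)
```

Each entry in `frames` would be used as the starting noise for one generation with the same prompt, giving a sequence that drifts from one seed's image to the other's. Note this interpolates noise, not two init images, so it only partly covers the original question.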