CompVis / latent-diffusion

High-Resolution Image Synthesis with Latent Diffusion Models
MIT License

Text + partial image prompting #34

Open cachett-ML opened 2 years ago

cachett-ML commented 2 years ago

Hi !

In DALL·E, you can provide a partial image in addition to the text description, so that the model only completes the missing part of the image.

Can we do the same with your models? That would be awesome. I tried to modify the LAION-400M model notebook but without much success.

hyungkwonko commented 2 years ago

Good question! I am also curious whether it is possible with the provided source code.

lxj616 commented 2 years ago

@hyungkwonko Yes, it is possible. I have already implemented and tested it myself; it just needs some coding.

Just pass x0 and a mask into the DDIM sampler, and you can do inpainting while using text prompts.

@cachett-ML For your case, mask the image (e.g. keep only the top half) and then inpaint the masked region with a text prompt. You can do the same thing as DALL·E this way, but I have no clue how well it will work. IMO a more decent approach would be to condition on the visible areas during training, rather than repurposing this text-conditioned txt2img model for the inpainting task, but this model is the best we can get our hands on so far :)
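The masking step described above (pin the known pixels to a re-noised copy of x0 at every sampler step, and let the model fill in the rest) can be sketched roughly as follows. This is a minimal illustration, not the repo's actual API: `ddim_inpaint_step`, `noise_fn`, and `denoise_fn` are hypothetical names standing in for `q_sample` and the DDIM update in the real sampler.

```python
import numpy as np

def ddim_inpaint_step(x_t, x0, mask, noise_fn, denoise_fn, t):
    """One masked DDIM step (illustrative sketch).

    mask == 1 marks known pixels, which are pinned to a copy of x0
    noised to the current timestep; mask == 0 marks the region the
    model is free to fill in. `noise_fn` and `denoise_fn` are
    hypothetical stand-ins for the forward-noising and sampler-update
    functions of a real diffusion model.
    """
    x_known = noise_fn(x0, t)                 # noise the original to level t
    x_t = mask * x_known + (1 - mask) * x_t   # blend known and generated parts
    return denoise_fn(x_t, t)                 # then run the usual DDIM update
```

Repeating this blend at every denoising step keeps the visible region consistent with the input image while the text prompt steers what appears in the masked region.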

python scripts/txt2img.py --prompt "a cat is wearing a green hat" --image_prompt test_inputs/images/images.jpeg --mask_prompt test_inputs/masks/images.png --ddim_eta 0.0 --n_samples 8 --n_iter 4 --scale 5.0  --ddim_steps 50


Hope it helps, cheers

For the complete code of txt2img.py, I made a pull request; see:

https://github.com/CompVis/latent-diffusion/pull/57

hyungkwonko commented 2 years ago

Hey @lxj616, that looks pretty awesome! Thanks for your kind reply and great work. I will try it and share my own results.