adobe-research / custom-diffusion

Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion (CVPR 2023)
https://www.cs.cmu.edu/~custom-diffusion

Why is my optimization-based weight merging not working? #88

Open ZZZBBBZZZ opened 10 months ago

ZZZBBBZZZ commented 10 months ago

I ran single-concept fine-tuning separately for a cat, a dog, and a wooden pot. Each model performed very well. image image image

But when I tried to combine two concepts, the results were not ideal. First, cat and dog: with the prompt "the <new1> cat play a ball with a <new2> dog", there is no dog in the picture. Here are my merging and sampling commands and the result.

python src/composenW.py --paths logs/2024-01-18T21-51-09_cat-sdv4/checkpoints/delta_epoch\=000002.ckpt+logs/2024-01-18T23-25-52_dog-sdv4/checkpoints/delta_epoch\=000001.ckpt --categories  "cat+dog"  --ckpt ./models/sd-v1-4.ckpt
##sample
python sample.py --prompt "the <new1> cat play a ball with a <new2> dog" --delta_ckpt optimized_logs/optimized_cat+dog/checkpoints/delta_epoch\=000000.ckpt --ckpt ./models/sd-v1-4.ckpt

image

Afterwards, I merged the cat and wooden pot models, but with the prompt "the <new2> cat sculpture in the style of a <new1> wooden pot" the results were again not ideal. Here are the merging and sampling commands and the result.

python src/composenW.py --paths logs/2024-01-22T15-11-17_wooden_pot-sdv4/checkpoints/delta_epoch=000002.ckpt+logs/2024-01-18T21-51-09_cat-sdv4/checkpoints/delta_epoch\=000000.ckpt --categories  "wooden_pot+cat"  --ckpt ./models/sd-v1-4.ckpt
##sample
python sample.py --prompt "the <new2> cat sculpture in the style of a <new1> wooden pot" --delta_ckpt optimized_logs/optimized_wooden_pot+cat/checkpoints/delta_epoch=000000.ckpt --ckpt ./models/sd-v1-4.ckpt

image

Did I make a mistake somewhere, and why is this result not quite correct?
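
One thing that can be ruled out before looking at the merge itself is a mismatch between modifier tokens and categories. The sketch below is a hypothetical helper, not part of the repository. It assumes that the merged checkpoint assigns `<new1>`, `<new2>`, ... in the same order as the `+`-separated `--categories` string, which is how the commands above are written but is not confirmed here. Under that assumption, it checks that each token appears in the prompt directly before its category name.

```python
import re


def token_map(categories):
    """Map '<newK>' to the K-th '+'-separated category.

    ASSUMPTION: tokens follow the order of the --categories string.
    """
    names = categories.split("+")
    return {f"<new{i}>": name.replace("_", " ") for i, name in enumerate(names, start=1)}


def check_prompt(prompt, categories):
    """Return a list of problems; an empty list means the prompt looks consistent."""
    mapping = token_map(categories)
    problems = []
    for token, name in mapping.items():
        if token not in prompt:
            problems.append(f"{token} ({name}) is missing from the prompt")
        elif f"{token} {name}" not in prompt:
            problems.append(f"{token} is not directly followed by '{name}'")
    # Flag tokens in the prompt that no category accounts for.
    for token in re.findall(r"<new\d+>", prompt):
        if token not in mapping:
            problems.append(f"{token} has no matching category")
    return problems


if __name__ == "__main__":
    print(check_prompt("the <new1> cat play a ball with a <new2> dog", "cat+dog"))
    print(check_prompt(
        "the <new2> cat sculpture in the style of a <new1> wooden pot",
        "wooden_pot+cat",
    ))
```

Both prompts used above come back with no problems under this assumed ordering, so a swapped token is unlikely to be the cause.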

zaczywy commented 9 months ago

Hi, I ran into the same problem when reproducing the results. It is worth noting that SD-1.4 itself handles multiple concepts poorly and tends to produce images showing only a single concept. I therefore think that the multi-concept images in the paper come from either a different base model or careful selection of samples. Of course, this does not affect the novelty of the paper.

zaczywy commented 9 months ago

However, the merged model does work well for generating each of its concepts individually.

ZZZBBBZZZ commented 9 months ago

I agree with you.

nupurkmr9 commented 9 months ago

Hi @ZZZBBBZZZ, thanks for your interest in our work. Regarding composing the cat and dog concepts, we also found this difficult, as we discuss in our paper. Compositions of items from similar categories, which pretrained Stable Diffusion models already struggle with, become even more difficult after customization.

Regarding composing the cat and wooden pot models, I believe the composition results should be better. You could also try composing the individual models we provided here, or try the composed model that was trained jointly here.

Training each individual model for more iterations may also help. We trained each single-concept model for 250 iterations on 2 GPUs with a batch size of 4 per GPU.
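
To compare a different hardware setup against these numbers, the arithmetic can be sketched as below. This is a rough illustration only. The function names are hypothetical, and it assumes no gradient accumulation and that matching the total number of training samples seen is a reasonable proxy for matching the training budget. Learning-rate scaling is not considered.

```python
def effective_batch_size(num_gpus, batch_per_gpu):
    """Samples processed per optimizer step across all GPUs."""
    return num_gpus * batch_per_gpu


def matching_iterations(ref_iters, ref_gpus, ref_batch, my_gpus, my_batch):
    """Iterations needed on another setup to see at least as many samples."""
    ref_samples = ref_iters * effective_batch_size(ref_gpus, ref_batch)
    my_effective = effective_batch_size(my_gpus, my_batch)
    return -(-ref_samples // my_effective)  # ceiling division


if __name__ == "__main__":
    # Reference setup from the comment above: 250 iterations, 2 GPUs, batch size 4 per GPU.
    print(effective_batch_size(2, 4))              # 8
    print(250 * effective_batch_size(2, 4))        # 2000 samples seen
    # Hypothetical single-GPU setup with batch size 4.
    print(matching_iterations(250, 2, 4, 1, 4))    # 500
```

Under these assumptions, a single GPU with a batch size of 4 would need about 500 iterations to see the same 2000 samples.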

Hopefully this helps. Thanks!!