Open ZZZBBBZZZ opened 10 months ago
Hi, I ran into the same problem when reproducing the results. It is worth noting that SD-1.4 itself handles multi-concept generation poorly and often renders only one of the concepts. I therefore suspect that the multi-concept images in the paper come from either a different base model or careful selection of samples. Of course, this does not affect the novelty of the paper.
However, the fused model is effective for generating individual concepts within it.
I agree with you.
Hi @ZZZBBBZZZ, thanks for your interest in our work.
Regarding composing the cat and dog concepts, we also found this difficult, as we discuss in our paper. Compositions of items from similar categories, which pretrained Stable Diffusion models already struggle with, become even more difficult after customization.
Regarding composing the cat and wooden pot models, I believe the composition results should be better. You could also try composing the individual models we provided here, or try the composed model that was trained jointly here.
Training each individual model for more iterations may also help. We trained each single-concept model for 250 iterations on 2 GPUs with a batch size of 4 per GPU.
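For anyone reproducing on different hardware, here is a rough sketch of how that schedule might translate. It assumes that matching the total number of training samples seen is the right way to compare runs, which is only a heuristic; it ignores any learning-rate adjustment, and the function name `equivalent_steps` is made up for illustration and is not part of the repository.

```python
import math


def equivalent_steps(ref_steps, ref_gpus, ref_batch_per_gpu, my_gpus, my_batch_per_gpu):
    """Number of steps needed to see as many samples as the reference run.

    Assumption: runs are comparable when they process the same total
    number of samples (steps * GPUs * per-GPU batch size).
    """
    ref_effective_batch = ref_gpus * ref_batch_per_gpu
    my_effective_batch = my_gpus * my_batch_per_gpu
    total_samples = ref_steps * ref_effective_batch
    # Round up so the run never sees fewer samples than the reference.
    return math.ceil(total_samples / my_effective_batch)


# Reference setup from this thread: 250 iterations, 2 GPUs, batch size 4 per GPU.
print(equivalent_steps(250, 2, 4, my_gpus=1, my_batch_per_gpu=4))
print(equivalent_steps(250, 2, 4, my_gpus=1, my_batch_per_gpu=2))
```

Under this assumption, a single GPU with the same per-GPU batch size would need about twice as many iterations to match the reference run, before training for longer than that.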
Hopefully this helps. Thanks!!
I ran single-concept fine-tuning for the cat, the dog, and the wooden pot separately. Each of these models performed very well.
But when I tried to combine two concepts, the results were not ideal. First, I tried the cat and the dog. With the prompt "the \<new1> cat play a ball with a \<new2> dog", there is no dog in the picture. Here are my training commands and results.
Afterwards, I tried to combine the cat and the wooden pot, but with the prompt "the \<new2> cat sculpture in the style of a \<new1> wooden pot", the results were again not ideal. The following are the training commands and results.
Did I make a mistake somewhere, and why is this result not quite correct?