drboog / ProFusion

Code for Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach
Apache License 2.0
463 stars 29 forks

multiple subject #2

Open SlZeroth opened 1 year ago

SlZeroth commented 1 year ago

I want to generate two people in the same image. Are multiple subjects supported?

drboog commented 1 year ago

If you want to generate something like "an image of A and B shaking hands" after fine-tuning the model on photos of A and B, then I think pipeline_stable_diffusion_promptnet.py needs to be revised. The current implementation only generates images conditioned on a single prompt like "a photo of $S^*$", or on a list of prompts ["a photo of $S_1^*$", "a photo of $S_2^*$", ...], which is different from conditioning on "a photo of $S_1^* \text{ and } S_2^*$".
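To make the distinction concrete, here is a minimal sketch of the two prompt shapes. It only builds prompt strings and does not call the pipeline. The helper names and the `S1*`/`S2*` placeholder strings are hypothetical and are not taken from the ProFusion code.

```python
# Illustrative only: none of these names come from the ProFusion code base.

def batch_prompts(placeholders, template="a photo of {}"):
    """One prompt per subject: each prompt mentions a single subject."""
    return [template.format(p) for p in placeholders]


def joint_prompt(placeholders, template="a photo of {}"):
    """One prompt naming every subject: a single image must contain all of them."""
    return template.format(" and ".join(placeholders))


def count_placeholders(prompt, placeholders):
    """Count how many of the given subject placeholders occur in a prompt."""
    words = prompt.split()
    return sum(1 for p in placeholders if p in words)


subjects = ["S1*", "S2*"]          # assumed pseudo-tokens for subjects A and B
per_subject = batch_prompts(subjects)
combined = joint_prompt(subjects)

print(per_subject)
print(combined)
```

Each entry of `per_subject` contains exactly one placeholder, so a batch of such prompts yields separate single-subject images. `combined` contains both placeholders in one prompt, which is the case that would require changes to the pipeline.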

SlZeroth commented 1 year ago

Thank you for the answer!

Is it possible to support multiple tokens while keeping PromptNet intact?

I'm trying to make multi-token generation work by modifying the source code as you described in your reply, but before going further, I wonder whether this would be easy for you to do.

drboog commented 1 year ago

I may update the code later, but I'm not sure how the performance will be.

SlZeroth commented 1 year ago

@drboog Thank you so much!

drboog commented 1 year ago

I wrote an implementation and tested it on my local machine. Unfortunately, the results are not satisfactory. For example, when we ask it to generate a photo of A and B shaking hands, it does generate an image of two people shaking hands. However, each of the two people looks like a combination of A and B, whereas we expect one person to look like A and the other to look like B. This is an interesting topic; I will think about improvements (to the method, or tricks) in the future.

SlZeroth commented 1 year ago

Thank you for trying. I have read the C-LoRA paper, and it seems to address issues that arise when learning several subjects of the same type. https://arxiv.org/abs/2304.06027