JackAILab / ConsistentID

Customized ID Consistent for human
MIT License
726 stars, 72 forks

What is start_merge_step for? #39

Closed vuongminh1907 closed 1 week ago

vuongminh1907 commented 1 month ago

In this code, prompt_embeds is split into three parts: (null_prompt_embeds, augmented_prompt_embeds, text_prompt_embeds) = prompt_embeds.chunk(3). Then start_merge_step decides which two of them are used as the prompt embeddings at each step:

            if i <= start_merge_step:
                current_prompt_embeds = torch.cat(
                    [null_prompt_embeds, text_prompt_embeds], dim=0
                )
            else:
                current_prompt_embeds = torch.cat(
                    [null_prompt_embeds, augmented_prompt_embeds], dim=0
                )

All of this code is in pipline_StableDiffusion_ConsistentID.py. My question is: what do null_prompt_embeds, augmented_prompt_embeds, and text_prompt_embeds mean? And why do you use start_merge_step? Thanks so much <3

JackAILab commented 1 month ago

Hi @vuongminh1907, start_merge_step is used to balance the influence of the image-token conditions and the text-prompt conditions during inference. Details are listed in the appendix of our paper.

Here, null_prompt_embeds is the negative prompt embedding (refer to L474), augmented_prompt_embeds is the embedding of the text prompt mixed with image tokens, and text_prompt_embeds is the embedding of the text prompt alone (refer to L505).
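
To make the switch concrete, here is a minimal sketch in plain Python of which embedding gets paired with null_prompt_embeds at each step. It only mirrors the `if i <= start_merge_step` branch quoted in the question and is not the pipeline's actual code. The helper name conditioning_schedule and the values 10 and 3 are hypothetical, and it assumes i is the zero-based index of the denoising loop.

```python
def conditioning_schedule(num_inference_steps, start_merge_step):
    """Return, for each step index, the name of the embedding that is
    concatenated with null_prompt_embeds at that step.

    Hypothetical helper for illustration only; it copies the branch
    condition from the snippet quoted in the question.
    """
    schedule = []
    for i in range(num_inference_steps):
        if i <= start_merge_step:
            # Early steps: condition on the text prompt alone.
            schedule.append("text_prompt_embeds")
        else:
            # Later steps: condition on text merged with image tokens.
            schedule.append("augmented_prompt_embeds")
    return schedule


if __name__ == "__main__":
    # Illustrative values, not defaults taken from the repository.
    for i, name in enumerate(conditioning_schedule(10, 3)):
        print(i, name)
```

Under these assumptions, a larger start_merge_step keeps more of the early steps on text-only conditioning, so the image tokens enter the process later. A smaller value brings them in sooner.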