cubiq / ComfyUI_InstantID

Apache License 2.0

Multi-ID doesn't correctly put two separate faces in the same image. #20

Closed Starzilla29 closed 6 months ago

Starzilla29 commented 6 months ago

I have been experimenting with the Multi-ID workflow provided in this repo. I tried it with a man and a woman who have very different features from each other, the idea being to make sure I could use InstantID with multiple faces in the same image.

I modified the workflow to use a fixed seed and to generate all images, including the mask, within the workflow (for ease of testing, instead of loading a bunch of images from disk). I generated an image of an old man and one of a young blonde woman, and also generated a face to use as the pose for the face keypoints. Ideally I would get the older man on the left side of the image and the woman on the right. However, with both Apply InstantID nodes active I only get the woman on both sides. Even if I turn the woman's weight all the way down, I still get the woman and never the man. Removing the woman's Apply InstantID node and its ControlNet lets the man come through on the left side as intended. Below is the workflow I am using: Multi-ID-Testing-Workflow.json

The three generated reference images from the workflow to be used as inputs: ComfyUI_temp_ftcsi_00004_ ComfyUI_temp_indeu_00001_ ComfyUI_temp_hirty_00004_

The results of the workflow: ComfyUI_temp_lfnya_00031_ ComfyUI_temp_lfnya_00032_

Based on this behavior, it looks like the last Apply InstantID node completely overrides the first. I believe this is a bug, unless there is a required step I have missed. This is essentially the workflow provided with the repo, except that the images and mask are generated within the workflow and the resolution was changed.

Enzzer commented 6 months ago

I'm having the same issue: the InstantID+ControlNet nodes for the second face, later in the conditioning chain, override the InstantID+ControlNet nodes for the first face. The result is that both ControlNet nodes, which control the locations of the faces, end up generating the same face.
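To illustrate why a region mask is needed, here is a minimal NumPy sketch (hypothetical names, not ComfyUI internals): without a mask, whichever identity conditioning is applied last in the chain simply wins everywhere, while a per-region blend keeps each identity confined to its own area.

```python
import numpy as np

def blend_identities(cond_a, cond_b, mask):
    """Region-wise blend of two identity conditionings.

    cond_a, cond_b: (H, W, C) conditioning maps for face A and face B.
    mask: (H, W) float mask, 1.0 where face A applies, 0.0 where face B does.
    Without such a mask, the last conditioning applied overwrites the
    first everywhere -- the override behavior described in this thread.
    """
    m = mask[..., None]  # broadcast the mask over the channel axis
    return cond_a * m + cond_b * (1.0 - m)

# toy example: identity A on the left half, identity B on the right half
H, W, C = 4, 6, 2
a = np.ones((H, W, C))    # stand-in for face A's conditioning
b = np.zeros((H, W, C))   # stand-in for face B's conditioning
mask = np.zeros((H, W))
mask[:, : W // 2] = 1.0   # face A occupies the left half

out = blend_identities(a, b, mask)
```

This is only a conceptual sketch of attention masking, under the assumption that each identity's influence can be expressed as a spatial map; the real implementation masks inside the attention layers.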

cubiq commented 6 months ago

The workflow presented by OP has various issues. The ComfyUI ControlNet node doesn't have a mask input, so there will be some bleeding. There are workarounds; we'll discuss them in the future, but for now let's call this "only partially supported".

cubiq commented 6 months ago

It's complicated, but possible. I've updated the Multi-ID workflow in the repository. The workflow can be further improved by giving better context to the whole composition (with an additional prompt or ControlNet).

ComfyUI_temp_sfqgh_00061_

syguan96 commented 6 months ago

Hi @cubiq, I reviewed the official code and did some experiments. I think the actual procedure is: given two reference images (providing the IDs),

  1. first, apply attention masking;
  2. then, draw both faces' facial landmarks in one picture and feed it to the ControlNet.
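Step 2 above can be sketched as follows. This is a hedged, NumPy-only illustration (the function name and color scheme are my assumptions, not the official code): both faces' keypoints are rasterized into a single hint image, so the ControlNet receives one combined conditioning instead of the second face's hint overriding the first.

```python
import numpy as np

def draw_keypoints(h, w, faces, radius=6):
    """Rasterize the facial keypoints of every face into one control image.

    faces: list of keypoint lists, one per face; each keypoint is (x, y).
    Drawing both faces' landmarks into a single image means the ControlNet
    sees one combined hint covering both face locations.
    """
    img = np.zeros((h, w, 3), dtype=np.uint8)
    # one color per keypoint index (assuming the usual 5-point convention)
    colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255),
              (255, 255, 0), (255, 0, 255)]
    yy, xx = np.mgrid[0:h, 0:w]
    for kps in faces:
        for i, (x, y) in enumerate(kps):
            # fill a small disk around each keypoint
            disk = (xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2
            img[disk] = colors[i % len(colors)]
    return img

# two faces: one on the left, one on the right of a 512x512 canvas
left = [(100, 200), (180, 200), (140, 250), (110, 300), (170, 300)]
right = [(340, 200), (420, 200), (380, 250), (350, 300), (410, 300)]
combined = draw_keypoints(512, 512, [left, right])
```

The combined image would then be fed to the ControlNet once, alongside the attention masks from step 1.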
cubiq commented 6 months ago

I don't think masking the attention is enough, because the embeds are also in the ControlNet. Anyway, I have an idea to simplify the workflow; I'll work on it when I have time.

julien-blanchon commented 6 months ago

> I don't think that masking the attention is enough because the embeds are also in the controlnet, anyway I have an idea to simplify the workflow. I'll work on it when I have time

Do you have any insight for doing this in an image2image fashion (style transfer)? My current approach is to detect all the faces in the image and make a mask using GroundingDINO or SegCLIP, and then apply InstantID with attention masking on each corresponding mask.
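The per-face masking step described above can be sketched like this (my own illustration, with a hypothetical helper name): the detector, whether GroundingDINO, SegCLIP, or a plain face detector, yields one bounding box per face, and each box becomes a binary mask that drives one attention-masked InstantID application.

```python
import numpy as np

def boxes_to_masks(h, w, boxes, pad=0.2):
    """Turn detected face bounding boxes into one binary mask per face.

    boxes: list of (x0, y0, x1, y1) boxes from any face detector.
    pad expands each box a little so the mask also covers hair and jaw
    edges. Each resulting mask can drive one attention-masked InstantID
    application for the corresponding face.
    """
    masks = []
    for x0, y0, x1, y1 in boxes:
        dx, dy = (x1 - x0) * pad, (y1 - y0) * pad
        m = np.zeros((h, w), dtype=np.float32)
        xa, ya = max(0, int(x0 - dx)), max(0, int(y0 - dy))
        xb, yb = min(w, int(x1 + dx)), min(h, int(y1 + dy))
        m[ya:yb, xa:xb] = 1.0
        masks.append(m)
    return masks

# two detected faces in a 100x100 image
masks = boxes_to_masks(100, 100, [(10, 10, 30, 30), (60, 60, 90, 90)])
```

A segmentation model would give tighter masks than these rectangles, but for attention masking a slightly generous box is usually sufficient.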