cubiq / ComfyUI_IPAdapter_plus


:face_with_spiral_eyes: Face Models Comparison and Suggestions #195

Closed cubiq closed 7 months ago

cubiq commented 10 months ago

:warning: Preliminary Data :warning:

Face Models Comparison

I started collecting data about all the face models available for IPAdapter. I'm generating thousands of images and comparing each one to the original reference image with a face descriptor model. A value of 0 means 100% the same person, 1.0 completely different.

BIAS! Important: please read!

The comparison is meant only as an overall help in choosing the right models. These are just numbers: they do not represent the actual image quality, let alone the artistic value.

The face descriptor can be skewed by many factors, and a face that is actually very good could still get a bad score for a number of reasons (head position, a weird shadow, ...). Don't take the following data as gospel; you still need to experiment.

Additionally, the images are generated in a single pass of 30 steps. Better results could probably be achieved with a second pass and upscaling, but that would require a lot more time.

I think this data still has value, at least to remove the worst offenders from your tests.

Round 1: skim the data

The first step is to find the best performing checkpoints and IPAdapter face models (and face model combinations). With those established, we can move to the second phase: running even more data concentrated on the best performers.

These are all the IPAdapter models that I've tested in random order, best performers are bold and will go to the next round.

These are the Checkpoints in random order, best performers are :trophy: bold.

Dreamshaper will be excluded from the photo-realistic models, but I will run it again with other "illustration" style checkpoints.

The preliminary data is available in a google sheet: https://docs.google.com/spreadsheets/d/1NhOBZbSPmtBY9p52PRFsSYj76XDDc65QjcRIhb8vfIE/edit?usp=sharing

Round 2: Refining the data

In this phase I took the best performers from the previous round and ran more tests. Best results are bold.

Basically more embeds, better results.

realisticVisionV51_v51VAE (NOT V6) is overall the best performer, but LifeLikeDiffusion often has the single best result; meaning its average is not as good as Realistic Vision's, but sometimes you get that one result that is really good.

I tested both euclidean and 1-cosine and the results are surprisingly similar.

Since it seems that more embeddings give better results, I'll also try to send multiple images of the same person to each model. I don't think it will help, but I'm happy to be proven wrong.
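For what it's worth, one common way to combine multiple reference images is to average their face embeddings before they condition the model. A minimal sketch of the idea, assuming a hypothetical get_face_embedding helper that returns a 1-D numpy vector per image:

import numpy as np

def average_embedding(images, get_face_embedding):
    # Stack the per-image embeddings and take their mean.
    embeds = np.stack([get_face_embedding(img) for img in images])
    mean = embeds.mean(axis=0)
    # Re-normalize so the averaged embedding has unit length again.
    return mean / np.linalg.norm(mean)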

The data for round 2 can be found here: https://docs.google.com/spreadsheets/d/1Mi2Pu9T3Hqz3Liq9Fdgs953fOD1f0mieBWUI6AN-kok/edit?usp=sharing

Preliminary SDXL

Combinations tested:

At the moment the best models seem to be:

Predictably, V2+PlusFace are again the best performers. The best average is still .36.

Interestingly TurboVision XL performs very well.

Data: https://docs.google.com/spreadsheets/d/1hjiGB-QnKRYXTS6zTAuacRUfYUodUAdL6vZWTG4HZyc/edit?usp=sharing

Round 3: Testing multiple reference images

Processing...

Round 4: Higher resolution

Upscaling SD1.5 512×512 images is not advisable if you want to keep the likeness as high as possible. Even using low denoise and high IPAdapter weight, the base checkpoints are simply not good enough to keep the resemblance.

In my tests I lose about .5 likeness after every upscale.

Fortunately you can still upscale SD1.5 generations with SDXL FaceID + PlusFace (I used Juggernaut, which is the best performer in the SDXL round). The results are very good. LifeLikeDiffusion and RealisticVision5 are still the best performers.

The average is still around 0.35 (which is worse than I'd like) but sometimes you get very good results (0.27), so it's worth running a few seeds and trying different reference images.

Result data here: https://docs.google.com/spreadsheets/d/1uVWJOcDxaEjRks-Lz0DE9A3DCCFX2qsvdpKi3bCSE2c/edit?usp=sharing

Methodology

I tried many libraries for feature extraction/face detection. In the aggregated results the difference is relatively small, so at the moment I'm using Dlib and euclidean similarity. I'm trying to keep the generated images as close as possible to the original in color/position/contrast to minimize skew.

I also tried 1-cosine and the results don't differ much from what is presented here, so I take it that the data is pretty solid. I will keep testing and update if there are any noticeable differences.

All primary embedding weights are set at .8, all secondary weights are set at .4.
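For reference, this is roughly what the scoring described above looks like with dlib. A minimal sketch, assuming the standard pretrained dlib model files and one face per image; it computes both metrics discussed here (0 means identical descriptors):

import numpy as np
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
facerec = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")

def descriptor(path):
    # Detect the face, fit the landmarks, and extract the 128-d descriptor.
    img = dlib.load_rgb_image(path)
    face = detector(img, 1)[0]
    shape = predictor(img, face)
    return np.array(facerec.compute_face_descriptor(img, shape))

ref = descriptor("reference.png")
gen = descriptor("generated.png")
euclidean = np.linalg.norm(ref - gen)
one_minus_cos = 1.0 - np.dot(ref, gen) / (np.linalg.norm(ref) * np.linalg.norm(gen))
print(euclidean, one_minus_cos)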

xiaohu2015 commented 10 months ago

Which face descriptor did you use?

cubiq commented 10 months ago

I tried a few... we could run an average maybe? Dlib, MTCNN, RetinaFace are decent and pretty fast. InsightFace seems to be biased since you trained with it.

xiaohu2015 commented 10 months ago

Is the metric "1 - cos similarity"? In fact, I used another InsightFace model (not the one used for training) to evaluate.

cubiq commented 10 months ago

Is the metric "1 - cos similarity"? In fact, I used another InsightFace model (not the one used for training) to evaluate.

I tried both euclidean and 1-cos. The numbers are of course different but the result is more or less the same.

This is euc vs 1-cos. The final result doesn't change much. [image]

Do you get vastly different results?

xiaohu2015 commented 10 months ago

Is the metric "1 - cos similarity"? In fact, I used another InsightFace model (not the one used for training) to evaluate.

I tried both euclidean and 1-cos. The numbers are of course different but the result is more or less the same.

This is euc vs 1-cos. The final result doesn't change much. [image]

Do you get vastly different results?

FaceNet?

cubiq commented 10 months ago

Yes, FaceNet. Again, I've tried a few options but the result seems more or less the same. FaceID Plus v2 at weight=2 is always at the top.

Interestingly FaceIDPlus and a second pass with PlusFace or FullFace is also very effective. That makes me think that there are more combinations that we haven't explored.

You seem very interested, I'm glad about that. Please feel free to share your experience/ideas if you want.

xiaohu2015 commented 10 months ago

Yes, I am very interested, because a good metric is important for developing a good model.

You are right. You can also try FaceID + FaceID Plus.

thresholds = {
    "VGG-Face": {"cosine": 0.40, "euclidean": 0.60, "euclidean_l2": 0.86},
    "Facenet": {"cosine": 0.40, "euclidean": 10, "euclidean_l2": 0.80},
    "Facenet512": {"cosine": 0.30, "euclidean": 23.56, "euclidean_l2": 1.04},
    "ArcFace": {"cosine": 0.68, "euclidean": 4.15, "euclidean_l2": 1.13},
    "Dlib": {"cosine": 0.07, "euclidean": 0.6, "euclidean_l2": 0.4},
    "SFace": {"cosine": 0.593, "euclidean": 10.734, "euclidean_l2": 1.055},
    "OpenFace": {"cosine": 0.10, "euclidean": 0.55, "euclidean_l2": 0.55},
    "DeepFace": {"cosine": 0.23, "euclidean": 64, "euclidean_l2": 0.64},
    "DeepID": {"cosine": 0.015, "euclidean": 45, "euclidean_l2": 0.17},
}
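For context, those are the decision thresholds deepface applies internally. A quick usage sketch (image paths are placeholders); verify() returns the raw distance together with the threshold it compared against:

from deepface import DeepFace

result = DeepFace.verify(
    img1_path="reference.png",
    img2_path="generated.png",
    model_name="Facenet512",
    distance_metric="euclidean_l2",
)
# A distance below the threshold counts as "verified" (same person).
print(result["distance"], result["threshold"], result["verified"])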

cubiq commented 10 months ago

Is that the minimum threshold? You set it very high. Almost only FaceID alone performs that low, at least in my testing.

xiaohu2015 commented 10 months ago

By the way, do you have any ideas or suggestions for improving the results? They may be helpful to me.

xiaohu2015 commented 10 months ago

Is that the minimum threshold? You set it very high. Almost only FaceID alone performs that low, at least in my testing.

Yes, from the deepface repo.

In fact, I found the face ID embedding is very powerful. I think I should find better training tricks.

cubiq commented 10 months ago

I have tried FaceID Plus v2 + FaceID and it generally outperforms everything else.

Also tried FaceID Plus v2 at weight=2.5, some checkpoints react well to it but in general it's not a big difference.

xiaohu2015 commented 10 months ago

I have tried FaceID Plus v2 + FaceID and it generally outperforms everything else.

Also tried FaceID Plus v2 at weight=2.5, some checkpoints react well to it but in general it's not a big difference.

what do you think of this https://twitter.com/multimodalart/status/1742575121057841468 (multi image)

xiaohu2015 commented 10 months ago

SDXL FaceID preview: [image]

in my benchmark, the cos similarity is a little better than SD 1.5 FaceID

cubiq commented 10 months ago

what do you think of this https://twitter.com/multimodalart/status/1742575121057841468 (multi image)

I've seen people send multiple images trying to increase the likeness. I'm not convinced it actually works; there's a lot of bias in "face" recognition. I will run some tests. Honestly, I think it's laziness. I was able to reach 0.27 likeness with a good combination of IPAdapter models at low resolution.

I think combining 2 IPAdapter models is more effective than sending multiple images to the same model. But I'll run some tests.

PS: looking forward to the SDXL model!

cubiq commented 10 months ago

@xiaohu2015 do you already have the code for SDXL? So I can update it and we are ready at launch :smile:

xiaohu2015 commented 10 months ago

@xiaohu2015 do you already have the code for SDXL? So I can update it and we are ready at launch 😄

it's the same as SD 1.5 FaceID: face embedding + LoRA

but I am not sure if the SDXL version is really better than the SD 1.5 version, because evaluation metrics are often unreliable

cubiq commented 10 months ago

Okay, I ran more tests: any combination of PlusV2 with any other model is definitely a winner.

These are all good:

The only other non-V2 combination that seems to work well is FaceIDPlus+FaceID.

I'll update the first post when I have more data

PS: I got a 0.26 today at low resolution! Looking forward to doing some high resolution tests :smile:

xiaohu2015 commented 10 months ago

I will update the SDXL model now, you can also test it

xiaohu2015 commented 10 months ago

@cubiq update at https://huggingface.co/h94/IP-Adapter-FaceID#ip-adapter-faceid-sdxl

but you should convert the lora part

cubiq commented 10 months ago

great thanks!

I just updated the first post with new info. Data for round 2 is here: https://docs.google.com/spreadsheets/d/1Mi2Pu9T3Hqz3Liq9Fdgs953fOD1f0mieBWUI6AN-kok/edit?usp=sharing

I'll check SDXL later :smile: and run dedicated tests on it too.

cubiq commented 10 months ago

I just had a look at the key structure of the SDXL lora and it's a darn mess :smile: Do you have a conversion mapping, @xiaohu2015?

xiaohu2015 commented 10 months ago

https://github.com/cubiq/ComfyUI_IPAdapter_plus/issues/145#issuecomment-1865495779

I think we can refer to this. You can find a normal SDXL lora weight and load it, print its keys, and then you can get diff2ckpt for SDXL.

In a future version, the lora should not be needed.
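A minimal sketch of that key-dumping approach (file names are placeholders):

from safetensors.torch import load_file

# Print the keys and shapes of a known-good SDXL lora and the FaceID
# lora side by side to work out the naming relationship.
for name in ("normal_sdxl_lora.safetensors", "faceid_sdxl_lora.safetensors"):
    state_dict = load_file(name)
    print(f"--- {name} ---")
    for key, tensor in sorted(state_dict.items()):
        print(key, tuple(tensor.shape))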

cubiq commented 10 months ago

the structure is pretty different and I couldn't find a relationship at first sight. But I'll check more carefully later. I'm a bit busy this week; I might be able to work on it next Monday.

0.to_q_lora.down.weight
0.to_q_lora.up.weight
0.to_k_lora.down.weight
0.to_k_lora.up.weight
0.to_v_lora.down.weight
0.to_v_lora.up.weight
0.to_out_lora.down.weight
0.to_out_lora.up.weight
1.to_q_lora.down.weight
1.to_q_lora.up.weight
1.to_k_lora.down.weight
1.to_k_lora.up.weight
1.to_v_lora.down.weight
1.to_v_lora.up.weight
1.to_out_lora.down.weight
1.to_out_lora.up.weight
1.to_k_ip.weight
1.to_v_ip.weight
2.to_q_lora.down.weight
2.to_q_lora.up.weight
2.to_k_lora.down.weight
2.to_k_lora.up.weight
2.to_v_lora.down.weight
2.to_v_lora.up.weight
...
139.to_v_ip.weight

On SDXL

lora_unet_input_blocks_1_0_emb_layers_1.alpha
lora_unet_input_blocks_1_0_emb_layers_1.lora_down.weight
lora_unet_input_blocks_1_0_emb_layers_1.lora_up.weight
lora_unet_input_blocks_1_0_in_layers_2.alpha
lora_unet_input_blocks_1_0_in_layers_2.lora_down.weight
lora_unet_input_blocks_1_0_in_layers_2.lora_up.weight
lora_unet_input_blocks_1_0_out_layers_3.alpha
lora_unet_input_blocks_1_0_out_layers_3.lora_down.weight
lora_unet_input_blocks_1_0_out_layers_3.lora_up.weight
lora_unet_input_blocks_2_0_emb_layers_1.alpha
lora_unet_input_blocks_2_0_emb_layers_1.lora_down.weight
lora_unet_input_blocks_2_0_emb_layers_1.lora_up.weight
lora_unet_input_blocks_2_0_in_layers_2.alpha
lora_unet_input_blocks_2_0_in_layers_2.lora_down.weight
lora_unet_input_blocks_2_0_in_layers_2.lora_up.weight
lora_unet_input_blocks_2_0_out_layers_3.alpha
lora_unet_input_blocks_2_0_out_layers_3.lora_down.weight
lora_unet_input_blocks_2_0_out_layers_3.lora_up.weight
lora_unet_input_blocks_3_0_op.alpha
lora_unet_input_blocks_3_0_op.lora_down.weight
lora_unet_input_blocks_3_0_op.lora_up.weight
lora_unet_input_blocks_4_0_emb_layers_1.alpha
lora_unet_input_blocks_4_0_emb_layers_1.lora_down.weight
lora_unet_input_blocks_4_0_emb_layers_1.lora_up.weight
lora_unet_input_blocks_4_0_in_layers_2.alpha
lora_unet_input_blocks_4_0_in_layers_2.lora_down.weight
lora_unet_input_blocks_4_0_in_layers_2.lora_up.weight
lora_unet_input_blocks_4_0_out_layers_3.alpha
...
lora_unet_output_blocks_8_0_skip_connection.lora_up.weight

So it looks a little more complicated than that :smile:

xiaohu2015 commented 10 months ago

@laksjdjf can you help?

xiaohu2015 commented 10 months ago

the structure is pretty different and I couldn't find a relationship at first sight. [...] So it looks a little more complicated than that 😄

ok, I will also upload a lora weight next week

xiaohu2015 commented 10 months ago

@cubiq https://huggingface.co/h94/IP-Adapter-FaceID/blob/main/ip-adapter-faceid_sdxl_lora.safetensors I use https://github.com/huggingface/diffusers/blob/main/scripts/convert_diffusers_sdxl_lora_to_webui.py to convert, it should work
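A quick way to sanity-check a converted file, assuming the webui/kohya convention that keys start with lora_unet_ or lora_te (file name is a placeholder):

from safetensors.torch import load_file

state_dict = load_file("ip-adapter-faceid_sdxl_lora.safetensors")
# Keys outside the lora_unet_*/lora_te* convention won't be picked up
# by standard lora loaders.
unexpected = [k for k in state_dict if not k.startswith(("lora_unet_", "lora_te"))]
print("unexpected keys:", unexpected[:10] or "none")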

cubiq commented 10 months ago

It seems to be working pretty well together with plusface, but results are a bit random (either very good or very bad). I'll run some stats on that too.

[image]

reference image: theron

ultimatech-cn commented 10 months ago

This is really great work! I've heard a lot of people complain about the similarity for double-chin faces, big faces, wearing glasses, etc. Is there any test for these? Or some solution for these face shapes?

cubiq commented 10 months ago

@cubiq https://huggingface.co/h94/IP-Adapter-FaceID/blob/main/ip-adapter-faceid_sdxl_lora.safetensors I use https://github.com/huggingface/diffusers/blob/main/scripts/convert_diffusers_sdxl_lora_to_webui.py to convert, it should work

unfortunately that is not correct @xiaohu2015. I'll see if I can fix it in the coming days

xiaohu2015 commented 10 months ago

@cubiq https://huggingface.co/h94/IP-Adapter-FaceID/blob/main/ip-adapter-faceid_sdxl_lora.safetensors I use https://github.com/huggingface/diffusers/blob/main/scripts/convert_diffusers_sdxl_lora_to_webui.py to convert, it should work

unfortunately that is not correct @xiaohu2015. I'll see if I can fix it in the coming days

OK

xiaohu2015 commented 10 months ago

the lora weight file has been updated: https://huggingface.co/h94/IP-Adapter-FaceID/blob/main/ip-adapter-faceid_sdxl_lora.safetensors. It should work.

jepjoo commented 10 months ago

Can confirm that it works now. Thanks!

xiaohu2015 commented 10 months ago

Can confirm that it works now. Thanks!

maybe share some examples? 😄

jepjoo commented 10 months ago

Input image: sauli

Output, lora weight 1, FaceID weight 1: [image]

Output with lora disabled, FaceID weight 1 (just to demonstrate that the LoRA works and has a big impact): [image]

In general, having tested maybe 20 different input images now, the results do not seem to be at the level of SD1.5 FaceID Plus. This example output (the first one, with the lora enabled) is better than the average output.

xiaohu2015 commented 10 months ago

Input image: sauli

Output, lora weight 1, FaceID weight 1: [image]

Output with lora disabled, FaceID weight 1 (just to demonstrate that the LoRA works and has a big impact): [image]

In general, having tested maybe 20 different input images now, the results do not seem to be at the level of SD1.5 FaceID Plus. This example output (the first one, with the lora enabled) is better than the average output.

you should compare it with SD 1.5 FaceID. In fact, the face consistency should be better than SD 1.5.

Arron17 commented 10 months ago

Editing because the SDXL LoRA wasn't set up properly in my workflow:

Some quick examples of what I've been getting: SDXL FaceID + SDXL Plus Face seems to work a little better than SD1.5 FaceID + SD1.5 Plus Face (both running with their respective LoRAs).

Input image: [image]

SD1.5 FaceID + SD1.5 Plus Face: [image]

SDXL FaceID + SDXL Plus Face: [image]

SDXL FaceID on its own: [image]

And then for reference, an SD1.5 FaceID Plus V2: [image]

Arron17 commented 10 months ago

Here's another couple with a different model

Input image: [image]

SDXL FaceID: [image]

SDXL FaceID + SDXL Plus Face: [image]

andieier commented 10 months ago

Here's another couple with a different model

Could you share your example workflow? And can you feed an already existing image into the workflow as the target?

JorgeR81 commented 10 months ago

My best results are with FaceID SDXL ( with lora ) and Plus Face.

[image]

Arron17 commented 10 months ago

Here's another couple with a different model

Could you share your example workflow? And can you feed an already existing image into the workflow as the target?

Very simple workflow here - https://pastebin.com/9n66qNg9

You can use an existing image to do img2img like normal, or you could use inpainting and only inpaint the face, I don't have an inpainting workflow though.

andieier commented 10 months ago

My best results are with FaceID SDXL ( with lora ) and Plus Face.

[image]

Very nice! What checkpoint/loras do you use? And what was your example prompt? I still don't get those kinds of convincing images with Juggernaut; they always look kind of "synthetic". I have a post in the issues tab with my idea/problem. Maybe you have a hint.

andieier commented 10 months ago

Here's another couple with a different model

Could you share your example workflow? And can you feed an already existing image into the workflow as the target?

Very simple workflow here - https://pastebin.com/9n66qNg9

You can use an existing image to do img2img like normal, or you could use inpainting and only inpaint the face, I don't have an inpainting workflow though.

Thank you! Gonna try it tomorrow when I'm back home. The urge to try it and find out is very big...

JorgeR81 commented 10 months ago

Very nice! What checkpoint/loras do you use? And what was your example prompt? I still don't get those kinds of convincing images with Juggernaut; they always look kind of "synthetic". I have a post in the issues tab with my idea/problem. Maybe you have a hint.

The Face ID images are all Juggernaut XL 7, no loras (except for the Face ID lora), as in the example workflow. Juggernaut XL 8 does not work as well; weights need to be a lot higher.

But Realism Engine SDXL v2 also worked well with Face ID. Version 3 just came out, so I haven't tried it yet. https://civitai.com/models/152525/realism-engine-sdxl

The negative prompt is the one used on civitai by the Juggernaut XL creator. https://civitai.com/images/2612019

The positive prompt is from a Midjourney 6 vs 5.2 comparison video (at 3:35): https://www.youtube.com/watch?v=Zl_4V0ks7CE

I think Face ID makes the SDXL results closer to v6 than to v5.2. FaceID also improves skin tone and texture, and gives more complexity/realism to the facial features.

Without Face ID, these are the best SDXL checkpoints for natural portraits: https://civitai.com/models/189109/photopedia-xl https://civitai.com/models/139565/realistic-stock-photo

As for Juggernaut XL, my favorite one is still version 5. It works well with this lora. https://civitai.com/models/170395/black-and-color

cubiq commented 10 months ago

I've run preliminary benchmarks on SDXL. I've updated the original post.

Best checkpoints: Juggernaut XL, Realism Engine.

SDXL FaceID is better than SD1.5 FaceID. The average is 0.37 vs 0.41 for SD1.5.

yuturiy commented 10 months ago

@cubiq can you please describe the process of running these tests?

cubiq commented 10 months ago

What do you need to know? The general process is explained in the "Methodology" paragraph above.

xiaohu2015 commented 10 months ago

@cubiq I will release the first version of sdxl plus v2, maybe you can do some comparison

cubiq commented 10 months ago

@cubiq I will release the first version of sdxl plus v2, maybe you can do some comparison

I will! looking forward!

xiaohu2015 commented 10 months ago

@cubiq I will release the first version of sdxl plus v2, maybe you can do some comparison

I will! looking forward!

models: https://huggingface.co/h94/IP-Adapter-FaceID/resolve/main/ip-adapter-faceid-plusv2_sdxl.bin

it is the same as FaceID Plus v2 SD 1.5, but for SDXL

cubiq commented 10 months ago

yes, I can already tell that it's a lot better. Top: FaceID; bottom: FaceIDPlusV2, both with PlusFace added on top. I will launch some benchmarks later.

[image]