How does the cond_weight parameter work?

TencentARC / MasaCtrl

[ICCV 2023] Consistent Image Synthesis and Editing

https://ljzycmd.github.io/projects/MasaCtrl/

Apache License 2.0

734 stars, 26 forks

Issue #24 (closed by Ragzz258 1 year ago)

Ragzz258 commented 1 year ago

I was using depth as the condition to generate an output image from a source image and a target depth map:

python masactrl_w_adapter.py \
  --src_img_path good_cat/women/src/women_blazers_6.jpg \
  --cond_path_src good_cat/women/src_depth/women_blazers_6.png \
  --cond_path good_cat/women/target_depth/1.png \
  --cond_inp_type depth \
  --prompt_src "" --prompt "" \
  --sd_ckpt models/cr.ckpt \
  --resize_short_edge 512 \
  --cond_tau 1.0 --cond_weight 0 \
  --n_samples 1 \
  --which_cond depth \
  --adapter_ckpt models/t2iadapter_depth_sd14v1.pth \
  --outdir ./yo

Note that --cond_weight is 0, so the depth condition should have no effect and the output should match the reconstructed image. However, the output is different.
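Here is why cond_weight=0 is expected to disable the condition. This assumes T2I-Adapter-style conditioning, where the adapter's multi-scale feature maps are multiplied by cond_weight and then added to the UNet encoder features. The sketch below is a minimal illustration under that assumption. The function `inject_adapter_features` and the toy feature shapes are hypothetical and are not the repository's API.

```python
import numpy as np


def inject_adapter_features(unet_feats, adapter_feats, cond_weight):
    """Hypothetical sketch: add adapter features, scaled by cond_weight,
    to the UNet encoder features at each resolution level."""
    if len(unet_feats) != len(adapter_feats):
        raise ValueError("expected one adapter feature map per UNet level")
    return [u + cond_weight * a for u, a in zip(unet_feats, adapter_feats)]


rng = np.random.default_rng(0)
# Toy stand-in feature maps at four resolutions: (batch, channels, height, width).
shapes = [(1, 8, 16, 16), (1, 16, 8, 8), (1, 32, 4, 4), (1, 32, 2, 2)]
unet_feats = [rng.standard_normal(s).astype(np.float32) for s in shapes]
depth_feats = [rng.standard_normal(s).astype(np.float32) for s in shapes]

fused_off = inject_adapter_features(unet_feats, depth_feats, cond_weight=0.0)
fused_on = inject_adapter_features(unet_feats, depth_feats, cond_weight=1.0)

# With cond_weight=0 the depth features contribute nothing at any level.
print(all(np.array_equal(f, u) for f, u in zip(fused_off, unet_feats)))
# With cond_weight=1 the fused features differ from the plain UNet features.
print(not all(np.array_equal(f, u) for f, u in zip(fused_on, unet_feats)))
```

Under this assumption, any difference that remains at cond_weight=0 must come from somewhere other than the adapter.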

[Image: reconstructed and final output (00006_all_result)]

Why is this happening? Am I missing something?

ljzycmd commented 1 year ago

Hi @Ragzz258, I tested cond_weight=0 for image synthesis with canny and sketch guidance under your settings, and the output images are identical.

Could you upload the source image and the depth image from your example? Then I can test your case.

Ragzz258 commented 1 year ago

Hi @ljzycmd, thank you for the quick reply.

[Image: source image (women_blazers_6)]

[Image: target depth]

I am using the code from the following PR: https://github.com/TencentARC/MasaCtrl/pull/25

Ragzz258 commented 1 year ago

Another example

[Image: 00057_all_result]

[Image: source image (women_pyjamas_5)]

[Image: target image]

ljzycmd commented 1 year ago

Hi @Ragzz258, I have checked your code. The reconstruction difference arises because you added intermediate latent querying to the sampling process. With this change, the reconstructed source z_0 is denoised only from z_1 (the z_1 obtained by DDIM inversion) rather than from z_T. In the target image synthesis branch, by contrast, z_0 is iteratively denoised from z_T. As a result, the reconstructed images still differ slightly under your setting, even though no additional guidance is applied.
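The two denoising paths can be illustrated with a toy, unconditioned DDIM sampler in NumPy. This is a minimal sketch under stated assumptions and is not MasaCtrl's code:

- the linear `a_bar` schedule and the `toy_eps` noise predictor are hypothetical stand-ins for the real scheduler and UNet;
- intermediate latent querying is simplified to "the source z_0 is one DDIM step from the inverted z_1".

DDIM inversion predicts the noise at z_{t-1} rather than at the unknown z_t. Because of that, retracing the full path from z_T does not land exactly where the single step from z_1 does.

```python
import numpy as np

# Toy DDIM setup (hypothetical): a_bar[i] is the cumulative alpha at step i,
# where i = 0 is the clean latent and i = T is the noisiest one.
T = 50
a_bar = np.linspace(0.9999, 0.01, T + 1)


def toy_eps(x, i):
    """Hypothetical stand-in for the UNet noise prediction with no condition
    (cond_weight = 0). Like a real UNet, it depends on the latent x."""
    return np.tanh(x) * (0.5 + 0.5 * i / T)


def ddim_step(x, i, eps_fn):
    """Deterministic DDIM update from step i to step i - 1."""
    eps = eps_fn(x, i)
    x0_pred = (x - np.sqrt(1 - a_bar[i]) * eps) / np.sqrt(a_bar[i])
    return np.sqrt(a_bar[i - 1]) * x0_pred + np.sqrt(1 - a_bar[i - 1]) * eps


def ddim_invert_step(x, i, eps_fn):
    """DDIM inversion from step i - 1 to step i. The noise is predicted at
    x_{i-1} instead of the unknown x_i, so inversion is only approximate."""
    eps = eps_fn(x, i)
    x0_pred = (x - np.sqrt(1 - a_bar[i - 1]) * eps) / np.sqrt(a_bar[i - 1])
    return np.sqrt(a_bar[i]) * x0_pred + np.sqrt(1 - a_bar[i]) * eps


def compare_paths(z0, eps_fn):
    # DDIM inversion z_0 -> z_1 -> ... -> z_T, keeping every intermediate latent.
    latents = [z0]
    for i in range(1, T + 1):
        latents.append(ddim_invert_step(latents[-1], i, eps_fn))

    # Source branch with intermediate latent querying (simplified):
    # the final z_0 is denoised from the inverted z_1 in a single step.
    src_rec = ddim_step(latents[1], 1, eps_fn)

    # Target branch: z_0 is denoised iteratively from z_T.
    tgt = latents[T]
    for i in range(T, 0, -1):
        tgt = ddim_step(tgt, i, eps_fn)
    return src_rec, tgt


z0 = np.random.default_rng(0).standard_normal(16)

src_rec, tgt = compare_paths(z0, toy_eps)
print("max |src - tgt| with x-dependent eps:", np.abs(src_rec - tgt).max())

# If the noise prediction did not depend on x, inversion would be exact and
# both paths would recover z0 up to floating-point error.
src_c, tgt_c = compare_paths(z0, lambda x, i: np.full_like(x, 0.1 * i / T))
print("max |src - tgt| with x-independent eps:", np.abs(src_c - tgt_c).max())
```

With the x-independent predictor, the inversion and denoising steps are exact inverses, so the gap disappears. In this toy model, the gap therefore comes from how the noise prediction changes along the path. That is consistent with the later point in this thread that the difference depends on the denoising UNet.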

Ragzz258 commented 1 year ago

Hi @ljzycmd, thank you for the quick reply.

I modified my code based on your comment https://github.com/TencentARC/MasaCtrl/issues/20#issuecomment-1590462626. Do I need to add anything to the target image synthesis branch, as I did for the inversion, or is the cause something else?

kunalgoyal9 commented 1 year ago

Hi @ljzycmd, could you please guide me on this issue as well?

ljzycmd commented 1 year ago

Hi @Ragzz258, @kunalgoyal9, I have also created a repo (https://github.com/ljzycmd/T2I-Adapter-w-MasaCtrl) that integrates MasaCtrl into T2I-Adapter to reproduce the results shown in #20.

Hope this helps.

Ragzz258 commented 1 year ago

Hi @ljzycmd, thank you for the response.

I am getting the same results as with my own code:

[Image: 00000_all_result]

I found that the issue is caused by the model passed via --sd_ckpt models/cr.ckpt (the CyberRealistic v14 model) and some other newer models. It works well with the models in the repo (SD v1.4, Anything v4, and SD v1.5).

[Image: 00011_all_result]

Please let me know why this is happening. Thank you for the great work 🙌

ljzycmd commented 1 year ago

Hi @Ragzz258, as I mentioned before (https://github.com/TencentARC/MasaCtrl/issues/24#issuecomment-1612416474), the reconstructed images are expected to differ under your setting because they follow different denoising paths. The size of that difference depends heavily on the denoising UNet, so different checkpoints will reconstruct different images. Even with SD1.4, the two reconstructed images still differ, as the following difference map shows:

[Image: difference]
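One simple way to produce a difference map like this is to take the per-pixel absolute difference between the two reconstructions. The sketch below is minimal and uses synthetic arrays. In practice, the two images would be loaded from the saved outputs, for example with Pillow's `Image.open` followed by `np.asarray`. The function name `difference_map` is hypothetical.

```python
import numpy as np


def difference_map(img_a, img_b):
    """Return (per-pixel absolute difference in [0, 1], mean absolute difference).

    img_a, img_b: uint8 arrays of identical shape (H, W, 3).
    """
    if img_a.shape != img_b.shape:
        raise ValueError(f"shape mismatch: {img_a.shape} vs {img_b.shape}")
    a = img_a.astype(np.float32) / 255.0
    b = img_b.astype(np.float32) / 255.0
    diff = np.abs(a - b).mean(axis=-1)  # average over RGB channels -> (H, W)
    return diff, float(diff.mean())


# Synthetic stand-ins for two 512x512 reconstructions.
rng = np.random.default_rng(0)
rec_1 = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)
rec_2 = rec_1.copy()
rec_2[100:200, 100:200] = 255 - rec_2[100:200, 100:200]  # perturb one patch

diff, mean_abs = difference_map(rec_1, rec_2)
print(diff.shape, mean_abs > 0)  # (512, 512) True
```

To save the map as an image, one option is `Image.fromarray((diff * 255).astype(np.uint8))` from Pillow.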