Mikubill / sd-webui-controlnet

WebUI extension for ControlNet
GNU General Public License v3.0

[ControlNet 1.1] The updating track. #736

Closed: lllyasviel closed this issue 1 year ago

lllyasviel commented 1 year ago

We will use this repo to track some discussions for updating to ControlNet 1.1.

lllyasviel commented 1 year ago

Update: ControlNet 1.1 is released here.

lllyasviel commented 1 year ago

I think we can ignore the cnet11 Tile model right now; we are not very sure how to make use of it. The inpainting model may need more consideration in implementation, and perhaps we should just get the other models in first.

continue-revolution commented 1 year ago

The inpainting model may need more consideration in implementation, and perhaps we should just get the other models in first.

I’m the author of sd-webui-segment-anything and I am planning to connect my extension to your inpainting model.

So at this moment, the inpainting ControlNet cannot target only the masked area while leaving the other parts unchanged, right?

Edit on 2023/04/18: already connected. Check out my extension readme for how to use it.

pkuliyi2015 commented 1 year ago

I think we can ignore the cnet11 Tile model right now; we are not very sure how to make use of it. The inpainting model may need more consideration in implementation, and perhaps we should just get the other models in first.

I have been working on tiles for a long time. Have you tried combining it with noise inversion tricks? I think this can be very good; with a better-trained model it may be comparable to the quality of GigaGAN.

My extension is here -> https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111

I will adapt your tile model to see the result and update it here.

2blackbar commented 1 year ago

Yes, the tile model can be a saviour for upscaling with no doubled details.

halr9000 commented 1 year ago

This thread is already amazing. ^ 3 amazing devs collaborating

lllyasviel commented 1 year ago

The inpainting model may need more consideration in implementation, and perhaps we should just get the other models in first.

I’m the author of sd-webui-segment-anything and I am planning connect my extension to your inpainting model.

So at this moment, the inpainting ControlNet cannot target at the mask only while not changing other parts, right?

My gradio demo does not have masked diffusion in it; what is displayed now is just the original result from standard non-masked diffusion. But masked diffusion will be better.
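A minimal sketch of what masked diffusion could look like, assuming the standard latent-blending approach rather than the demo's actual code: at each sampling step the region outside the mask is replaced with a re-noised copy of the original latents, so only the masked area changes. The scheduler.add_noise call follows the diffusers convention; all names are illustrative.

    import torch

    def blend_masked_latents(x_t, original_latents, mask, scheduler, timestep):
        """Keep generated content inside the mask, original content outside it."""
        noise = torch.randn_like(original_latents)
        # re-noise the untouched original latents to the current timestep
        noised_original = scheduler.add_noise(original_latents, noise, timestep)
        # mask == 1 where inpainting is allowed, 0 where the image must stay fixed
        return mask * x_t + (1.0 - mask) * noised_original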

pkuliyi2015 commented 1 year ago

The model works as expected in automatic1111 txt2img; it does generate the guided content.

However, when I directly download the model and use it in this extension, it produces severe artifacts. I read the source code for a while, but it's still not clear what should be done to make it work.

Some initial observations:

  • Severe ghost shadows and duplicated contours, regardless of tile overlaps
  • Faded colors in txt2img (even with the 840000 VAE)
  • No effect when using noise inversion (maybe this is a flaw in my code; I'm checking it)

See here for one result: https://imgsli.com/MTY5ODQw

2blackbar commented 1 year ago

What preprocessor should we use with the tile controlnet model? Using it without a preprocessor gets "some" results, but the resolution is kinda lower than if I inpainted with 0.55 denoise, and I have to use CFG 2-3.

continue-revolution commented 1 year ago

The inpainting model may need more consideration in implementation, and perhaps we should just get the other models in first.

I'm the author of sd-webui-segment-anything and I am planning to connect my extension to your inpainting model. So at this moment, the inpainting ControlNet cannot target only the masked area while leaving the other parts unchanged, right?

My gradio demo does not have masked diffusion in it; what is displayed now is just the original result from standard non-masked diffusion. But masked diffusion will be better.

Do you think there is a need to wait for an update of this extension? Is the current extension compatible with the new models, especially the inpainting model?

lllyasviel commented 1 year ago

The model works as expected in automatic1111 txt2img; it does generate the guided content.

However, when I directly download the model and use it in this extension, it produces severe artifacts. I read the source code for a while, but it's still not clear what should be done to make it work.

Some initial observations:

  • Severe ghost shadows and duplicated contours, regardless of tile overlaps
  • Faded colors in txt2img (even with the 840000 VAE)
  • No effect when using noise inversion (maybe this is a flaw in my code; I'm checking it)

See here for one result: https://imgsli.com/MTY5ODQw

which one is cn11tile? left or right?

pkuliyi2015 commented 1 year ago

The model works as expected in automatic1111 txt2img; it does generate the guided content. However, when I directly download the model and use it in this extension, it produces severe artifacts. I read the source code for a while, but it's still not clear what should be done to make it work. Some initial observations:

  • Severe ghost shadows and duplicated contours, regardless of tile overlaps
  • Faded colors in txt2img (even with the 840000 VAE)
  • No effect when using noise inversion (maybe this is a flaw in my code; I'm checking it)

See here for one result: https://imgsli.com/MTY5ODQw

which one is cn11tile? left or right?

The right one. I must have done something wrong, but so far I haven't been able to fix it.

ProGamerGov commented 1 year ago

Is there a PR in this repo yet for implementing ControlNet v1.1?

lllyasviel commented 1 year ago

The model works as expected in automatic1111 txt2img; it does generate the guided content. However, when I directly download the model and use it in this extension, it produces severe artifacts. I read the source code for a while, but it's still not clear what should be done to make it work. Some initial observations:

  • Severe ghost shadows and duplicated contours, regardless of tile overlaps
  • Faded colors in txt2img (even with the 840000 VAE)
  • No effect when using noise inversion (maybe this is a flaw in my code; I'm checking it)

See here for one result: https://imgsli.com/MTY5ODQw

which one is cn11tile? left or right?

The right one. I must have done something wrong, but so far I haven't been able to fix it.

From the result, it looks like your input image is bigger than h/8 x w/8.

For example, if you diffuse at 512x512, your tile needs to be 64x64, and then use three cv2.pyrUp calls to interpolate it back to 512.

Or you can add a Gaussian blur to the inputs to make them smoother.
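A minimal sketch of the preprocessing described above, assuming 512x512 diffusion: the tile hint is reduced to 64x64 (h/8 x w/8) and brought back to 512 with three cv2.pyrUp calls; the Gaussian-blur alternative is included as well. Function names are illustrative, not the extension's code.

    import cv2

    def prepare_tile_hint(image, diffusion_size=512):
        latent_size = diffusion_size // 8                 # 512 -> 64
        small = cv2.resize(image, (latent_size, latent_size),
                           interpolation=cv2.INTER_AREA)  # shrink to h/8 x w/8
        hint = small
        for _ in range(3):                                # 64 -> 128 -> 256 -> 512
            hint = cv2.pyrUp(hint)
        return hint

    def blur_tile_hint(image, ksize=15):
        # the alternative mentioned above: keep full resolution but smooth it
        return cv2.GaussianBlur(image, (ksize, ksize), 0)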

lllyasviel commented 1 year ago

Hi, I have a recommended list of updates:

Control Model: implement the global average pooling before injection; see the "global_average_pooling" item in the yaml file (a sketch follows this list).

Depth: rename "depth" to "depth_midas"; "depth_leres" is already good; add "depth_zoe".

Normal: add "normal_bae"; remove the previous "normal" (or rename it to "normal_midas").

Canny/MLSD: already good.

Scribble: rename "fake_scribble" to "scribble_hed"; add "scribble_pidi"; remove "scribble" (it seems this one is just binarization, which sounds confusing; or should it just be "threshold"?).

SoftEdge: rename "HED" to "softedge_hed"; add "softedge_pidi"; add "softedge_hedsafe" and "softedge_pidisafe"; rename "pidinet" to "sketch_t2iadapter".

Segmentation: rename "seg" to "seg_ufade20K"; add "seg_ofade20K" and "seg_ofcoco".

Openpose: "openpose" is good; remove "openpose_hand"; add "openpose_full".

Lineart: add "lineart", "lineart_coarse", and "lineart_anime".

Shuffle: add "shuffle".

What do you think?
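Regarding the "global_average_pooling" item above, here is a minimal sketch of the intended behaviour, assuming a control residual of shape (B, C, H, W); the function name is illustrative, not the extension's implementation.

    import torch

    def maybe_pool_control(control: torch.Tensor, global_average_pooling: bool) -> torch.Tensor:
        if global_average_pooling:
            # collapse (B, C, H, W) -> (B, C, 1, 1); the pooled residual is broadcast
            # when added to the UNet features, so only global statistics of the
            # control signal are injected
            return control.mean(dim=(2, 3), keepdim=True)
        return control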

catboxanon commented 1 year ago

That list looks good to me.

Are the instructpix2pix and inpainting models already working out of the box? The former seemed to work, but it gave me mixed results; I wasn't going to judge the quality yet without knowing whether something is missing. The inpainting model I haven't tried yet. The tile model I assume will come a bit later, since the model itself is currently in an unfinished state.

CCRcmcpe commented 1 year ago

PR WIP at https://github.com/Mikubill/sd-webui-controlnet/pull/742.

CCRcmcpe commented 1 year ago

The recent renaming of annotators made some downstream developers unhappy. We could implement the renamings as display-name changes instead of ID changes, which break the API.

Also on naming: the annotator name should imply which cnet model should be used, and vice versa.

lllyasviel commented 1 year ago

I have an idea: what about adding some descriptions to the yaml file of each cnet, e.g. "xxx_canny.yaml" gets a "desc: this model needs canny preprocessor", and showing them in the gradio UI?

CCRcmcpe commented 1 year ago

The gradio part seems less than ideal. List items cannot show hover info; at least, when I tried the DDIM sampler item in the WebUI it didn't, though if you select it and hover over the selection box it shows.

lllyasviel commented 1 year ago

I mean adding a gradio.Label or something and showing some desc text from the model yaml after a model is loaded. Besides, I think for the API it is ok to have alias names.
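A minimal sketch of the idea, assuming each cnet yaml gains an optional "desc" key and that a gradio Label is refreshed after a model is loaded; the path and UI wiring are illustrative, not the extension's actual code.

    import yaml
    import gradio as gr

    def load_model_desc(yaml_path: str) -> str:
        with open(yaml_path, "r", encoding="utf-8") as f:
            cfg = yaml.safe_load(f) or {}
        # old cnets without a "desc" entry simply show blank text
        return cfg.get("desc", "")

    # e.g. shown below the model dropdown and updated whenever the model changes
    desc_label = gr.Label(value=load_model_desc("models/control_v11p_sd15_canny.yaml"))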

lllyasviel commented 1 year ago

If you think it is ok, I will begin working on all 14 yaml files.

CCRcmcpe commented 1 year ago

What about the old cnets (prior to 1.1)? They have no isolated yamls. I think it's better to implement this at the code level, which is also localization-friendly. I will wait for a response from the repo owner.

lllyasviel commented 1 year ago

Old cnets can just use blank text; we only show the text when a desc is available.

lllyasviel commented 1 year ago

@Mikubill why does invert always binarize images?

image

lllyasviel commented 1 year ago

Now I have to invert the image outside, on my own in Photoshop, to use the lineart model.

image

CCRcmcpe commented 1 year ago

That is a known issue; it will be fixed.

halr9000 commented 1 year ago

It would be awesome to auto-select the most likely model after the preprocessor is selected, and vice versa. It won't prevent people from changing it, but it will save a needed step 90% of the time.
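A minimal sketch of that auto-selection, assuming a simple lookup from preprocessor to the most likely ControlNet 1.1 model; the partial mapping and function name are illustrative, not the extension's code.

    from typing import Optional

    # model names follow the ControlNet 1.1 release; the table is only a partial example
    PREPROCESSOR_TO_MODEL = {
        "canny": "control_v11p_sd15_canny",
        "depth_midas": "control_v11f1p_sd15_depth",
        "openpose_full": "control_v11p_sd15_openpose",
        "softedge_hed": "control_v11p_sd15_softedge",
        "lineart_anime": "control_v11p_sd15s2_lineart_anime",
    }

    def suggest_model(preprocessor: str) -> Optional[str]:
        # pre-fills the model dropdown; the user can still pick something else
        return PREPROCESSOR_TO_MODEL.get(preprocessor)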


2blackbar commented 1 year ago

That anime colorize works nicely even at 1080x1080 res, and it works not only for anime stuff; it works best with anime-like models. This one is 768.
image image

lllyasviel commented 1 year ago

That list looks good to me.

Are the instructpix2pix and inpainting models already working out of the box? The former seemed to work, but it gave me mixed results; I wasn't going to judge the quality yet without knowing whether something is missing. The inpainting model I haven't tried yet. The tile model I assume will come a bit later, since the model itself is currently in an unfinished state.

Yes, ip2p is very experimental; it is a model marked as [e].

But this model should be at least as robust as the original ip2p. It seems that the original ip2p is also not very robust.

Perhaps we can improve it by also putting the original image into i2i and using the "denoising strength" to improve robustness.

bropines commented 1 year ago

@Mikubill why does invert always binarize images? image

How do I install this version instead of the main branch?

2blackbar commented 1 year ago

You don't. Just use the lineart anime model without an annotator: paste your lineart image into ControlNet, then tick "Invert Input Color", and that's it. But it is better used in img2img; you can control colours better if you have a color template.

bropines commented 1 year ago

When you turn on the LineArt anime preprocessor, there's a bug like this

image


Loading model from cache: control_v11p_sd15s2_lineart_anime [3825e83e]██████████████████████████████| 30/30 [00:18<00:00,  1.62it/s]
Loading preprocessor: lineart_anime
Error running process: G:\stable-diffusion-portable-main\extensions\sd-webui-controlnet\scripts\controlnet.py
Traceback (most recent call last):
  File "G:\stable-diffusion-portable-main\modules\scripts.py", line 417, in process
    script.process(p, *script_args)
  File "G:\stable-diffusion-portable-main\extensions\sd-webui-controlnet\scripts\controlnet.py", line 735, in process
    detected_map, is_image = preprocessor(input_image, res=unit.processor_res, thr_a=unit.threshold_a, thr_b=unit.threshold_b)
  File "G:\stable-diffusion-portable-main\extensions\sd-webui-controlnet\scripts\processor.py", line 276, in lineart_anime
    result = model_lineart_anime(img)
  File "G:\stable-diffusion-portable-main\extensions\sd-webui-controlnet\annotator\lineart_anime\__init__.py", line 154, in __call__
    line = self.model(image_feed)[0, 0] * 127.5 + 127.5
  File "G:\stable-diffusion-portable-main\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\stable-diffusion-portable-main\extensions\sd-webui-controlnet\annotator\lineart_anime\__init__.py", line 41, in forward
    return self.model(input)
  File "G:\stable-diffusion-portable-main\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\stable-diffusion-portable-main\extensions\sd-webui-controlnet\annotator\lineart_anime\__init__.py", line 108, in forward
    return self.model(x)
  File "G:\stable-diffusion-portable-main\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\stable-diffusion-portable-main\venv\lib\site-packages\torch\nn\modules\container.py", line 217, in forward
    input = module(input)
  File "G:\stable-diffusion-portable-main\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\stable-diffusion-portable-main\extensions-builtin\Lora\lora.py", line 319, in lora_Conv2d_forward
    return torch.nn.Conv2d_forward_before_lora(self, input)
  File "G:\stable-diffusion-portable-main\venv\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "G:\stable-diffusion-portable-main\venv\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

bropines commented 1 year ago

image

The first time it starts up, it gives this miracle, and then it goes to a permanent error. Torch 2.0, CUDA 11.8, xformers 0.0.18.

image

catboxanon commented 1 year ago

But this model should be at least as robust as the original ip2p. It seems that the original ip2p is also not very robust.

Yeah, I was thinking of this as well. It works, but even the original ip2p gave me mixed results and didn't impress me very much.

Perhaps we can improve it by also putting the original image into i2i and using the "denoising strength" to improve robustness.

This is a good idea though... I'll have to play around more with it.

catboxanon commented 1 year ago

@bropines Did you read the comment on the PR? The preprocessor has conflicts with the web UI's native LoRA extension currently. https://github.com/Mikubill/sd-webui-controlnet/pull/742#issuecomment-1507504286

Currently the sketch_anime annotator has conflicts with the built-in LoRA extension of the A1111 webui because it hooks torch.nn.Linear.forward and torch.nn.Conv2d.forward. I plan to solve this later.
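For reference, the RuntimeError in the traceback above is a plain device mismatch, which a few lines reproduce (assuming a CUDA device is available); this illustrates only the symptom, not the LoRA hook itself.

    import torch

    conv = torch.nn.Conv2d(3, 16, kernel_size=3)  # weights stay on the CPU (torch.FloatTensor)
    x = torch.randn(1, 3, 64, 64).cuda()          # input on the GPU (torch.cuda.FloatTensor)
    conv(x)  # RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same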

2blackbar commented 1 year ago

i think "annotator" for anime is just this repo https://github.com/Mukosame/Anime2Sketch

I made a GUI for anime2sketch; you need to put it in the anime2sketch folder and then you can convert images to sketches. But it would of course be nicer if ControlNet borrowed some code from anime2sketch to do it inside the auto webui.

image

a2sgui.zip

catboxanon commented 1 year ago

But it would of course be nicer if ControlNet borrowed some code from anime2sketch to do it inside the auto webui

I mean... that's part of the PR.

https://github.com/Mikubill/sd-webui-controlnet/blob/2cca805b2aa6befd35b01c444006c79fbe3163f1/annotator/lineart_anime/__init__.py#L113-L121

But again, refer to my referenced comment above -- it conflicts with the built-in LoRA extension at the moment. If you disable that extension then it should work fine. (Of course it'll work better once the invert issue is fixed as well)

lllyasviel commented 1 year ago

Update

2023/04/14:

72 hours ago we uploaded a wrong model, "control_v11p_sd15_depth", by mistake. That model is an intermediate checkpoint from training; it has not converged and may cause distortion in results. We uploaded the correct depth model as "control_v11f1p_sd15_depth" (the "f1" means bug fix 1). The incorrect model has been removed. Sorry for the inconvenience.

bropines commented 1 year ago

Update

2023/04/14:

72 hours ago we uploaded a wrong model, "control_v11p_sd15_depth", by mistake. That model is an intermediate checkpoint from training; it has not converged and may cause distortion in results. We uploaded the correct depth model as "control_v11f1p_sd15_depth" (the "f1" means bug fix 1). The incorrect model has been removed. Sorry for the inconvenience.

I thought it just wasn't supposed to work. Sorry, I should have written about it.

lllyasviel commented 1 year ago

Update

I updated more detailed descriptions. https://github.com/lllyasviel/ControlNet-v1-1-nightly

lllyasviel commented 1 year ago

Interesting results from cn11 ip2p

make it on fire, high-quality, extremely detailed Negative prompt: long body, low resolution, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 12345, Size: 512x640, Model hash: abcaf14e5a, Model: anything-v3-full, Denoising strength: 0.3, Clip skip: 2, ControlNet-0 Enabled: True, ControlNet-0 Module: none, ControlNet-0 Model: control_v11e_sd15_ip2p [c4bb465c], ControlNet-0 Weight: 1, ControlNet-0 Guidance Start: 0, ControlNet-0 Guidance End: 1, Hires upscale: 2, Hires steps: 20, Hires upscaler: R-ESRGAN 4x+ Anime6B

input image

output image

lllyasviel commented 1 year ago

It seems that in the webui, ip2p needs a higher control weight if it does not work. Very experimental.

Kanareika commented 1 year ago

Now, the "Shuffle" preprocessor shuffles the image differently, even if the same seed is used. Wouldn't it be better to make it dependent on seed to ensure repeatability of the result?

lllyasviel commented 1 year ago

Now, the "Shuffle" preprocessor shuffles the image differently, even if the same seed is used. Wouldn't it be better to make it dependent on seed to ensure repeatability of the result?

fixed
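A minimal sketch of a seed-dependent shuffle (illustrative only, not the merged fix), assuming a content-shuffle style random flow: all randomness is drawn from a RandomState seeded with the generation seed, so the same seed always produces the same shuffled hint.

    import cv2
    import numpy as np

    def seeded_content_shuffle(img: np.ndarray, seed: int, f: int = 256) -> np.ndarray:
        h, w = img.shape[:2]
        rng = np.random.RandomState(seed)

        def displacement():
            # low-resolution random field, smoothed by upscaling to full size
            coarse = rng.uniform(-1.0, 1.0, (h // f + 1, w // f + 1)).astype(np.float32)
            return cv2.resize(coarse, (w, h), interpolation=cv2.INTER_CUBIC) * f

        xs, ys = np.meshgrid(np.arange(w, dtype=np.float32), np.arange(h, dtype=np.float32))
        map_x = xs + displacement()
        map_y = ys + displacement()
        return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR, borderMode=cv2.BORDER_REFLECT)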

lllyasviel commented 1 year ago

Hi @Mikubill, great work! Some problems: (1) when global average pooling is True, the ControlNet should only be applied on the CFG conditional side, otherwise shuffle won't work very well; (2) it seems that the previous "using only mid control for high-res" is broken or deleted?
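A minimal sketch of point (1), assuming a unet callable that accepts an optional control argument: with global average pooling enabled, the control residuals are injected only on the conditional branch of classifier-free guidance. All names here are illustrative.

    def cfg_denoise(unet, x, t, cond, uncond, control, cfg_scale):
        # unconditional branch: no control injected
        eps_uncond = unet(x, t, context=uncond, control=None)
        # conditional branch: the (pooled) control residuals are applied here only
        eps_cond = unet(x, t, context=cond, control=control)
        return eps_uncond + cfg_scale * (eps_cond - eps_uncond)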

lbeltrame commented 1 year ago

Can the model differences be extracted in the same way as before? Asking because the original ControlNet models are pretty heavy in file size, especially if (like I do) you use many of them for multi-ControlNet.
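A minimal sketch of the "difference" idea (illustrative; the key mapping below is hypothetical, not the extension's actual extraction script): for weights the ControlNet shares with the base SD UNet, only the residual is stored, which compresses well and can be re-added to whatever base model is loaded.

    import torch

    def extract_controlnet_diff(controlnet_sd: dict, base_sd: dict) -> dict:
        diff = {}
        for key, weight in controlnet_sd.items():
            # hypothetical mapping from ControlNet keys to base UNet keys
            base_key = key.replace("control_model.", "model.diffusion_model.")
            if base_key in base_sd and base_sd[base_key].shape == weight.shape:
                diff[key] = weight - base_sd[base_key]  # store only the residual
            else:
                diff[key] = weight                      # ControlNet-only layers (e.g. the hint encoder)
        return diff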

lllyasviel commented 1 year ago

@Mikubill (3) note that the two new lineart controlnets still use 0 as the background and 1 as the line, which means right now it does not work. image
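A minimal sketch of the convention implied here (illustrative): the new lineart models expect a black (0) background with white lines, so a typical white-background sketch must be inverted before it is fed to the model.

    import numpy as np

    def to_black_bg_white_lines(lineart: np.ndarray) -> np.ndarray:
        # assumes a uint8 image with a white (255) background and dark lines
        return 255 - lineart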

lllyasviel commented 1 year ago

(4) depth_midas should use 512 as the default resolution (rather than 384), since our models are improved. image

lllyasviel commented 1 year ago

(5) It seems that the visualization of softedge_pidi is inverted. image

What about always using a black background and white lines for all softedge and lineart preprocessors?