Closed lllyasviel closed 1 year ago
Update: ControlNet 1.1 is released here.
I think we can ignore the cnet11 Tile model right now; we are not yet sure how to make use of it. The inpainting model may need more consideration in implementation, and perhaps we should just get the other models in first.
> The inpainting model may need more considerations in implementation and perhaps we just get other models first.
I’m the author of sd-webui-segment-anything and I am planning to connect my extension to your inpainting model.
So at this moment, the inpainting ControlNet cannot target only the masked region while leaving the other parts unchanged, right?
Edit on 2023/04/18: already connected. Check out my extension readme for how to use it.
> I think we can ignore cnet11 Tile model right now. We are not very sure how to make use of it. The inpainting model may need more considerations in implementation and perhaps we just get other models first.
I have been working on tiles for a long time. Have you tried cooperating with noise inversion tricks? I think this can be very good; with a better-trained model it may be comparable to the quality of GigaGAN.
My extension is here -> https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111
I will adapt your tile model to see the result and update it here.
Yes, the tile model can be a saviour for upscaling, no doubt.
This thread is already amazing. ^ 3 amazing devs collaborating
> The inpainting model may need more considerations in implementation and perhaps we just get other models first.

> I'm the author of sd-webui-segment-anything and I am planning to connect my extension to your inpainting model.
> So at this moment, the inpainting ControlNet cannot target at the mask only while not changing other parts, right?
My gradio demo does not have masked diffusion in it; what is displayed now is just the original result from standard non-masked diffusion. But masked diffusion will be better.
The model works as expected in automatic1111 txt2img; it does generate the guided content.
However, when I directly download the model and use it in this extension, it produces severe artifacts. I read the source code for a while, but it is still not clear to me what should be done to make it work.
Some initial observations:
- Severe ghost shadows and duplicated contours, regardless of tile overlaps
- Faded colors in txt2img (even with the 840000 VAE)
- No effect when using noise inversion (maybe this is a flaw in my code; I'm checking it)
See here for one result: https://imgsli.com/MTY5ODQw
What preprocessor should we use with the tile ControlNet model? Using it without a preprocessor gets "some" results, but the resolution is somewhat lower than if I inpainted with 0.55 denoise, and I have to use CFG 2-3.
> The inpainting model may need more considerations in implementation and perhaps we just get other models first.

> I'm the author of sd-webui-segment-anything and I am planning to connect my extension to your inpainting model. So at this moment, the inpainting ControlNet cannot target at the mask only while not changing other parts, right?

> my gradio demo does not have masked diffusion in it. what is displayed now is just original results from standard non-masked diffusion. but masked diffusion will be better.
Do you think there is a need to wait for an update of this extension? Is the current extension compatible with the new models, especially the inpainting model?
> The model works as expected in automatic1111 txt2img; it does generate the guided content.
> However, as I directly download the model and use it in this extension, it produces severe artifacts. I read the source code for a while but still not clear what should be done to make it work.
> Some initial observations:
> - Severe ghost shadows and duplicated contours, regardless of tile overlaps
> - Faded colors in txt2img (even if with 840000 VAEs)
> - Has no effect when using noise inversion (maybe this is my code flaws; I'm checking it).
> See here for one result: https://imgsli.com/MTY5ODQw
which one is cn11tile? left or right?
> The model works as expected in automatic1111 txt2img; it does generate the guided content. However, as I directly download the model and use it in this extension, it produces severe artifacts. I read the source code for a while but still not clear what should be done to make it work. Some initial observations:
> - Severe ghost shadows and duplicated contours, regardless of tile overlaps
> - Faded colors in txt2img (even if with 840000 VAEs)
> - Has no effect when using noise inversion (maybe this is my code flaws; I'm checking it).
> See here for one result: https://imgsli.com/MTY5ODQw

> which one is cn11tile? left or right?
The right one. I must have done something wrong, but so far I cannot fix it.
Is there a PR in this repo yet for implementing ControlNet v1.1?
> The model works as expected in automatic1111 txt2img; it does generate the guided content. However, as I directly download the model and use it in this extension, it produces severe artifacts. I read the source code for a while but still not clear what should be done to make it work. Some initial observations:
> - Severe ghost shadows and duplicated contours, regardless of tile overlaps
> - Faded colors in txt2img (even if with 840000 VAEs)
> - Has no effect when using noise inversion (maybe this is my code flaws; I'm checking it).
> See here for one result: https://imgsli.com/MTY5ODQw

> which one is cn11tile? left or right?

> The right one. I must have done something wrong. But until now I cannot fix it.
From the result, it looks like your input image is bigger than h/8 x w/8.
For example, if you diffuse at 512x512, your tile input needs to be 64x64, and then use three cv2.pyrUp calls to interpolate it back to 512.
Or you can add a Gaussian blur to the inputs to make them smoother.
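As a rough numerical sketch of this sizing rule (numpy only; real code would use cv2.pyrUp, which does a 2x upsample followed by a Gaussian smoothing pass, so the box blur below is just a stand-in):

```python
import numpy as np

def pyrup_like(img: np.ndarray) -> np.ndarray:
    """Rough stand-in for cv2.pyrUp: 2x nearest upsample + 3x3 box smoothing."""
    up = img.repeat(2, axis=0).repeat(2, axis=1)
    # pad-and-average as a cheap smoothing pass (cv2 uses a Gaussian kernel)
    p = np.pad(up, 1, mode="edge")
    return (p[:-2, :-2] + p[:-2, 1:-1] + p[:-2, 2:] +
            p[1:-1, :-2] + p[1:-1, 1:-1] + p[1:-1, 2:] +
            p[2:, :-2] + p[2:, 1:-1] + p[2:, 2:]) / 9.0

# Diffusing at 512x512: the tile control input starts at 512/8 = 64,
# then three pyrUp-style doublings bring it back to 512.
tile = np.random.rand(64, 64)
for _ in range(3):
    tile = pyrup_like(tile)
print(tile.shape)  # (512, 512)
```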
Hi, I have a recommended list of updates:
- Control model: implement the global average pooling before injection – read the "global_average_pooling" item in the yaml file.
- Depth: rename "depth" to "depth_midas"; "depth_leres" is already good; add "depth_zoe".
- Normal: add "normal_bae"; remove the previous "normal" (or rename it to "normal_midas").
- Canny/MLSD: already good.
- Scribble: rename "fake_scribble" to "scribble_hed"; add "scribble_pidi"; remove "scribble" (it seems that this one just binarizes, which sounds confusing – or just call it "threshold"?).
- SoftEdge: rename "HED" to "softedge_hed"; add "softedge_pidi"; add "softedge_hedsafe" and "softedge_pidisafe"; rename "pidinet" to "sketch_t2iadapter".
- Segmentation: rename "seg" to "seg_ufade20K"; add "seg_ofade20K" and "seg_ofcoco".
- Openpose: "openpose" is good; remove "openpose_hand"; add "openpose_full".
- Lineart: add "lineart", "lineart_coarse", and "lineart_anime".
- Shuffle: add "shuffle".

What do you think?
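For the "global average pooling before injection" item, here is a minimal numpy sketch of what the flag means (the function name is illustrative, not the extension's actual API): when the yaml sets global_average_pooling, each control residual is collapsed to its per-channel spatial mean before being added to the UNet feature map.

```python
import numpy as np

def inject_control(unet_feat, control_feat, global_average_pooling=False):
    """Add a ControlNet residual to a UNet feature map of shape (C, H, W).

    With global_average_pooling=True (as some yaml files request),
    the residual is reduced to its per-channel spatial mean first.
    """
    if global_average_pooling:
        control_feat = control_feat.mean(axis=(1, 2), keepdims=True)  # (C, 1, 1)
    return unet_feat + control_feat  # broadcasts over H, W when pooled

feat = np.zeros((4, 8, 8))
ctrl = np.arange(4 * 8 * 8, dtype=float).reshape(4, 8, 8)
pooled = inject_control(feat, ctrl, global_average_pooling=True)
# every spatial position in a channel now receives the same scalar
print(np.allclose(pooled[0], pooled[0, 0, 0]))  # True
```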
That list looks good to me.
Are the instructpix2pix and inpainting models already working out of the box? The former seemed to work, but it gave me mixed results; I wasn't going to judge the quality yet without knowing whether something is missing. The inpainting model I haven't tried yet. The tile model I assume will come a bit later, since the model itself is currently in an unfinished state.
The recent renaming of annotators made some downstream developers unhappy. We could implement the renamings as display-name changes instead of ID changes, which break the API.
Also on naming: the annotator name should imply which cnet model should be used, and vice versa.
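One way to realize this display-name-vs-ID split is a plain alias table; the sketch below is illustrative only (the mapping mixes old IDs with the proposed names, and the function is not the extension's actual API):

```python
# Stable internal IDs keep the API compatible; the UI shows display names.
DISPLAY_NAMES = {
    "hed": "softedge_hed",
    "fake_scribble": "scribble_hed",
    "depth": "depth_midas",
}

def resolve(name: str) -> str:
    """Accept either an internal ID or a display name from API callers."""
    if name in DISPLAY_NAMES:
        return name  # already a stable internal ID
    reverse = {v: k for k, v in DISPLAY_NAMES.items()}
    return reverse.get(name, name)  # unknown names pass through unchanged

print(resolve("softedge_hed"))  # hed
print(resolve("hed"))           # hed
```

This way old API clients that send "hed" keep working while the UI can relabel freely.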
I have an idea: what about adding a description to the yaml file of each cnet, e.g. "xxx_canny.yaml" gets a "desc: this model needs the canny preprocessor", and showing it in the gradio UI?
The gradio part seems less than ideal. List items cannot show hover info; at least, I tried the DDIM sampler item in the WebUI and it doesn't, though if you select it and hover over the selection box, it shows.
I mean adding a gradio.Label or something and showing some description text from the model yaml after a model is loaded. Besides, I think for the API it is OK to have alias names.
If you think it is OK, I will begin to work on all 14 yaml files.
What about the old cnets (prior to 1.1)? They have no isolated yamls. I think it's better to implement it at the code level, which is also localization-friendly. I will wait for a response from the repo owner.
Old cnets can just use blank text; we only show text when a desc is available.
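A hypothetical yaml along the lines discussed above (the `desc` key and its wording are assumptions for illustration, not an agreed schema; the `model` section stands in for the existing, unchanged yaml content):

```yaml
# xxx_canny.yaml -- existing keys kept as-is, with one proposed addition
model:
  target: cldm.cldm.ControlLDM   # existing model definition (abbreviated)
desc: "This model needs the canny preprocessor."
```

Loaders that don't know the key would simply ignore it, which is why old cnets without a desc can fall back to blank text.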
@Mikubill why does invert always binarize images?
Right now I have to invert outside on my own, using Photoshop, to use the lineart model.
That is a known issue, will be fixed.
Would be awesome to auto-select the most likely model after the preprocessor is selected, and vice versa. It won't prevent people from changing it, but it will save a needed step 90% of the time.
That anime colorize works nicely even at 1080x1080. It works not only for anime content, but it works best with anime-like models. This one is 768.
> That list looks good to me.

> Are the instructpix2pix and inpainting models already working out of the box? The former seemed to work but I also felt like it gave me mixed results, but I wasn't going to judge the quality yet, not knowing if it's missing something. Inpainting model I haven't tried yet. Tile model I assume would come a bit later since the model itself is in an unfinished state currently.
Yes, ip2p is very experimental; it is a model marked as [e].
But this model should be at least as robust as the original ip2p. That said, it seems the original ip2p is also not very robust.
Perhaps we can improve it by also putting the original image into i2i and using the "denoising strength" to improve robustness.
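A hedged sketch of this suggestion (the function name and step math are mine, not the extension's API): running ip2p through img2img means the sampler starts from a noised copy of the original image, so "denoising strength" caps how far the result can drift from the input.

```python
# Sketch of the idea above: instead of pure txt2img with the ip2p control,
# start from the noised original image (img2img) so the denoising strength
# bounds the drift. Step arithmetic only; no real sampler here.
def img2img_start_step(num_steps: int, denoising_strength: float) -> int:
    """Return the sampler step to start from: strength 1.0 means full
    txt2img, strength 0.0 means the original image comes back untouched."""
    skipped = round(num_steps * (1.0 - denoising_strength))
    return min(max(skipped, 0), num_steps)

# With 20 steps and strength 0.3, only the last 6 steps actually run,
# so the output stays close to the input image.
print(20 - img2img_start_step(20, 0.3))  # 6
```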
> @Mikubill why invert always binarize images?
How do I put this version instead of the main branch?
You don't. You just use the lineart anime model without an annotator: paste your lineart image into ControlNet, tick "invert input color", and that's it. But it is better used in img2img; you can control the colors better if you have a color template.
When you turn on the LineArt anime preprocessor, there's a bug like this
Loading model from cache: control_v11p_sd15s2_lineart_anime [3825e83e]
Loading preprocessor: lineart_anime
Error running process: G:\stable-diffusion-portable-main\extensions\sd-webui-controlnet\scripts\controlnet.py
Traceback (most recent call last):
File "G:\stable-diffusion-portable-main\modules\scripts.py", line 417, in process
script.process(p, *script_args)
File "G:\stable-diffusion-portable-main\extensions\sd-webui-controlnet\scripts\controlnet.py", line 735, in process
detected_map, is_image = preprocessor(input_image, res=unit.processor_res, thr_a=unit.threshold_a, thr_b=unit.threshold_b)
File "G:\stable-diffusion-portable-main\extensions\sd-webui-controlnet\scripts\processor.py", line 276, in lineart_anime
result = model_lineart_anime(img)
File "G:\stable-diffusion-portable-main\extensions\sd-webui-controlnet\annotator\lineart_anime\__init__.py", line 154, in __call__
line = self.model(image_feed)[0, 0] * 127.5 + 127.5
File "G:\stable-diffusion-portable-main\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "G:\stable-diffusion-portable-main\extensions\sd-webui-controlnet\annotator\lineart_anime\__init__.py", line 41, in forward
return self.model(input)
File "G:\stable-diffusion-portable-main\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "G:\stable-diffusion-portable-main\extensions\sd-webui-controlnet\annotator\lineart_anime\__init__.py", line 108, in forward
return self.model(x)
File "G:\stable-diffusion-portable-main\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "G:\stable-diffusion-portable-main\venv\lib\site-packages\torch\nn\modules\container.py", line 217, in forward
input = module(input)
File "G:\stable-diffusion-portable-main\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "G:\stable-diffusion-portable-main\extensions-builtin\Lora\lora.py", line 319, in lora_Conv2d_forward
return torch.nn.Conv2d_forward_before_lora(self, input)
File "G:\stable-diffusion-portable-main\venv\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "G:\stable-diffusion-portable-main\venv\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
It produces this the first time it starts up, and after that it goes into a permanent error state. Torch 2.0, CUDA 11.8, xformers 0.0.18.
> But this model should be at least as robust as original ip2p. But seems that original ip2p is also not very robust.

Yeah, I was thinking this as well. It works, but even the original ip2p gave me mixed results and didn't impress me very much.

> Perhaps we can improve it by putting original image also in i2i and use the "denoising strength" to improve the robustness.

This is a good idea, though... I'll have to play around with it more.
@bropines Did you read the comment on the PR? The preprocessor has conflicts with the web UI's native LoRA extension currently. https://github.com/Mikubill/sd-webui-controlnet/pull/742#issuecomment-1507504286
Currently the sketch_anime annotator has conflicts with the built-in LoRA extension of the A1111 webui, because it hooks torch.nn.Linear.forward and torch.nn.Conv2d.forward. I plan to solve this later.
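The conflict mechanism can be illustrated without torch: when two extensions each globally replace a class's forward method, whichever patches last wraps (or clobbers) the other, and both hooks then fire for every instance in the process. A toy sketch with a stand-in class, not the actual extension code:

```python
class Conv2d:
    """Toy stand-in for torch.nn.Conv2d."""
    def forward(self, x):
        return x * 2

# Extension A (think: the built-in LoRA code) saves and replaces
# the class-level forward, affecting every instance everywhere.
_forward_before_a = Conv2d.forward
def forward_with_a(self, x):
    return _forward_before_a(self, x) + 1
Conv2d.forward = forward_with_a

# Extension B (think: an annotator hooking the same method) does the same,
# unknowingly wrapping extension A's hook instead of the original.
_forward_before_b = Conv2d.forward
def forward_with_b(self, x):
    return _forward_before_b(self, x) * 10
Conv2d.forward = forward_with_b

m = Conv2d()
# Both hooks now run on *every* Conv2d, including modules the other
# extension never intended to touch -- which is the conflict.
print(m.forward(3))  # ((3*2)+1)*10 = 70
```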
I think the "annotator" for anime is just this repo: https://github.com/Mukosame/Anime2Sketch
I made a GUI for anime2sketch; you need to put it in the anime2sketch folder, and then you can convert images to sketches.
But it would of course be nicer if ControlNet borrowed some code from anime2sketch to do it inside the auto webui.
> But, it would of course be nicer if controlnet would borrow some code from anime2sketch to do it inside auto webui
I mean... that's part of the PR.
But again, refer to my referenced comment above -- it conflicts with the built-in LoRA extension at the moment. If you disable that extension then it should work fine. (Of course it'll work better once the invert issue is fixed as well)
Update
2023/04/14:
72 hours ago we uploaded a wrong model "control_v11p_sd15_depth" by mistake. That model is an intermediate checkpoint during the training. That model is not converged and may cause distortion in results. We uploaded the correct depth model as "control_v11f1p_sd15_depth". The "f1" means bug fix 1. The incorrect model is removed. Sorry for the inconvenience.
> Update
> 2023/04/14:
> 72 hours ago we uploaded a wrong model "control_v11p_sd15_depth" by mistake. That model is an intermediate checkpoint during the training. That model is not converged and may cause distortion in results. We uploaded the correct depth model as "control_v11f1p_sd15_depth". The "f1" means bug fix 1. The incorrect model is removed. Sorry for the inconvenience.
I thought it wasn't supposed to work. Sorry, I should have said so.
Update
I updated more detailed descriptions. https://github.com/lllyasviel/ControlNet-v1-1-nightly
Interesting results from cn11 ip2p
make it on fire, high-quality, extremely detailed Negative prompt: long body, low resolution, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 12345, Size: 512x640, Model hash: abcaf14e5a, Model: anything-v3-full, Denoising strength: 0.3, Clip skip: 2, ControlNet-0 Enabled: True, ControlNet-0 Module: none, ControlNet-0 Model: control_v11e_sd15_ip2p [c4bb465c], ControlNet-0 Weight: 1, ControlNet-0 Guidance Start: 0, ControlNet-0 Guidance End: 1, Hires upscale: 2, Hires steps: 20, Hires upscaler: R-ESRGAN 4x+ Anime6B
input
output
It seems that in the webui, ip2p needs a higher control weight if it does not work. Very experimental.
Now, the "Shuffle" preprocessor shuffles the image differently, even if the same seed is used. Wouldn't it be better to make it dependent on seed to ensure repeatability of the result?
> Now, the "Shuffle" preprocessor shuffles the image differently, even if the same seed is used. Wouldn't it be better to make it dependent on seed to ensure repeatability of the result?
fixed
Hi @Mikubill, great work! Some problems: (1) when global average pooling is true, the ControlNet should only be put on the CFG conditional side, otherwise the shuffle won't work very well. (2) It seems that the previous "using only mid control for high-res" is broken or deleted?
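A numpy sketch of point (1); the scalar "model" and names below are purely illustrative stand-ins. The key point: with classifier-free guidance, if the control residual also reaches the unconditional branch, it cancels out of the (cond - uncond) difference and the guidance scale no longer amplifies it.

```python
import numpy as np

def cfg_step(eps_uncond, eps_cond, scale):
    # standard classifier-free guidance combination
    return eps_uncond + scale * (eps_cond - eps_uncond)

def model(x, control=None):
    # stand-in denoiser: the control residual simply shifts the prediction
    return x * 0.5 + (0.0 if control is None else control)

x = np.ones(4)
control = np.full(4, 0.2)

# control on the conditional side ONLY (what is needed when
# global_average_pooling is true):
eps = cfg_step(model(x), model(x, control), scale=7.0)

# if control leaked into the unconditional branch too, the
# (eps_cond - eps_uncond) term would carry no control signal at all:
eps_wrong = cfg_step(model(x, control), model(x, control), scale=7.0)
print(np.allclose(eps_wrong, model(x, control)))  # True: guidance degenerates
```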
Can the model differences be extracted in the same way as before? I'm asking because the original ControlNet models are pretty heavy in filesize, especially if (like I do) you use many of them for multi-ControlNet.
@Mikubill
(3) Note that the two new lineart ControlNets still use 0 as background and 1 as line, which means right now they do not work.
(4) depth_midas should use 512 as the default resolution (rather than 384), since our models are improved.
(5) It seems that the visualization of softpidi is inverted.
What about always using a black background and white lines for all softedge and lineart preprocessors?
We will use this repo to track some discussions for updating to ControlNet 1.1.