Mikubill / sd-webui-controlnet

WebUI extension for ControlNet
GNU General Public License v3.0

[ControlNet 1.1] The updating track. #736

Closed lllyasviel closed 1 year ago

lllyasviel commented 1 year ago

We will use this repo to track some discussions for updating to ControlNet 1.1.

MadaraxUchiha88 commented 1 year ago

Also, thank you guys for all that you're doing and all your hard work :) I'm sorry for cluttering this with my issue :( I'd make a separate bug report but I got confused with all the questions. I've had a very long day at work and was looking forward to updating and using Shuffle and stuff but then all the errors happened :(

lllyasviel commented 1 year ago

fixed

lllyasviel commented 1 year ago

should work now

MadaraxUchiha88 commented 1 year ago

But is Shuffle still not going to work for me?

lllyasviel commented 1 year ago

shuffle is significantly better

This is style image

This is shuffle image

but it seems that shuffle does not work with --lowvram or --medvram now

will fix later

lllyasviel commented 1 year ago

a1111 "--lowvram" uses a special input shape. usually the input is [2, 4, 64, 64] but "--lowvram" use two [1, 4, 64, 64] and controlnet do not know which [1, 4, 64, 64] is uncond. it is more difficult to handle

I use a complex hack to determine the actual cond and uncond. See line 191 here: https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111/blob/main/tile_methods/abstractdiffusion.py

can you describe a bit more?

pkuliyi2015 commented 1 year ago

In a nutshell, automatic1111 deals with cond and uncond in three situations (see the sketch after this list):

  1. Only when the cond and uncond tensors have the same length, and the user is not using lowvram or medvram, will the cond and uncond images be batched together.
  2. Otherwise, with medvram, these tensors will be sliced to batch_size.
  3. Otherwise, with lowvram, these tensors will be sliced to 1.
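
A minimal sketch of those three situations (illustrative shapes and flag names, not the actual webui code):

import torch

def plan_unet_calls(x_cond, x_uncond, same_prompt_len=True, lowvram=False, medvram=False, batch_size=2):
    """Return the latent chunks the UNet would see in one sampling step (illustrative)."""
    if same_prompt_len and not lowvram and not medvram:
        # case 1: cond and uncond are concatenated into one batch, e.g. [2, 4, 64, 64]
        return [torch.cat([x_cond, x_uncond], dim=0)]
    # case 2 (medvram): sliced into batch_size-sized chunks; case 3 (lowvram): sliced to 1
    chunk = 1 if lowvram else batch_size
    return list(x_cond.split(chunk)) + list(x_uncond.split(chunk))

x = torch.randn(1, 4, 64, 64)
print([c.shape for c in plan_unet_calls(x, x)])                # one [2, 4, 64, 64] call
print([c.shape for c in plan_unet_calls(x, x, lowvram=True)])  # two [1, 4, 64, 64] calls
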
lllyasviel commented 1 year ago

but how do I know whether a batch is uncond or cond?

lllyasviel commented 1 year ago

perhaps I can track the x.shape[0] and process.batch_size and accumulate a counter?

pkuliyi2015 commented 1 year ago

My solution is a little bit cumbersome. I hooked the forward and maintain a set of variables to count the number of calls.

lllyasviel commented 1 year ago

interesting

pkuliyi2015 commented 1 year ago

Yeah, that is the basic idea. Otherwise your script will break when the user uses 75 words of positive prompt and 150 of negative prompt.

ljleb commented 1 year ago

Counting the number of forward passes does not work well with other extensions like multidiffusion, as they do multiple forward passes for a single image. Possibly it works if you are counting only to determine cond vs uncond, though.

MadaraxUchiha88 commented 1 year ago

I want to bring this to your attention also: I went back to the old commit before the update, used Shuffle with no preprocessor, and I get results like this:

This is the source image, and the one after is my image generated from a prompt:

(attached images: img_1717, Gaston (Dreambooth) 23)

lllyasviel commented 1 year ago

Hell. Is it impossible to get a cond/uncond flag in a1111?

pkuliyi2015 commented 1 year ago

Counting the number of forward passes does not work well with other extensions like multidiffusion, as they do multiple forward passes for a single image. Possibly it works if you are counting only to determine cond vs uncond, though.

No, I am counting the UNet calls. For ControlNet, you only need to count the CFG denoiser calls; then it is very simple.

lllyasviel commented 1 year ago

Counting the number of forward passes does not work well with other extensions like multidiffusion, as they do multiple forward passes for a single image. Possibly it works if you are counting only to determine cond vs uncond, though.

No, I am counting the UNet calls. For ControlNet, you only need to count the CFG denoiser calls; then it is very simple.

can you tell a bit more?

pkuliyi2015 commented 1 year ago

The concrete logic is at line 119 of this file:

https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/master/modules/sd_samplers_kdiffusion.py

The trick is very simple: if the on_cfg_denoiser callback has not been called again, you are still in the same batch.
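
A minimal sketch of that callback-based reset, assuming the webui's script_callbacks.on_cfg_denoiser hook (the actual counting would live in the hooked UNet forward):

from modules import script_callbacks

step_state = {"sampling_step": -1, "samples_seen": 0}

def _on_cfg_denoiser(params):
    # fires once per CFG denoiser invocation, i.e. once per denoising step
    step_state["sampling_step"] = params.sampling_step
    step_state["samples_seen"] = 0  # every forward seen before the next call belongs to this step

script_callbacks.on_cfg_denoiser(_on_cfg_denoiser)
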

lllyasviel commented 1 year ago

what about directly reading the int32 diffusion timestep and counting unique values?

pkuliyi2015 commented 1 year ago

Yes, that should also work stably.

lllyasviel commented 1 year ago

but we do not know the batch size, so we do not know how many calls make up one step

lllyasviel commented 1 year ago

oh, perhaps the batch size can be recorded from the process call

zismylove commented 1 year ago

cmd: --deepdanbooru --listen --xformers --no-half-vae --enable-insecure-extension-access
Preprocessor: lineart
Model: control_v11p_sd15_canny [d14c016b]
Guess Mode enabled
GPU: 4090

The first error told me "--lowvram doesn't work with Guess Mode", but I had not checked the lowvram checkbox. When I manually checked and then unchecked lowvram and ran again, the following error occurred. Is this my problem? Can it be reproduced?

Error completing request
Arguments: ('task(qpebq3j4xlj2spj)', 'masterpiece, best quality,1 girl,(sketch art:1.02),black hair, delicate face,Delicate eyes, delicate hair,upper body, Chinese style,ancient times,court', '(((ugly))),(((duplicate))),((morbid)),((mutilated)),(((tranny))),mutated hands,(((poorly drawn hands))),blurry,((bad anatomy)),(((bad proportions))),extra limbs,cloned face,(((disfigured))),(((more than 2 nipples))),((((missing arms)))),(((extra legs))),mutated hands,(((((fused fingers))))),(((((too many fingers))))),(((unclear eyes))),lowers,bad anatomy,bad hands,text,error,missing fingers,extra digit,fewer digits,cropped,worst quality,low quality,normal quality,jpeg artifacts,signature,watermark,username,blurry,bad feet,text font ui,malformed hands,long neck,missing limb,(mutated hand and finger: 1.5),(long body: 1.3),(mutation poorly drawn: 1.2),disfigured,malformed mutated,multiple breasts,futa,yaoi,extra limbs,(bad anatomy),gross proportions,(malformed limbs),((missing arms)),((missing legs)),(((extra arms))),(((extra legs))),mutated hands,(fused fingers),(too many fingers),(((long neck))),missing fingers,extra digit,fewer digits,bad feet,(bad anatomy),(bad hands),(text),((error)),(missing fingers),(extra digit),(fewer digits),(cropped),(worst quality),(low quality),(normal quality),(jpeg artifacts),(signature),(watermark),(username),(blurry),(missing arms),(long neck),(Humpbacked),(lowres),(too many fingers),(malformed hands),(three legs),(missing fingers),(mutilated),(multiple breasts),((extra limbs)),((worstquality)),((low quality)),(bad feet),(nude),((unclear eyes)),(cloned face),((worst quality)),((bad anatomy disfigured malformed mutated)),(malformed limbs)', [], 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, False, 7, 100, 'Constant', 0, 'Constant', 0, 4, <scripts.external_code.ControlNetUnit object at 0x00000188328F39A0>, <scripts.external_code.ControlNetUnit object at 0x00000187D32E5210>, <scripts.external_code.ControlNetUnit object at 0x00000187D32E52A0>, <scripts.external_code.ControlNetUnit object at 0x00000187D32E5330>, <scripts.external_code.ControlNetUnit object at 0x00000187D32E53C0>, False, 0.9, 5, '0.0001', False, 'None', '', 0.1, False, False, False, 'positive', 'comma', 0, False, False, '', 1, '', 0, '', 0, '', True, False, False, False, 0, False, False, False, '#000000', None, False, None, False, None, False, None, False, None, False, 50) {}
Traceback (most recent call last):
  File "H:\diffusion\stableDiffusion\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "H:\diffusion\stableDiffusion\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "H:\diffusion\stableDiffusion\modules\txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "H:\diffusion\stableDiffusion\modules\processing.py", line 503, in process_images
    res = process_images_inner(p)
  File "H:\diffusion\stableDiffusion\modules\processing.py", line 653, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "H:\diffusion\stableDiffusion\modules\processing.py", line 869, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "H:\diffusion\stableDiffusion\modules\sd_samplers_kdiffusion.py", line 358, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "H:\diffusion\stableDiffusion\modules\sd_samplers_kdiffusion.py", line 234, in launch_sampling
    return func()
  File "H:\diffusion\stableDiffusion\modules\sd_samplers_kdiffusion.py", line 358, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "H:\diffusion\stableDiffusion\py310\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "H:\diffusion\stableDiffusion\repositories\k-diffusion\k_diffusion\sampling.py", line 145, in sample_euler_ancestral
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "H:\diffusion\stableDiffusion\py310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\diffusion\stableDiffusion\modules\sd_samplers_kdiffusion.py", line 145, in forward
    x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond=make_condition_dict(c_crossattn, image_cond_in[a:b]))
  File "H:\diffusion\stableDiffusion\py310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\diffusion\stableDiffusion\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "H:\diffusion\stableDiffusion\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "H:\diffusion\stableDiffusion\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "H:\diffusion\stableDiffusion\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "H:\diffusion\stableDiffusion\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "H:\diffusion\stableDiffusion\py310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\diffusion\stableDiffusion\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1329, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "H:\diffusion\stableDiffusion\py310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\diffusion\stableDiffusion\extensions\sd-webui-controlnet\scripts\hook.py", line 255, in forward2
    return forward(*args, **kwargs)
  File "H:\diffusion\stableDiffusion\extensions\sd-webui-controlnet\scripts\hook.py", line 183, in forward
    control = param.control_model(x=x_in, hint=param.used_hint_cond, timesteps=timesteps, context=context)
  File "H:\diffusion\stableDiffusion\py310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\diffusion\stableDiffusion\extensions\sd-webui-controlnet\scripts\cldm.py", line 115, in forward
    return self.control_model(*args, **kwargs)
  File "H:\diffusion\stableDiffusion\py310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\diffusion\stableDiffusion\extensions\sd-webui-controlnet\scripts\cldm.py", line 368, in forward
    emb = self.time_embed(t_emb)
  File "H:\diffusion\stableDiffusion\py310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\diffusion\stableDiffusion\py310\lib\site-packages\torch\nn\modules\container.py", line 217, in forward
    input = module(input)
  File "H:\diffusion\stableDiffusion\py310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\diffusion\stableDiffusion\extensions-builtin\Lora\lora.py", line 307, in lora_Linear_forward
    return torch.nn.Linear_forward_before_lora(self, input)
  File "H:\diffusion\stableDiffusion\py310\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

lllyasviel commented 1 year ago

but do we have cases where different steps have the same diffusion timestep?

pkuliyi2015 commented 1 year ago

No, I think not, but if you want to make sure, you can keep track of the callback function. That will only be called once for each denoising step.

lllyasviel commented 1 year ago

@zismylove how much VRAM do you have?

pkuliyi2015 commented 1 year ago

It is as simple as this: each time that callback is called, you set every counter to zero, and when you meet a different diffusion timestep you just add the tensor's shape[0] to your counter.
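
Putting the pieces together, a hedged sketch of that counting rule (the batch size recorded from the process call; cond chunks assumed to come before uncond ones):

import torch

class CondUncondTracker:
    def __init__(self, batch_size):
        self.batch_size = batch_size   # recorded from the process call
        self.seen = 0
        self.last_timestep = None

    def reset(self):
        # call this from the on_cfg_denoiser callback: zero every counter
        self.seen = 0
        self.last_timestep = None

    def observe(self, x, timesteps):
        """Return True if this chunk is (assumed to be) part of the uncond half."""
        t = int(timesteps.flatten()[0])
        if self.last_timestep is not None and t != self.last_timestep:
            self.seen = 0              # a different diffusion timestep means a new step began
        self.last_timestep = t
        start = self.seen
        self.seen += x.shape[0]        # accumulate the chunk's batch dimension
        return start >= self.batch_size

tracker = CondUncondTracker(batch_size=1)
tracker.reset()
for chunk in torch.randn(2, 4, 64, 64).split(1):        # --lowvram style: two [1, 4, 64, 64] calls
    print(tracker.observe(chunk, torch.tensor([801])))  # False (cond), then True (uncond)
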

annasophiachristianahahn commented 1 year ago

The model works as expected in automatic1111 txt2img; it does generate the guided content. However, when I directly download the model and use it in this extension, it produces severe artifacts. I read the source code for a while, but it is still not clear what should be done to make it work. Some initial observations:

  • Severe ghost shadows and duplicated contours, regardless of tile overlaps
  • Faded colors in txt2img (even with the 840000 VAE)
  • No effect when using noise inversion (maybe this is a flaw in my code; I'm checking it).

See here for one result: https://imgsli.com/MTY5ODQw

which one is cn11tile? left or right?

The right one. I must have done something wrong, but so far I cannot fix it.

Is there a way to use controlnet tile in the public distribution of automatic1111 yet?

pkuliyi2015 commented 1 year ago

The model works as expected in automatic1111 txt2img; it does generate the guided content. However, when I directly download the model and use it in this extension, it produces severe artifacts. I read the source code for a while, but it is still not clear what should be done to make it work. Some initial observations:

  • Severe ghost shadows and duplicated contours, regardless of tile overlaps
  • Faded colors in txt2img (even with the 840000 VAE)
  • No effect when using noise inversion (maybe this is a flaw in my code; I'm checking it).

See here for one result: https://imgsli.com/MTY5ODQw

which one is cn11tile? left or right?

The right one. I must have done something wrong, but so far I cannot fix it.

Is there a way to use controlnet tile in the public distribution of automatic1111 yet?

No, I am also looking forward to using it.

pkuliyi2015 commented 1 year ago

This would be extremely useful for my multidiffusion upscaling.

lllyasviel commented 1 year ago

@Mikubill I think I will finish here today. Can you take a look at the robustness of the new parts? I think it should be as robust as before, but I am not 100% sure. Some users reported bugs, but the bugs cannot be reproduced, and the report rate seems similar to before.

pkuliyi2015 commented 1 year ago

I think if you're dealing with cond and uncond, you should try using different positive and negative prompt lengths (<75 and >150). This will definitely give you errors if your script is not properly handling the batch problem.

A DeepNegative embedding can quickly use up your 75 tokens.
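
For illustration, a tiny sketch of why mismatched prompt lengths break batching (assuming the webui's 75-token prompt chunking; shapes are SD 1.5 sizes and purely illustrative):

import torch

cond   = torch.randn(1,  77, 768)   # <=75-token positive prompt -> one 77-token chunk
uncond = torch.randn(1, 231, 768)   # >150-token negative prompt -> three chunks

try:
    torch.cat([cond, uncond], dim=0)   # cannot be concatenated into one batch
except RuntimeError as err:
    print(err)                         # the sampler must fall back to the sliced path
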

Mikubill commented 1 year ago

Great, the fixes look nice. I will check around later; thanks for the improvements!

lllyasviel commented 1 year ago

I found a magical flag that can fix this problem: --always-batch-cond-uncond. a1111 is really a mess.

lllyasviel commented 1 year ago

Now --lowvram can use shuffle with --always-batch-cond-uncond https://github.com/Mikubill/sd-webui-controlnet/commit/ede9e2c379919cac7ce96581a36076057e8242c2

NakiriRuri commented 1 year ago

Which model should be used to control the results generated by the binary preprocessor? I tried some of the models and the results look a bit subtle. (attached image: xyz_grid-0015-20230416154312)

catboxanon commented 1 year ago

I believe it's just an alternative that can be used with the scribble model. https://github.com/Mikubill/sd-webui-controlnet/pull/495

lllyasviel commented 1 year ago

I am also confused about what binary and scribble_thr are.

Why are they different, and when do I need them?

Note that Scribble 1.1 can accept slightly non-binary inputs, so do we really need these two?

ghost commented 1 year ago

edit: now working after clean install of everything

Does anyone else get an error with seg_ofcoco and seg_ofade20k, or is it just me? (I'm still on torch 1.13.1+cu117) _pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified.

catboxanon commented 1 year ago

The download might be corrupted? Try deleting stable-diffusion-webui\models\oneformer\150_16_swin_l_oneformer_coco_100ep.pth and let the script re-download it. You could also try manually downloading it: https://huggingface.co/lllyasviel/Annotators/resolve/main/150_16_swin_l_oneformer_coco_100ep.pth

ghost commented 1 year ago

The download might be corrupted? Try deleting stable-diffusion-webui\models\oneformer\150_16_swin_l_oneformer_coco_100ep.pth and let the script re-download it. You could also try manually downloading it: https://huggingface.co/lllyasviel/Annotators/resolve/main/150_16_swin_l_oneformer_coco_100ep.pth

thanks, I might try that, but the SHA256 values match, btw. Do you use the --disable-safe-unpickle option?

catboxanon commented 1 year ago

Hm, odd. Also no, not using that CLI argument and it works for me. Shouldn't be a problem either since all the imports in that pickled checkpoint are ones the web UI allows.

ghost commented 1 year ago

Hm, odd. Also no, not using that CLI argument and it works for me. Shouldn't be a problem either since all the imports in that pickled checkpoint are ones the web UI allows.

thanks, must be my torch version then

edit: fixed after clean install, torch version remained 1.13.1 but webui 22bcc7be seems to have fixed it

bropines commented 1 year ago

shuffle is significantly better

This is style image

This is shuffle image

but it seems that shuffle does not work with --lowvram or --medvram now

will fix later

In order to use shuffle, do I need to put the same extension on several ControlNet units, or how does it work? What is this addition to the multiple ControlNet units in the ControlNet tab?

Edit: I just poked around in all the settings and found it. Sorry.

MadaraxUchiha88 commented 1 year ago

Now --lowvram can use shuffle with --always-batch-cond-uncond ede9e2c

Did you fix it? :o You got --lowvram to work with --always-batch-cond-uncond for Shuffle?

lllyasviel commented 1 year ago

Now --lowvram can use shuffle with --always-batch-cond-uncond ede9e2c

Did you fix it? :o You got --lowvram to work with --always-batch-cond-uncond for Shuffle?

yes

MadaraxUchiha88 commented 1 year ago

Now --lowvram can use shuffle with --always-batch-cond-uncond ede9e2c

Did you fix it? :o You got --lowvram to work with --always-batch-cond-uncond for Shuffle?

yes

You guys are amazing :D Thank you so much! Were you also able to fix the issue that Style was having?

djbielejeski commented 1 year ago

I can no longer load a preprocessor like this from another extension:

import importlib

controlnet_external_code = importlib.import_module('extensions.sd-webui-controlnet.scripts.external_code', 'external_code')
controlnet_global_state = importlib.import_module('extensions.sd-webui-controlnet.scripts.global_state', 'global_state')
controlnet_preprocessors = controlnet_global_state.cn_preprocessor_modules

controlnet_units = [
    cn_unit for cn_unit in controlnet_external_code.get_all_units_in_processing(p)
    if cn_unit.enabled and cn_unit.image is None
]

for cn_unit in controlnet_units:
    print(cn_unit.module) # "softedge_pidinet"
    preprocessor = controlnet_preprocessors[cn_unit.module]

    #EXCEPTION: KeyError: 'softedge_pidinet'
djbielejeski commented 1 year ago

I can no longer load a preprocessor like this from another extension:

import importlib

controlnet_external_code = importlib.import_module('extensions.sd-webui-controlnet.scripts.external_code', 'external_code')
controlnet_global_state = importlib.import_module('extensions.sd-webui-controlnet.scripts.global_state', 'global_state')
controlnet_preprocessors = controlnet_global_state.cn_preprocessor_modules

controlnet_units = [
    cn_unit for cn_unit in controlnet_external_code.get_all_units_in_processing(p)
    if cn_unit.enabled and cn_unit.image is None
]

for cn_unit in controlnet_units:
    print(cn_unit.module) # "softedge_pidinet"
    preprocessor = controlnet_preprocessors[cn_unit.module]

    #EXCEPTION: KeyError: 'softedge_pidinet'

IMO you should break extensions that hardcode the module name, not extensions that dynamically (and correctly) get the name from your own ControlNetUnit class.

djbielejeski commented 1 year ago

aka this needs to be deleted, refactored, or made backwards compatible (add to global_state.py?)

controlnet.py line 159

    def get_module_basename(self, module):
        for k, v in self.preprocessor_keys.items():
            if v == module or k == module:
                return k
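
For illustration, the "add to global_state.py" idea could look like a module-level helper mirroring get_module_basename, so other extensions can resolve either the internal key or the UI display name; preprocessor_aliases is a hypothetical name for whatever mapping backs self.preprocessor_keys:

def get_module_basename(module, preprocessor_aliases):
    # hypothetical module-level helper for global_state.py (sketch only):
    # map an internal key or a UI display name back to the internal key
    for internal_key, display_name in preprocessor_aliases.items():
        if module in (internal_key, display_name):
            return internal_key
    return module  # unknown names pass through unchanged

# an external extension could then resolve names defensively, e.g.:
# preprocessor = cn_preprocessor_modules[get_module_basename(cn_unit.module, aliases)]
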