Mikubill / sd-webui-controlnet

WebUI extension for ControlNet
GNU General Public License v3.0

[Bug]: reference_adain, reference_adain+attn and reference_only not working correctly with API #1319

Closed ramyma closed 1 year ago

ramyma commented 1 year ago

Is there an existing issue for this?

What happened?

After 36f0ff5f5c3d7a970103cbc4e2b4419771f533d1, using reference_adain, reference_adain+attn or reference_only through the API with control_mode set to Balanced no longer works.

reference_adain and reference_adain+attn throw this exception:

NansException: A tensor with all NaNs was produced in Unet. This could be either
because there's not enough precision to represent the picture, or because your 
video card does not support half type. Try setting the "Upcast cross attention 
layer to float32" option in Settings > Stable Diffusion or using the --no-half 
commandline argument to fix this. Use --disable-nan-check commandline argument 
to disable this check.

reference_only, by contrast, returns a scrambled image instead of raising an exception.

When threshold_a is set to 0 or 1 with control_mode set to Balanced, a seemingly normal image is returned.

I ran the same generation request code before and after 36f0ff5f5c3d7a970103cbc4e2b4419771f533d1 and confirmed that the behavior broke on this commit.

It's worth noting that this issue doesn't happen when using the webui interface, only through the API.

Steps to reproduce the problem

Create an API request with a ControlNet unit that has model set to None, module set to reference_adain, reference_adain+attn or reference_only, control_mode set to Balanced, and threshold_a set to 0.5.

Here's an example payload:

 ...
 controlnet: {
      args: [
        {
          input_image: "",
          mask: nil,
          model: "None",
          module: "reference_adain+attn",
          weight: 1.0,
          resize_mode: "Just Resize",
          lowvram: true,
          processor_res: 64,
          threshold_a: 64,
          threshold_b: 64,
          guidance: 1.0,
          guidance_start: 0.0,
          guidance_end: 1.0,
          pixel_perfect: true,
          control_mode: 0
        }
      ]
    }
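
For comparison, the unit described in the steps above can be written as a Python dict. This is only a sketch: the field names and values are copied from the example payload, `build_reference_unit` is a hypothetical helper that is not part of the extension, and the rest of the request body (elided as `...` above) is deliberately left out.

```python
import json


def build_reference_unit(module: str, threshold_a: float) -> dict:
    """Build one ControlNet unit using the field names from the payload above.

    Hypothetical helper for illustration only; not an extension API.
    """
    return {
        "input_image": "",
        "mask": None,
        "model": "None",
        "module": module,
        "weight": 1.0,
        "resize_mode": "Just Resize",
        "lowvram": True,
        "processor_res": 64,
        "threshold_a": threshold_a,  # the steps above call for 0.5 here
        "threshold_b": 64,
        "guidance": 1.0,
        "guidance_start": 0.0,
        "guidance_end": 1.0,
        "pixel_perfect": True,
        "control_mode": 0,  # value taken from the payload above
    }


unit = build_reference_unit("reference_adain+attn", 0.5)
controlnet = {"args": [unit]}
print(json.dumps(controlnet, indent=2))
```

Printing the serialized unit before sending it makes it easy to compare the value of threshold_a that actually goes over the wire with the value the steps describe.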

What should have happened?

Image should be returned without an issue.

Commit where the problem happens

webui: v1.2.1 (but it seems to be working in webui)
controlnet: 36f0ff5f5c3d7a970103cbc4e2b4419771f533d1

What browsers do you use to access the UI ?

Google Chrome

Command Line Arguments

--api --xformers --xformers-flash-attention --skip-version-check

List of enabled extensions

-

Console logs

/stable-diffusion-webui/modules/devices.py:156 in test_for_nans    │
│                                                                              │
│   155 │                                                                      │
│ ❱ 156 │   raise NansException(message)                                       │
│   157                                                                        │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │ message = "A tensor with all NaNs was produced in Unet. This could be    │ │
│ │           either because there'"+324                                     │ │
│ │  shared = <module 'modules.shared' from                                  │ │
│ │           '/stable-diffusion-webui/modules/shared.py'>         │ │
│ │   where = 'unet'                                                         │ │
│ │       x = tensor([[[[nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     ...,                                                 │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan]],               │ │
│ │           │   │                                                          │ │
│ │           │   │    [[nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     ...,                                                 │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan]],               │ │
│ │           │   │                                                          │ │
│ │           │   │    [[nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     ...,                                                 │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan]],               │ │
│ │           │   │                                                          │ │
│ │           │   │    [[nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     ...,                                                 │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan]]],              │ │
│ │           │   │                                                          │ │
│ │           │   │                                                          │ │
│ │           │   │   [[[nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     ...,                                                 │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan]],               │ │
│ │           │   │                                                          │ │
│ │           │   │    [[nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     ...,                                                 │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan]],               │ │
│ │           │   │                                                          │ │
│ │           │   │    [[nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     ...,                                                 │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan]],               │ │
│ │           │   │                                                          │ │
│ │           │   │    [[nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     ...,                                                 │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan],                │ │
│ │           │   │     [nan, nan, nan,  ..., nan, nan, nan]]]],             │ │
│ │           device='cuda:0')                                               │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
NansException: A tensor with all NaNs was produced in Unet. This could be either
because there's not enough precision to represent the picture, or because your 
video card does not support half type. Try setting the "Upcast cross attention 
layer to float32" option in Settings > Stable Diffusion or using the --no-half 
commandline argument to fix this. Use --disable-nan-check commandline argument 
to disable this check.


Additional information

No response

lllyasviel commented 1 year ago

Set threshold_a to a number between 0 and 1 (it is the style fidelity). That should fix this, but I will add a range check in a future version.
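
A client can apply the same kind of range check before sending the request. The following is a minimal sketch, not the extension's implementation: `check_style_fidelity` and `REFERENCE_MODULES` are hypothetical names, and the only facts taken from this thread are the three module names and the 0 to 1 range.

```python
# Module names as they appear in this issue.
REFERENCE_MODULES = {"reference_only", "reference_adain", "reference_adain+attn"}


def check_style_fidelity(unit: dict) -> None:
    """Raise ValueError if a reference unit's threshold_a is outside [0, 1].

    Hypothetical client-side guard; units using other modules are ignored.
    """
    if unit.get("module") not in REFERENCE_MODULES:
        return
    value = unit.get("threshold_a")
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if not is_number or not 0.0 <= value <= 1.0:
        raise ValueError(
            f"threshold_a must be between 0 and 1 for {unit['module']}, got {value!r}"
        )


# A unit with threshold_a = 64, as in the example payload, is rejected.
try:
    check_style_fidelity({"module": "reference_adain+attn", "threshold_a": 64})
except ValueError as err:
    print(err)
```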

lllyasviel commented 1 year ago

Fixed in 1.1.175: https://github.com/Mikubill/sd-webui-controlnet/commit/92c937a41ecaff1eee99bb9150e766138d976aa3

ramyma commented 1 year ago

@lllyasviel thanks for pointing that out! It turns out I had a bug in my implementation.

Don't you think it would be better to return a 422 response?

While this now defaults to a reasonable value, that makes it harder for someone to figure out that they sent a wrong value.
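
The two behaviours under discussion, rejecting with a 422 and falling back to a default, can be contrasted in a small sketch. Everything here is an assumption for illustration: `resolve_threshold_a` is a hypothetical function, the `fallback` of 0.5 is a placeholder rather than the value the extension uses, and the response bodies are made up.

```python
def resolve_threshold_a(value, strict: bool, fallback: float = 0.5):
    """Return (status_code, body) for a style-fidelity value.

    strict=True  -> reject out-of-range input with a 422-style error.
    strict=False -> silently replace it with `fallback` (placeholder default).
    """
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if is_number and 0.0 <= value <= 1.0:
        return 200, {"threshold_a": float(value)}
    if strict:
        return 422, {"detail": f"threshold_a must be between 0 and 1, got {value!r}"}
    return 200, {"threshold_a": fallback}


print(resolve_threshold_a(64, strict=True))   # caller is told the value was wrong
print(resolve_threshold_a(64, strict=False))  # caller gets a result with no hint
```

In the strict variant the client learns immediately that 64 was invalid; in the lenient variant the request succeeds and the mistake stays hidden, which is the trade-off raised above.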