[Bug]: controlnet/img2img api returns image with unexpected size

yadreny commented 1 year ago

Is there an existing issue for this?

[X] I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

What happened?

I used the controlnet/img2img API to inpaint a character into the background, using the depth model. It works, but the result has an unexpected size (while the WebUI form works correctly). It's downscaled to 1024x512 from 1280x760, and since the sizes differ, the contents of the image also differ.

Steps to reproduce the problem

Just add the mask to the request (in the root of the arguments, not in controlnet_unit). Also, to make the request work, I had to invert the mask (which I used in inpaint upload). If there is no mask in the request, it works correctly.

What should have happened?

It would be nice if the API returned the image at its original size.

Commit where the problem happens

webui: 0cc0ee1bcb4c24a8c9715f66cede06601bfc00c8 controlnet: 48fce60f24c5812048b6359e4739ba8c6aa63073

What browsers do you use to access the UI ?

Google Chrome

Command Line Arguments

set COMMANDLINE_ARGS=
call webui.bat --api

Console logs

venv "D:\SD\new sd\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec  6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Commit hash: 0cc0ee1bcb4c24a8c9715f66cede06601bfc00c8
Installing requirements for Web UI

Launching Web UI with arguments: --api
No module 'xformers'. Proceeding without it.
Loading weights [0aecbcfa2c] from D:\SD\new sd\stable-diffusion-webui\models\Stable-diffusion\dreamlike-diffusion-1.0.ckpt
Creating model from config: D:\SD\new sd\stable-diffusion-webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying cross attention optimization (Doggettx).
Textual inversion embeddings loaded(0):
Model loaded in 20.9s (load weights from disk: 17.2s, create model: 0.5s, apply weights to model: 0.5s, apply half(): 0.8s, move model to device: 0.8s, load textual inversion embeddings: 1.0s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Loading model: control_sd15_depth [fef5e48e]
Loaded state_dict from [D:\SD\new sd\stable-diffusion-webui\extensions\sd-webui-controlnet\models\control_sd15_depth.pth]
ControlNet model control_sd15_depth [fef5e48e] loaded.
Loading preprocessor: depth
100%|██████████████████████████████████████████████████████████████████████████████████| 46/46 [01:11<00:00,  1.56s/it]
Loading model: control_sd15_depth [fef5e48e]███████████████████████████████████████████| 46/46 [01:10<00:00,  1.53s/it]
Loaded state_dict from [D:\SD\new sd\stable-diffusion-webui\extensions\sd-webui-controlnet\models\control_sd15_depth.pth]
ControlNet model control_sd15_depth [fef5e48e] loaded.
Loading preprocessor: depth
100%|██████████████████████████████████████████████████████████████████████████████████| 46/46 [03:03<00:00,  3.99s/it]
Total progress: 92it [13:57,  3.99s/it]

Additional information

Here is my request: `{ "include_init_images":true, "init_images":["img"], "mask":"img", "inpainting_mask_invert":0, "inpainting_fill":0, "inpaint_full_res":true,

"controlnet_units":
[{
    "input_image":"img",
    "module":"depth",
    "model":"control_sd15_depth[fef5e48e]",
    "resize_mode":"Scale to Fit (Inner Fit)",
    "lowvram":true,
    "guessmode":false,
    "weight":1,
    "processor_res":384,
    "threshold_a":64,
    "threshold_b":64,
    "guidance_start":0,
    "guidance_end":1
}],

"override_settings_restore_afterwards":false,
"denoising_strength":0.8,
"resize_mode":1,
"mask_blur":5,
"inpaint_full_res_padding":1,
"width":1200,
"height":400,
"batch_size":1,
"n_iter":1,
"steps":60,
"cfg_scale":10,
"prompt":"character",
"negative_prompt": "",
"style":[],
"seed": 7777,
"subseed":-1,
"subseed_strength":0,
"seed_resize_from_h":-1,
"seed_resize_from_w":-1,
"restore_faces":false,
"sampler_index":"Euler a",
"batch_count":1

}`

ljleb commented 1 year ago

At the moment, root properties are not merged with the first control unit specified under controlnet_units; instead, a new unit is prepended to the list. So the root controlnet unit becomes a new controlnet unit at index 0 in the list. It would be confusing to merge different config values: which value takes precedence over the other? Do we prepend only when there are no property collisions? (The behavior could become considerably more confusing in this case: just adding a property could add or remove a controlnet unit.)

The root values are deprecated; you should prefer to use only the controlnet_units property if you can. Is there a reason you want to use the deprecated and new paradigms simultaneously?

ljleb commented 1 year ago

Never mind, I thought by mask you meant passing a mask to the control_net_mask root property. The mask root property is just forwarded to the img2img route of the webui, so I'm a bit confused why it would act differently than the webui's route. Can you verify that the same behavior does not also occur with "controlnet_units": []? What about the same request in the /sdapi/v1/img2img route instead?

yadreny commented 1 year ago

You are right, /sdapi/v1/img2img behaves exactly the same. If there is a mask in the request, the image is also downscaled, ignoring the specified width and height. However, a similar action through the WebUI form returns an image with the correct size.
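
One way to confirm the mismatch programmatically is to compare the requested width and height against the dimensions of each returned image. The sketch below uses only the standard library. It assumes the response JSON carries base64-encoded PNGs under an `images` key (PNG being the default output format); the helper names `png_size` and `size_mismatches` are hypothetical and not part of the webui or the extension.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(b64_png: str) -> tuple:
    """Return (width, height) read from the IHDR chunk of a base64-encoded PNG."""
    # Tolerate an optional "data:image/png;base64," prefix.
    if b64_png.startswith("data:"):
        b64_png = b64_png.split(",", 1)[1]
    raw = base64.b64decode(b64_png)
    # Bytes 0-7: signature, 8-11: chunk length, 12-15: chunk type, 16-23: width and height.
    if raw[:8] != PNG_SIGNATURE or raw[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    width, height = struct.unpack(">II", raw[16:24])
    return width, height


def size_mismatches(payload: dict, response: dict) -> list:
    """List (index, actual_size) for every returned image whose size differs from the request."""
    expected = (payload["width"], payload["height"])
    mismatches = []
    for index, image in enumerate(response.get("images", [])):
        actual = png_size(image)
        if actual != expected:
            mismatches.append((index, actual))
    return mismatches
```

Calling `size_mismatches(payload, response_json)` once with the mask and once without it would show whether only the masked request comes back at a different size.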

ljleb commented 1 year ago

That's what I suspected. Check if there's an issue for this already in the webui repo. If not, imo you should consider opening an issue there.

I'm not sure if it can be fixed in this repo, but even if it could, ideally it should be fixed in the webui.

yadreny commented 1 year ago

Sure. Already busy with it. Thank you for the prompt response and the wonderful extension.

joseph-schwartz commented 1 year ago

I have actually gotten this behavior with a 600x450 image returning a 600x448 image. It doesn't sound like a big deal unless you are checking that the sizes are the same when working with bitmaps.
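
A change from 450 to 448 is consistent with the height being floored to a multiple of 8, which is the downsampling factor of the Stable Diffusion VAE (600 is already a multiple of 8, so it would stay unchanged). Whether the webui actually applies this rounding on this code path is an assumption, not something confirmed in this thread, and it does not explain the 1024x512 result reported above. The helper name `floor_to_multiple` is hypothetical.

```python
def floor_to_multiple(value: int, multiple: int = 8) -> int:
    """Round value down to the nearest multiple of `multiple`."""
    return value - value % multiple


# Assumed rounding applied to the reported 600x450 input.
requested = (600, 450)
rounded = tuple(floor_to_multiple(side) for side in requested)
print(rounded)
```

If exact sizes matter, requesting a width and height that are already multiples of 8 avoids this kind of discrepancy under the same assumption.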

yiouyou commented 1 year ago

May I ask how to get the API to work? After reading the FastAPI docs of the webui, I use '/controlnet/img2img'; however, no matter what I do, I always get a 500 error code. I assume the "img" in the example above is the base64 string of the image (b64_img). I'm not sure what I missed. The payload is as below:

payload = {
        "sampler_name": "Euler a",
        "sampler_index": "Euler a",
        "steps": 20,
        "n_iter": 1,
        "batch_size": 1,
        "cfg_scale": 7,
        "include_init_images": True,
        "denoising_strength": 0.75,
        "styles": [],
        "init_images": [b64_img("E:/_Ai/_workflow/tmp/untitled/2D-02.jpg")],
        "prompt": "a cartoon character is playing with a ball in the desert with a ladybug on his head and a ladybug on his leg, David Firth, promotional image, a character portrait, pop surrealism",
        "negative_prompt": "",
        "width": 512,
        "height": 512,
        "seed": -1,
        "restore_faces": False,
        "override_settings": {},
        "tiling": False,
        "mask": "",
        "mask_blur": 4,
        "inpainting_fill": 0,
        "inpaint_full_res": True,
        "inpaint_full_res_padding": 0,
        "inpainting_mask_invert": 0,
        "initial_noise_multiplier": 0,
        "subseed": -1,
        "subseed_strength": 0,
        "seed_resize_from_h": -1,
        "seed_resize_from_w": -1,
        "s_churn": 0,
        "s_tmax": 0,
        "s_tmin": 0,
        "s_noise": 1,
        "eta": 0,
        "resize_mode": 0,
        "image_cfg_scale": 0,
        "override_settings_restore_afterwards": True,
        "controlnet_units": [
            {
                "input_image": b64_img("E:/_Ai/_workflow/tmp/untitled/2D-02.jpg"),
                "module": 'canny',
                "model": 'control_canny-fp16[[fef5e48e]]',
                "weight": 1,
                "lowvram": True,
                "guessmode": False,
                "resize_mode": "Scale to Fit (Inner Fit)",
                "guidance": 1,
                "guidance_start": 0,
                "guidance_end": 1,
                "mask": "",
                "processor_res": 64,
                "threshold_a": 64,
                "threshold_b": 64,
            }
        ]
    }

Any help is much appreciated. Thanks!!

I figured out that "Allow other script to control this extension" needs to be enabled. After doing this, I get the generated image, BUT it's not the same as the one produced through the GUI, even with the same seed. It seems the model is not used, since no load info shows up in the webui console.

Any idea why and how to fix this?

Thanks for your attention!
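
The `b64_img` helper called in the payload above is not shown in the thread. A minimal sketch, assuming it simply reads an image file and returns its contents as base64 text (the implementation below is a guess at its behavior, not the poster's code):

```python
import base64
from pathlib import Path


def b64_img(path: str) -> str:
    """Read an image file and return its raw bytes as a base64 ASCII string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")
```

The same string can then be used for both `init_images` and the unit's `input_image`, as the payload above does.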

joseph-schwartz commented 1 year ago

I had the same issue. Run this in the API Swagger docs for ControlNetImage2Image:

{
  "init_images": ["YourBase64StringHere"],
  "cfg_scale": 7,
  "prompt": "YourPromptHere",
  "width": 512,
  "height": 512,
  "controlnet_units": [
    {
      "input_image": "YourBase64StringHere",
      "module": "YourPreProcessorModuleNameHere",
      "model": "YourModelNameHere"
    }
  ]
}

Then look at the "parameters" section of the response:

 "parameters": {
    "init_images": null,
    "resize_mode": 0,
    "denoising_strength": 0.75,
    "image_cfg_scale": null,
    "mask": null,
    "mask_blur": 4,
    "inpainting_fill": 0,
    "inpaint_full_res": true,
    "inpaint_full_res_padding": 0,
    "inpainting_mask_invert": 0,
    "initial_noise_multiplier": null,
    "prompt": "YourPromptStringHere",
    "styles": null,
    "seed": -1,
    "subseed": -1,
    "subseed_strength": 0,
    "seed_resize_from_h": -1,
    "seed_resize_from_w": -1,
    "sampler_name": null,
    "batch_size": 1,
    "n_iter": 1,
    "steps": 50,
    "cfg_scale": 7,
    "width": 512,
    "height": 512,
    "restore_faces": false,
    "tiling": false,
    "negative_prompt": null,
    "eta": null,
    "s_churn": 0,
    "s_tmax": null,
    "s_tmin": 0,
    "s_noise": 1,
    "override_settings": null,
    "override_settings_restore_afterwards": true,
    "script_args": [],
    "sampler_index": "Euler",
    "include_init_images": false,
    "script_name": null
  },

These are the true defaults to use, not the values you have set as defaults.
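
To see which fields in a payload depart from those echoed defaults, the two dictionaries can be compared key by key. This is only an illustrative sketch: the helper name `non_default_fields` is hypothetical, and the sample values are a small subset copied from the payload and the response shown in this thread.

```python
def non_default_fields(payload: dict, defaults: dict) -> dict:
    """Return {key: (sent, default)} for keys present in both dicts with different values."""
    return {
        key: (value, defaults[key])
        for key, value in payload.items()
        if key in defaults and value != defaults[key]
    }


# Subset of the payload posted earlier in the thread.
sent = {"initial_noise_multiplier": 0, "image_cfg_scale": 0, "steps": 20, "mask_blur": 4}
# Matching subset of the "parameters" block echoed by the API (JSON null becomes None).
echoed = {"initial_noise_multiplier": None, "image_cfg_scale": None, "steps": 50, "mask_blur": 4}

for key, (value, default) in non_default_fields(sent, echoed).items():
    print(f"{key}: sent {value!r}, default {default!r}")
```

Fields such as `initial_noise_multiplier` and `image_cfg_scale` show up here because the payload sends 0 where the echoed default is null.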

Mikubill / sd-webui-controlnet