Mikubill / sd-webui-controlnet

WebUI extension for ControlNet
GNU General Public License v3.0

[Bug]: Inpaint via API txt2img not working #2310

Closed Xijamk closed 11 months ago

Xijamk commented 11 months ago

Is there an existing issue for this?

What happened?

I am attempting to use txt2img and ControlNet with an image and a mask, but I'm encountering issues where the mask seems ineffective. This is a shift from my previous workflow, where I used img2img without ControlNet for inpainting. Now, my goal is to use txt2img with ControlNet for the same purpose. Despite reviewing both resolved and open issues, and examining the payload through an extension, I am unable to diagnose the problem.

PLEASE HELP, I've been stuck on this for the last 2 months now

In the example below, I'm trying to close her eyes via the API without success; I've already tried a thousand things.

Payload used in this example:

    {
        "prompt": "(grainy cinematic photography:1.2) shot on Bessa R2A Cinestill photo of a beautiful young italian woman, wavy hair, cute smile, closed eyes",
        "negative_prompt": "",
        "sampler_name": "DPM++ 2M Karras",
        "batch_size": 1,
        "steps": 20,
        "cfg_scale": 7,
        "width": 512,
        "height": 768,
        "alwayson_scripts": {
            "controlnet": {
                "args": [
                    {
                        "model": "control_v11p_sd15_inpaint [ebff9138]",
                        "module": "inpaint_only+lama",
                        "resize_mode": 1,
                        "control_mode": 0,
                        "image": {"image": base64string, "mask": base64mask}
                    }
                ]
            }
        }
    }

Image: original

Mask: inpaint_mask

Results via UI: a1111_inpainted_txt2imgg

Results via API is the exact same original image. download

Mask returned from the API (second image from the response) download

Steps to reproduce the problem

  1. Go to http://127.0.0.1:7860/docs#/default/text2imgapi_sdapi_v1_txt2img_post
  2. Use the payload provided
  3. Help

What should have happened?

The response from the API should be the same as the result from the UI.

Commit where the problem happens

webui: txt2img controlnet: inpaint

What browsers do you use to access the UI?

Google Chrome

Command Line Arguments

--opt-sdp-attention --api --share --enable-insecure-extension-access

List of enabled extensions

Controlnet, Dynamic Prompts, Additional Networks, Adetailer, AnimateDiff

Console logs

venv "F:\AI\AUTOMATIC1115\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.6.1
Commit hash: 4afaaf8a020c1df457bcf7250cb1c7f609699fa7
Launching Web UI with arguments: --opt-sdp-attention --api --share --enable-insecure-extension-access
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
[-] ADetailer initialized. version: 23.11.0, num models: 11
[AddNet] Updating model hashes...
0it [00:00, ?it/s]
[AddNet] Updating model hashes...
0it [00:00, ?it/s]
2023-12-08 17:46:46,003 - ControlNet - INFO - ControlNet v1.1.417
ControlNet preprocessor location: F:\AI\AUTOMATIC1115\stable-diffusion-webui\extensions\sd-webui-controlnet\annotator\downloads
2023-12-08 17:46:46,112 - ControlNet - INFO - ControlNet v1.1.417
Loading weights [ce4629b477] from F:\AI\AUTOMATIC1115\stable-diffusion-webui\models\Stable-diffusion\Real_Jugg.safetensors
2023-12-08 17:46:47,272 - AnimateDiff - INFO - Injecting LCM to UI.
2023-12-08 17:46:47,569 - AnimateDiff - INFO - Hacking i2i-batch.
Creating model from config: F:\AI\AUTOMATIC1115\stable-diffusion-webui\models\Stable-diffusion\Real_Jugg.yaml
Applying attention optimization: sdp... done.
Model loaded in 6.8s (load weights from disk: 0.6s, create model: 0.7s, apply weights to model: 3.3s, apply half(): 1.1s, calculate empty prompt: 1.0s).
F:\AI\AUTOMATIC1115\stable-diffusion-webui\extensions\sd-webui-additional-networks\scripts\metadata_editor.py:399: GradioDeprecationWarning: The `style` method is deprecated. Please set these arguments in the constructor instead.
  with gr.Row().style(equal_height=False):
F:\AI\AUTOMATIC1115\stable-diffusion-webui\extensions\sd-webui-additional-networks\scripts\metadata_editor.py:521: GradioDeprecationWarning: The `style` method is deprecated. Please set these arguments in the constructor instead.
  cover_image = gr.Image(
Running on local URL:  http://127.0.0.1:7860
Running on public URL: https://524f7e3577b2b61905.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
Startup time: 27.0s (prepare environment: 2.7s, import torch: 2.8s, import gradio: 0.7s, setup paths: 0.6s, initialize shared: 0.2s, other imports: 0.5s, list SD models: 0.2s, load scripts: 3.6s, create ui: 7.2s, gradio launch: 8.4s).
2023-12-08 17:48:27,418 - ControlNet - INFO - Loading model: control_v11p_sd15_inpaint [ebff9138]
2023-12-08 17:48:27,864 - ControlNet - INFO - Loaded state_dict from [F:\AI\AUTOMATIC1115\stable-diffusion-webui\extensions\sd-webui-controlnet\models\control_v11p_sd15_inpaint.pth]
2023-12-08 17:48:27,864 - ControlNet - INFO - controlnet_default_config
2023-12-08 17:48:30,114 - ControlNet - INFO - ControlNet model control_v11p_sd15_inpaint [ebff9138] loaded.
2023-12-08 17:48:30,212 - ControlNet - INFO - using inpaint as input
2023-12-08 17:48:30,212 - ControlNet - INFO - Loading preprocessor: inpaint_only+lama
2023-12-08 17:48:30,212 - ControlNet - INFO - preprocessor resolution = -1
2023-12-08 17:48:31,562 - ControlNet - INFO - ControlNet used torch.float16 VAE to encode torch.Size([1, 4, 96, 64]).
2023-12-08 17:48:31,579 - ControlNet - INFO - ControlNet Hooked - Time = 4.364281177520752
2023-12-08 17:48:31,629 - ControlNet - INFO - [ControlNet] Initial noise hack applied to torch.Size([1, 4, 96, 64]).
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:06<00:00,  3.09it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:05<00:00,  3.38it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:05<00:00,  4.27it/s]

Additional information

No response

AIGC404 commented 11 months ago

I have encountered the same issue. When using the API to generate results with a fixed seed, the output differs from what is obtained through the web UI. The API's mask parameter was obtained by capturing network traffic with the browser's F12 developer tools, so I am certain the parameters are identical.

I have noticed that the results of the Controlnet preprocessor obtained through the API are inconsistent with those obtained through the web UI. This issue is likely the cause of the discrepancy.

Here are my parameters:

    {
        "pixel_perfect": false,
        "control_mode": 0,
        "module": "inpaint_only",
        "image": {
            "image": "...Image base64",
            "mask": "...Mask base64"
        },
        "weight": 1,
        "model": "control_v11p_sd15_inpaint [ebff9138]",
        "enabled": false
    }

huchenlei commented 11 months ago

@AIGC404 You have to set enabled: true in your payload.
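
For clarity, a minimal sketch of the same ControlNet unit args with the flag set (values copied from the payload above, purely illustrative):

    unit_args = {
        "enabled": True,  # the unit is skipped when this is false
        "module": "inpaint_only",
        "model": "control_v11p_sd15_inpaint [ebff9138]",
        "pixel_perfect": False,
        "control_mode": 0,
        "weight": 1,
        "image": {"image": "...Image base64", "mask": "...Mask base64"},
    }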

huchenlei commented 11 months ago

@Xijamk We have verified that the API can completely reproduce the result from A1111. You can try using https://github.com/huchenlei/sd-webui-api-payload-display to dump the corresponding API payload from your A1111 runs.

AIGC404 commented 11 months ago

Sorry, there seems to be an issue with the API demo parameters I provided earlier. However, even after changing the "enabled" parameter to true, I'm still experiencing the same problem.

I'm certain that there is an issue, so I kindly ask you to test and verify it. Additionally, you mentioned using "sd-webui-api-payload-display" to retrieve the API parameters, but the returned payload includes extra parameters that cannot be used directly.

Here's a screenshot of the parameters for ControlNet: image

And here's a screenshot of the webUI configuration: image

Here is the file with the parameters. api-demo.zip

AIGC404 commented 11 months ago

@huchenlei help me

Xijamk commented 11 months ago

@huchenlei

If I try to copy-paste the API payload from that extension, I receive this error: image

If I delete that field, I keep receiving the same error about other fields (batch_images, input_mode, loopback, output_dir), and once I remove all of them, the API call completes, but the returned image is a random girl with closed eyes.

Am I doing something wrong? I'm on A1111 version 1.6.1 and ControlNet 1.1.417.

Xijamk commented 11 months ago

@huchenlei, there seems to be an issue. I just tried the new examples provided at https://github.com/Mikubill/sd-webui-controlnet/pull/2317/files, and I am encountering the same problem I described in this thread. The system is returning the original image without any changes.

huchenlei commented 11 months ago

@Xijamk Please update your ControlNet to the latest version (1.1.422). The latest version will ignore unrecognized params.

Xijamk commented 11 months ago

@huchenlei, I've updated and am no longer receiving the unrecognized parameter errors. However, I'm still facing the same issue with inpainting; nothing changes. I've attached the exact payload that I'm using. Could you try it yourself to rule out environment-related problems? Currently, I have only the ControlNet and payload extensions enabled. PayloadExample.txt

Xijamk commented 11 months ago

@AIGC404 have you had any luck?

huchenlei commented 11 months ago

@Xijamk I think you have a problem with your mask image. Every pixel in the mask image has to be either rgb(0, 0, 0) or rgb(255, 255, 255).

I used the input image and input mask from my example with your payload and had no problem doing the inpaint.
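
A minimal sketch of binarizing a mask before encoding it, in case it contains gray or anti-aliased pixels (a suggestion, not part of the extension's API):

    import cv2

    def binarize_mask(in_path: str, out_path: str) -> None:
        # Force every pixel to pure black or pure white and save as a 3-channel PNG.
        mask = cv2.imread(in_path, cv2.IMREAD_GRAYSCALE)
        _, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
        cv2.imwrite(out_path, cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR))

    binarize_mask("mask.png", "mask_binary.png")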

Xijamk commented 11 months ago

I don't think so. I've tried using your images as well, but with no luck. Here is the payload featuring your images, and yet, there's still no change. It seems there might be something conflicting with the inpainting process via the API. PayloadExampleHuchenleiImages.txt

huchenlei commented 11 months ago

You can also try updating your A1111 to the latest version. I am testing under A1111 (f92d6149).

Xijamk commented 11 months ago

I've updated A1111 to the latest version and the problem persists; the mask is not being taken into account via the API.

Xijamk commented 11 months ago

@huchenlei I've done a fresh install of A1111 and ControlNet on a new PC, but it's not working there either.

In an effort to shed some light on the problem, I've made a video demonstrating the problem step by step:

Video: https://drive.google.com/file/d/1zFQGgEgFVg2b3VqAQubauTDe2MxKjFg4/view Payload: PayloadExampleHuchenleiImages.txt

huchenlei commented 11 months ago

@Xijamk I think your problem is the base64 string you got from base64guru.

import cv2
import base64

guru_base64 = "iVBORw0KGgoAAAANSUhEUgAAAgAAAAMAAQMAAABowU0NAAAAAXNSR0IB2cksfwAAAAlwSFlzAAALEwAACxMBAJqcGAAAAAZQTFRFAAAA////pdmf3QAAAIdJREFUeJztzDENAAAIA7D5Nw0idpCQVkATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAN6YkkAgEAgEAoFAIBAIBAKBQCAQCAQCgUAgEAgEAoFAIBAIBAKBQCAQCAQCgUAgEAgEAoFAIBAIBAKBQCAQCAQCgUAgEAgEAoFAIBAIBAKBQCA4CRYO9apHKdWG5QAAAABJRU5ErkJggg=="

def read_image(img_path: str) -> str:
    # Read the file with OpenCV, re-encode it as PNG, and base64-encode the bytes.
    img = cv2.imread(img_path)
    _, buffer = cv2.imencode(".png", img)
    encoded_image = base64.b64encode(buffer).decode("utf-8")
    return encoded_image

mask_image = read_image("mask.png")

print(len(guru_base64))
print(len(mask_image))

I compared the length of the base64guru string with the base64 encoding produced by the example code. Here is the console output:

> python .\compare.py
328
3980
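
As a quick sanity check, a suspicious base64 string can be decoded and inspected (just a debugging sketch):

    import base64
    import io

    from PIL import Image

    def inspect_b64(b64: str) -> None:
        # Decode the base64 payload and report what image it actually contains.
        img = Image.open(io.BytesIO(base64.b64decode(b64)))
        print(img.size, img.mode)  # a palette or 1-bit mode instead of "RGB" is a red flag

    inspect_b64(guru_base64)  # the string from the snippet above
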
AIGC404 commented 11 months ago

@huchenlei The base64 you submitted was produced with OpenCV, but base64 encoding is also commonly generated with PIL. I believe you should also support base64 strings generated by PIL.

from io import BytesIO
from PIL import Image

import cv2
import base64

# Read an image using OpenCV and convert it to base64.
def cv_read_image(img_path: str) -> str:
    img = cv2.imread(img_path)
    _, bytes = cv2.imencode(".png", img)
    encoded_image = base64.b64encode(bytes).decode("utf-8")
    return encoded_image

# Read an image in base64 using PIL
def pil_read_image(img_path):
    # Convert the image to Base64 encoding
    image_buffer = BytesIO()
    Image.open(img_path).save(image_buffer, format='PNG')
    return base64.b64encode(image_buffer.getvalue()).decode('utf-8')

cv_mask_image = cv_read_image("mask.png")

pil_mask_image = pil_read_image("mask.png")

print(len(cv_mask_image))
print(len(pil_mask_image))

AIGC404 commented 11 months ago

@Xijamk No, I haven't had much luck

I ran the example script api_inpaint.py located in example/inpaint_example, but the returned result is still problematic. The preprocessed image returned is the same as the original image. The version of the web UI I used is v1.6.0-2-g4afaaf8a, and the controlnet version is 1.1.422.

txt2img inpaint result: txt2img-0 txt2img-1

img2img inpaint result: img2img-0 img2img-1

AIGC404 commented 11 months ago

I tried switching between multiple versions of the web UI, but none of them made any difference. I have been very unlucky.

DrCyanide commented 11 months ago

I'm also having an issue with the mask not being applied via the API.

I've tried both {"input_image": b64str, "mask": b64str} and {"image": {"image": b64str, "mask": b64str}}. In both tests, the returned images[0] and images[1] are identical.

Here's my JSON (without the encoded image, because that's too big), sent to Txt2Img.

{
    "width": 1024,
    "height": 768,
    "prompt": "Baseball",
    "negative_prompt": "",
    "batch_size": 1,
    "cfg_scale": 9,
    "seed": -1,
    "subseed": -1,
    "subseed_strength": 0,
    "enable_hr": false,
    "alwayson_scripts": {
        "controlnet": {
            "args": [
                {
                    "module": "inpaint_only+lama",
                    "model": "control_v11p_sd15_inpaint [ebff9138]",
                    "weight": 1,
                    "resize_mode": 2,
                    "lowvram": true,
                    "processor_res": 512,
                    "threshold_a": -2,
                    "threshold_b": -3,
                    "guidance_start": 0,
                    "guidance_end": 1,
                    "control_mode": 2,
                    "pixel_perfect": true,
                    "image": {
                        "image": "Replace with attached image",
                        "mask": "iVBORw0KGgoAAAANSUhEUgAABAAAAAMAAQMAAACAdIdOAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAADI0lEQVR4nO3dQW6jMBQGYKouuuwROApHg6NxlByhyyyiZKTRTGokT2cF/1P4vDWSP8l+zwYje3jsXNbhu4yd+mFvwFIJ8HlKwABQCfARANwBANKAGwAAAABAKUBiRQQAAAAAAAAAAAAAAAAAAAAAAAAAAFALEP9YDXBKQHzzGgAAAAAAAAAAAAAAAAAAoBbgHSAAeAAAAAAAtIA3AIAEYAEAAAAAAAAAAACoBOg1BvD6gBUAAAAAAAAAAAAAAACgBcwAAcAFAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAIA1YAQBaQKceAAAAAAAAAOD1AQsAQNP+GwAAAAAAAABAAjAAAFQCvAMABAB3AIBSgA8AAAAAAAAAAACAMwJuAAAt4BMAAAAAAAAAAAAAAAAAAAAgALgCALSAEQDgjIAvAIBSgAkAAAAAAAAAAAAgALgAAJQCzAAAAcAKANACeg8AAAAAAAAAAAAAAADsDVgAKgF6R1IBAAAAAAAAAAAAAAAA7A4YACoBeseUAgAAAAAAAAAA7A3478nxLw+IdwEAQBwQD8M4IN4FAABxQDwM44B4FwAAxAHxMASIjwEAgHgUAMTHAAAAQC1AfPs+8nf9BjCnAdPpAWMAsLSAxGG5G0DiSqlagMR1wxtA4ub3WoBeY4fmgTxgPhxwLwaYDgfcigHG8wGuxQCd+fjVAV/FAJ0FAcDOgAvAFtBZFAIAAAAAAAC8OiC+JAOIA67FACd8OY0D4l9I4oB7McB0OOBRDDAfD1hqAToP7A1YSwESWzaXNGCzIEhs220AiY3LaxoQv+13MxmMacAUAGwmgzkBaHNxr353wJoGNKkw8iNTm4l6aWB/QJMIemlgf0ATh2ME0MThlAF8h8GcATzDoBsEBwCe82E3CA4APEdhNwgOADyT8ZQC/B0E/doDALefhsARgD99MOYAt38H4TGA36E4RQE/FAAAAAAAAAAAAIBfXOtmb1ajPP0AAAAASUVORK5CYII="
                    }
                }
            ]
        }
    },
    "override_settings": {
        "sd_model_checkpoint": "Mao's_mix_anime_V1.ckpt [5228b68555]",
        "sd_vae": "vae-ft-mse-840000-ema-pruned.vae.pt"
    },
    "sampler_name": "DPM++ 2M SDE Karras",
    "steps": 50,
    "override_settings_restore_afterwards": false,
    "n_iter": 1
}

And here's my test image image_in

I know that ControlNet is getting the image, because Txt2Img would generate an entirely new "baseball" image if ControlNet was doing nothing.

huchenlei commented 11 months ago

@DrCyanide Do you have any success running https://github.com/Mikubill/sd-webui-controlnet/blob/main/example/inpaint_example/api_inpaint.py? What do the results look like?

DrCyanide commented 11 months ago

Those results look correct. I'm going to have to see if I can spot the differences and find out why my JSON fails.

DrCyanide commented 11 months ago

Looks like the difference is in the Base64 encoding of a monochrome image versus a color image. The Base64 mask in my JSON above was generated from a monochromatic image (converting the transparency of a layer in Krita to black and the rest to white). Writing that Base64 mask out to a PNG seems to add the RGB values back, and the resulting file can then be read in by the example script, causing it to act as expected.

I'll have to dig around and see if I can find a way to convert the monochromatic mask back to color before sending it to ControlNet.

I haven't updated my ControlNet for the test (still using v1.1.415), so if newer versions are more forgiving on this then that might be a good reason to update.
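
One way to do that conversion with PIL before base64-encoding (a sketch assuming the mask is saved as mask.png):

    import base64
    import io

    from PIL import Image

    # Expand a 1-bit / grayscale mask back to 3-channel RGB before encoding,
    # so the PNG sent to ControlNet carries explicit RGB values.
    mask = Image.open("mask.png").convert("RGB")
    buffer = io.BytesIO()
    mask.save(buffer, format="PNG")
    mask_b64 = base64.b64encode(buffer.getvalue()).decode("utf-8")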

wixxxez commented 2 months ago

I leave a comment here. I hope it will be helpful for someone in the future.

So I faced a similar issue, but with ComfyUI. Masking in the UI works perfectly, but via the API it returns the original image.

The problem hides in how you set the mask on the image before sending it to the API. I found in the original ComfyUI repo how they extract masks from an image, and knowing that, I can set the mask properly.

I leave the code here.


    import numpy as np
    import streamlit as st
    import torch
    from PIL import Image

    # original_image (an RGBA PIL image) and mask_canvas (a drawable canvas widget)
    # come from the surrounding Streamlit app.
    white_background = Image.new("RGBA", original_image.size, (255, 255, 255, 255))

    # Create a mask from the canvas
    mask_image = Image.fromarray(mask_canvas.image_data.astype(np.uint8))

    # Convert images to tensors (CxHxW)
    original_tensor = torch.from_numpy(np.array(original_image)).permute(2, 0, 1)
    mask_tensor = torch.from_numpy(np.array(mask_image)).permute(2, 0, 1)

    # Create the output tensor
    output_tensor = original_tensor.clone()

    # Set the alpha channel based on the mask
    red_channel = mask_tensor[0]  # Get the red channel from the mask
    output_tensor[3] = torch.where(red_channel == 255, torch.tensor(0), original_tensor[3])  # Set alpha to 0 where the mask is red

    # Convert the output tensor back to an RGBA image
    output_image = Image.fromarray(output_tensor.permute(1, 2, 0).byte().numpy(), mode='RGBA')

    st.image(output_image, caption='Image with Selected Area Black', use_column_width=True)
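
If I understand ComfyUI's LoadImage node correctly, it derives the mask from the image's alpha channel, so baking the selection into the alpha channel like this is what makes the API behave the same as masking in the UI.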