apple / ml-stable-diffusion

Stable Diffusion with Core ML on Apple Silicon
MIT License

Inpainting affects non-transparent parts of the image #211

Open SaladDays831 opened 1 year ago

SaladDays831 commented 1 year ago

Hi! :) I'm testing the new inpainting functionality that has recently been pushed to the main branch.

I'm using the Stable Diffusion 1.5 model converted with this command:

python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-vae-encoder --convert-safety-checker --model-version "runwayml/stable-diffusion-v1-5" --unet-support-controlnet --quantize-nbits 6 --attention-implementation SPLIT_EINSUM_V2 --convert-controlnet "lllyasviel/sd-controlnet-canny" --bundle-resources-for-swift-cli -o "/path/to/save"

and the already converted InPaint-SE model from here.

I'm also using macOS Preview to erase everything in the image except my face, making it transparent, like so:

The resulting image kinda uses my face, but messes it up, while I was expecting the face to remain unchanged.

This is happening on iPadOS using the main branch of this package, and also on the latest version of MochiDiffusion.

I don't think this is intended. In Automatic1111, when using InPaint + Canny I get good results where the face remains unchanged.

ynagatomo commented 1 year ago

That's strange. In my experiment, it worked fine.

[Screenshot: 2023-07-17 18:33:58]

SaladDays831 commented 1 year ago

Hmm, thanks @ynagatomo, I'll try converting the same inpaint model you use myself.

SaladDays831 commented 1 year ago

No changes with the newly converted model. I also tested inpainting just the face (instead of everything but the face) - it works ok-ish. I still get some noise/corruption outside the inpainted area (your example also has some minor color changes). Maybe it's just less visible in your example because it's a painting rather than a photo? 🤔

When using the Automatic1111 WebUI and inpainting everything except the face (like in my example), the face remains unchanged. In cases like this, even the slightest deformation of the person's face results in a total mess :(

ynagatomo commented 1 year ago

At least the masking feature for InPainting added by the PR is working. We may need to adjust the parameters and models. :)

jrittvo commented 1 year ago

I think the process may be sensitive to the base model being used, for some reason. When I use a given base model to generate the input image, and then that same base model (and the same seed when possible) for the ControlNet inpaint run, I get many fewer anomalies. I don't understand why that could be, but it seems to be that way for me.

atiorh commented 1 year ago

Hey @SaladDays831! I checked out A1111's in-painting UI after seeing this issue. There are a lot of additional knobs built around the core in-painting functionality to make it work better for certain use cases. Some examples of these knobs are:

SaladDays831 commented 1 year ago

Hi @atiorh :) Thanks for looking into this!

I didn't thoroughly test the difference, but there are two ways to do inpainting in A1111. All the settings you mentioned are present in the img2img -> inpaint tab (and you don't need a CN model for that from what I see)

For my tests, I just used the imported inpainting model in the ControlNet section of the txt2img tab, which looks like the "core" inpainting functionality. It doesn't have all these fancy settings, plus I can test the same model version I'm trying to use with this package, and it works as expected (it doesn't change the un-inpainted parts at all).

TimYao18 commented 11 months ago

Hi, I tried adding a Starting Image in Inpaint with SD1.5_cn, but it seems to have no effect on the resulting output image. I'm not sure if this is the correct behavior.

jrittvo commented 11 months ago

What commands or app are you using? You need to provide some details here before anyone can begin to help. Does your starting image have a transparent area to indicate what is to be inpainted?

TimYao18 commented 11 months ago

I use both swift diffusers and MochiDiffusion. I also just tried the Swift CLI, and the starting image has no effect on the result there either.

Perhaps I didn't make myself clear. What I meant is that the results remain the same whether I include the Starting Image or not.

The images are as below: [starting image], [masked image used as the ControlNet input].

jrittvo commented 11 months ago

At the moment the InPaint ControlNet is broken in Mochi Diffusion. At least half the time, it is ignoring the masked input image. I have a build that appears to fix the problem, but I don't know if my builds can run on other people's machines because it is not an Apple notarized app. If you would like to try it, this is the download link: https://huggingface.co/jrrjrr/Playground/blob/main/Mochi%20Diffusion%20(macOS%2013).dmg

When you say "Starting Image", does that mean you are trying to use 2 images? A masked image to define the inpaint area and a second image that you want to fill the masked area? Can you explain a little more how you are setting it all up in either of your two methods?

jrittvo commented 11 months ago

In the Swift CLI, ControlNet InPaint only uses --controlnet-inputs. You can't also use the --image argument; --image is for Image2Image.

This is the command I use (with my paths) for ControlNet:

swift run StableDiffusionSample "a photo of a cat" --seed 12 --guidance-scale 8.0 --step-count 24 --image-count 1 --scheduler dpmpp --compute-units cpuAndGPU --resource-path ../models/sd-5x7 --controlnet InPaint-5x7 --controlnet-inputs ../input/cat-5x7.png --output-path ../images
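
For anyone calling the Swift package directly rather than the CLI, I think the same run looks roughly like this. This is only a sketch, not code from the repo: resourceURL and maskedImage are placeholders you supply yourself, and parameter names may differ slightly between package versions.

```swift
import CoreML
import CoreGraphics
import StableDiffusion

// Sketch of the CLI command above as a programmatic call.
// `resourceURL` points at the compiled Core ML resources; `maskedImage`
// is the masked PNG (transparent area = region to repaint) loaded as a CGImage.
@available(macOS 13.1, iOS 16.2, *)
func runControlNetInpaint(resourceURL: URL, maskedImage: CGImage) throws -> [CGImage?] {
    let mlConfig = MLModelConfiguration()
    mlConfig.computeUnits = .cpuAndGPU                       // --compute-units cpuAndGPU

    // Load the base model plus the InPaint ControlNet (--controlnet InPaint-5x7).
    let pipeline = try StableDiffusionPipeline(
        resourcesAt: resourceURL,
        controlNet: ["InPaint-5x7"],
        configuration: mlConfig,
        reduceMemory: false
    )
    try pipeline.loadResources()

    var config = StableDiffusionPipeline.Configuration(prompt: "a photo of a cat")
    config.controlNetInputs = [maskedImage]                  // --controlnet-inputs
    config.seed = 12                                         // --seed 12
    config.guidanceScale = 8.0                               // --guidance-scale 8.0
    config.stepCount = 24                                    // --step-count 24
    config.imageCount = 1                                    // --image-count 1
    config.schedulerType = .dpmSolverMultistepScheduler      // --scheduler dpmpp

    return try pipeline.generateImages(configuration: config) { _ in true }
}
```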

TimYao18 commented 11 months ago

I set 2 images, as shown in the MochiDiffusion screenshot here.

The starting image is defined in the PipelineConfiguration:

/// Starting image for image2image or in-painting
public var startingImage: CGImage? = nil

I don't know whether inpaint needs the starting image; I thought inpainting might use the starting image as a reference when filling things in.

I apologize for causing some confusion.

jrittvo commented 11 months ago

ControlNet InPaint in Mochi only uses one input image: the masked image. The text prompt tells it what to put in the masked area. The upper spot for an input image in Mochi is only used with Image2Image; it has no effect on ControlNet.
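
Just as a rough sketch (not code from the repo) of how I understand the two modes map onto the Swift package's PipelineConfiguration, where photo and maskedImage are placeholder images:

```swift
import CoreGraphics
import StableDiffusion

// Sketch only: `photo` and `maskedImage` are placeholder CGImages you load yourself.
@available(macOS 13.1, iOS 16.2, *)
func exampleConfigurations(photo: CGImage, maskedImage: CGImage)
    -> (image2image: StableDiffusionPipeline.Configuration,
        inpaint: StableDiffusionPipeline.Configuration) {

    // Image2Image: the starting image drives the result; strength controls how far
    // the output is allowed to drift from it.
    var image2image = StableDiffusionPipeline.Configuration(prompt: "a photo of a cat")
    image2image.startingImage = photo
    image2image.strength = 0.6

    // ControlNet InPaint: only the masked image is passed, via controlNetInputs.
    // The transparent region is the area to repaint; startingImage is not used here.
    var inpaint = StableDiffusionPipeline.Configuration(prompt: "a photo of a cat")
    inpaint.controlNetInputs = [maskedImage]
    inpaint.startingImage = nil

    return (image2image, inpaint)
}
```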

This is with my test build. Remember, the build that downloads from the Mochi GitHub is presently broken for most ControlNets.

[Screencap]

jrittvo commented 11 months ago

And yes, this is all very confusing because it is not explained well with visual examples anywhere. That is something Mochi needs to improve on.

jrittvo commented 11 months ago

In this example, everything is masked except the face. The text prompt says to use a "suit of armor" wherever there is mask.

jrittvo commented 11 months ago

Masked image: [mask-blouse-5x5]

Prompt: Woman in flower print blouse

Result images: [Woman with flower print blouse 10 1371478925], [Woman with flower print blouse 12 1371478927]

jrittvo commented 11 months ago

When I have used it in the Swift CLI, it uses the same inputs and logic. The Python CLI pipeline may be different.