AnonymousCervine / depth-image-io-for-SDWebui

An extension to allow managing custom depth inputs to Stable Diffusion depth2img models for the stable-diffusion-webui repo.

Depth-map reduced to white in output #5

Closed (YanWittmann closed this issue 1 year ago)

YanWittmann commented 1 year ago

I'm having some trouble getting this extension to work.

I use 512-depth-ema.ckpt with a matching 512-depth-ema.yaml file, which works in img2img. However, when I generate an image on the txt2img tab with the extension script active, the depth map is reduced to almost pure white:

image

And the whole UI:

image

The error might very well be on my part. Can someone tell me where I went wrong? Thank you!


AnonymousCervine commented 1 year ago

There have already been reported issues with different image formats (namely, issue #2). By any chance, can you give me a copy of the file used as the input depth image here?

(It almost certainly isn't going to be a problem with the configuration in this case. Based on that screenshot, the extension isn't parsing the input depth image correctly.)

YanWittmann commented 1 year ago

Ah, the image format. All right, I got it to work by re-exporting the PNG.

Here's the original PNG:

PNG before

And here's the re-exported PNG version:

PNG after
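The thread does not say what actually differed between the two PNG files, so the following is only a diagnostic sketch: it reads the IHDR chunk of a PNG and reports the bit depth and colour type, which is one way to compare the original and re-exported files. The function name `read_png_header` and the file name in the usage comment are hypothetical; the byte layout follows the PNG specification.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Colour type codes defined by the PNG specification.
COLOR_TYPES = {
    0: "grayscale",
    2: "truecolor (RGB)",
    3: "indexed (palette)",
    4: "grayscale + alpha",
    6: "truecolor + alpha (RGBA)",
}


def read_png_header(data: bytes) -> dict:
    """Parse the signature and IHDR chunk from the first 29 bytes of a PNG."""
    if len(data) < 29 or data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file (or too short)")
    # Chunk layout: 4-byte big-endian length, then a 4-byte chunk type.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing or malformed IHDR chunk")
    # IHDR body: width, height, bit depth, colour type,
    # compression method, filter method, interlace method.
    width, height, bit_depth, color_type, _comp, _filt, interlace = struct.unpack(
        ">IIBBBBB", data[16:29]
    )
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": COLOR_TYPES.get(color_type, f"unknown ({color_type})"),
        "interlaced": bool(interlace),
    }


# Usage (hypothetical file name):
#   with open("depth_before.png", "rb") as f:
#       print(read_png_header(f.read(29)))
```

If the two files report different bit depths or colour types (for example 16-bit grayscale versus 8-bit RGB), that difference would be a reasonable first suspect, though it is not confirmed anywhere in this thread.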

Here are some results for "The face of a young man, highly detailed photography":

image

I am having some trouble adding a red shirt, however. Do you have any suggestions on how to avoid making the entire face red (which happens in almost every generation I make with this prompt)? The prompt is "The face of a young man, highly detailed photography wearing a red shirt":

image

Thank you!

AnonymousCervine commented 1 year ago

First, thank you for the source images, they're most helpful!

Since it does seem to be the same format problem, I'm going to mark this issue as a duplicate of #2 (any updates will be posted there).

> I am having some trouble adding a red shirt however, do you have any suggestions on how to not make the entire face red

I too have noticed that the depth2img model is honestly a little bit wonky about prompts sometimes, compared to the normal SD models we're used to, perhaps especially with regard to colours.

Depending on your goals: you can always do it in two steps and use inpaint! There is a bug where SD doesn't handle inpainting correctly for depth2img, but if you supply the depth manually with this extension (i.e. the same depth image, again) it will respect it. (Though it may still have that certain "I am inpainted" look; on the other hand, it doesn't get the general structure of things wrong as much as normal inpainting, because it has the depth to guide it.)

So, for instance, using your first prompt, I got this (emulated stock-photo bands and all! And strikingly similar to the image you got, really. I do wonder whether, in general, the structure provided by the depth half of the depth2img model input makes SD a little less creative sometimes):

image

Masked like so:

image

With the prompt "a young man wearing a (red:1.2) shirt" (emphasis only on the things inpainting needs to know, and you'll notice I had trouble getting it to pick up the idea of 'red' again in an image without it); cherry-picked best result out of 12-ish:

image

(Note that it's still following the shoulder contours suggested by the depth image. Otherwise, it's just a normal-ish inpainting job.)
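For readers unfamiliar with the "(red:1.2)" notation in the inpainting prompt: in stable-diffusion-webui, "(text:weight)" asks the model to pay more (or less) attention to that part of the prompt. The sketch below is a deliberately simplified illustration of how such a prompt splits into weighted pieces; `split_emphasis` is a hypothetical helper, not the webui's own parser, and it ignores nesting, square brackets, and escaped parentheses.

```python
import re

# Matches "(text:weight)", e.g. "(red:1.2)". Nested parentheses are not handled.
_WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def split_emphasis(prompt: str) -> list:
    """Split a prompt into (text, weight) pairs; unmarked text gets weight 1.0."""
    parts = []
    pos = 0
    for match in _WEIGHTED.finditer(prompt):
        if match.start() > pos:
            # Plain text before the weighted span keeps the default weight.
            parts.append((prompt[pos:match.start()], 1.0))
        parts.append((match.group(1), float(match.group(2))))
        pos = match.end()
    if pos < len(prompt):
        parts.append((prompt[pos:], 1.0))
    return parts


for text, weight in split_emphasis("a young man wearing a (red:1.2) shirt"):
    print(repr(text), weight)
```

Here only the word "red" carries the raised weight of 1.2; everything else stays at the default of 1.0.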