AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Feature Request]: Marigold depth - high-quality diffusion-based monocular depth estimation #14274

Open toshas opened 7 months ago

toshas commented 7 months ago

Is there an existing issue for this?

What would your feature do ?

An improved monocular depth estimation algorithm based on fine-tuned Stable Diffusion. It should provide better guidance for depth-to-image and other models that employ monocular depth.

Proposed workflow

  1. As an alternative to MiDaS

Additional information

https://marigoldmonodepth.github.io/ https://twitter.com/AntonObukhov1/status/1732946419663667464
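The proposal boils down to offering Marigold alongside MiDaS behind a common depth-estimator interface. A minimal sketch of that idea follows; every name here (the registry, the placeholder estimator) is hypothetical and not the webui's actual API:

```python
import numpy as np
from typing import Callable

# Hypothetical interface: RGB image (H, W, 3) -> raw depth map (H, W).
DepthEstimator = Callable[[np.ndarray], np.ndarray]

def normalize_depth(depth: np.ndarray) -> np.ndarray:
    """Scale a raw depth map to [0, 1] for use as conditioning."""
    d_min, d_max = float(depth.min()), float(depth.max())
    return (depth - d_min) / max(d_max - d_min, 1e-8)

def fake_ramp_estimator(rgb: np.ndarray) -> np.ndarray:
    """Stand-in estimator: a left-to-right ramp (placeholder for MiDaS or Marigold)."""
    h, w, _ = rgb.shape
    return np.tile(np.linspace(0.0, 1.0, w, dtype=np.float32), (h, 1))

# In a real integration these entries would wrap the MiDaS and Marigold models.
ESTIMATORS: dict[str, DepthEstimator] = {
    "midas": fake_ramp_estimator,     # placeholder
    "marigold": fake_ramp_estimator,  # placeholder
}

def estimate_depth(rgb: np.ndarray, backend: str = "midas") -> np.ndarray:
    """Run the selected backend and normalize its output."""
    return normalize_depth(ESTIMATORS[backend](rgb))
```

With such a registry, switching the depth backend for depth-guided generation would be a one-string change rather than a code change.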

w-e-w commented 7 months ago

Maybe I'm missing something, but I don't recall any part of webui using depth estimation. Maybe you should submit a request to ControlNet instead? https://github.com/Mikubill/sd-webui-controlnet

toshas commented 7 months ago

Thanks for the pointer @w-e-w; however, are the following code bits not used within this repo?
https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/master/modules/sd_models.py#L372
https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/master/modules/sd_models.py#L413-L453

w-e-w commented 7 months ago

OH !!! my mind is completely blown now because somehow I never noticed that there is depth2image https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#depth-guided-model

MisterSeajay commented 6 months ago

Not sure if this is relevant here, but from https://github.com/lllyasviel/ControlNet?tab=readme-ov-file#controlnet-with-depth:

> Note that, unlike Stability's model, the ControlNet receives the full 512×512 depth map rather than a 64×64 one (Stability's SD2 depth model uses 64×64 depth maps). This means the ControlNet preserves more detail in the depth map.
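The resolution gap quoted above is easy to visualize: whatever the estimator produces, SD2's depth conditioning only sees a 64×64 map, i.e. an 8× reduction per side (64× fewer values). A small illustrative sketch, not code from either repo:

```python
import numpy as np

def downsample_depth(depth: np.ndarray, size: int) -> np.ndarray:
    """Average-pool a square depth map down to (size, size).

    Mimics the detail loss when a full-resolution depth map is reduced
    to the 64x64 conditioning used by Stability's SD2 depth model.
    """
    h, w = depth.shape
    assert h == w and h % size == 0, "expects a square map divisible by target size"
    f = h // size
    return depth.reshape(size, f, size, f).mean(axis=(1, 3))

full = np.random.rand(512, 512).astype(np.float32)  # e.g. a MiDaS/Marigold output
coarse = downsample_depth(full, 64)                 # what SD2 depth conditioning sees
print(full.shape, coarse.shape)  # (512, 512) (64, 64)
```

So a higher-quality estimator like Marigold would help most where the full-resolution map is consumed (e.g. ControlNet's 512×512 input), and less where it is pooled down to 64×64 first.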