SUDO-AI-3D / zero123plus

Code repository for Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model.
Apache License 2.0

data of depth ControlNet #40

Closed: staymylove closed this issue 6 months ago

staymylove commented 7 months ago

Dear author: I want to ask whether the data format for the depth ControlNet must be 6 subfigures. I think it is hard to obtain multi-view depth images.

eliphatfs commented 7 months ago

I think in general you will need to know the geometry beforehand. After that you can render the linear depth of the geometry (defined as the distance to the camera plane) and apply a normalization:

def process_depth(self, x):
    # x: rendered linear depth (distance to the camera plane); background is 0
    mask = x.r > 0.001            # foreground pixels with valid depth
    alpha = mask.float()          # alpha channel: 1 on geometry, 0 on background
    if not mask.any():
        n = x * 0                 # empty view: emit all zeros
    else:
        v = x[mask]
        # min-max normalize the foreground depth into [0, 1]
        n = (x - v.min()) / (v.max() - v.min() + 1e-9)
    # replicate normalized depth into RGB and attach the mask as alpha
    return float4(n, n, n, alpha)
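
Since the released ControlNet conditions on the same multi-view grid that Zero123++ outputs, the six normalized depth renders need to be tiled into one control image. A minimal standalone sketch with NumPy and Pillow (the 3-rows-by-2-columns, row-major layout is an assumption based on the pipeline's output format, and tile_depth_views is a hypothetical helper, not part of the repo):

import numpy as np
from PIL import Image

def tile_depth_views(views):
    # views: six H x W x 4 float arrays in [0, 1] (e.g. from process_depth),
    # ordered the same way as the pipeline's output views (row-major).
    assert len(views) == 6
    rows = [np.hstack(views[i:i + 2]) for i in range(0, 6, 2)]  # 3 rows of 2
    grid = np.vstack(rows)                                      # (3H) x (2W) x 4
    return Image.fromarray((grid * 255).astype(np.uint8), mode="RGBA")
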
staymylove commented 7 months ago

Thanks for your reply. Actually, I want to ask: when I use the depth ControlNet, should my input be 6 multi-view depth images or just a single depth image? Example: (image attached)

But I want to ask whether a single image plus a single depth image, like the depth image below, can produce the same output: (image attached)

eliphatfs commented 7 months ago

No, you'll have to provide 6 images.
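
For context, conditioning the released pipeline on such a 6-view depth grid looks roughly like the sketch below, based on the repo's README; the model IDs, the add_controlnet helper, and the depth_image argument should be checked against the current README, and the file names are placeholders:

import torch
from PIL import Image
from diffusers import DiffusionPipeline, ControlNetModel

# Load the base pipeline and attach the released depth ControlNet.
pipeline = DiffusionPipeline.from_pretrained(
    "sudo-ai/zero123plus-v1.1", custom_pipeline="sudo-ai/zero123plus-pipeline",
    torch_dtype=torch.float16,
).to("cuda")
pipeline.add_controlnet(ControlNetModel.from_pretrained(
    "sudo-ai/controlnet-zp11-depth-v1", torch_dtype=torch.float16,
), conditioning_scale=0.75)

cond = Image.open("input_rgb.png")    # single reference image
depth = Image.open("depth_grid.png")  # 6-view depth grid (3 x 2 tiles)
result = pipeline(cond, depth_image=depth, num_inference_steps=36).images[0]
result.save("output.png")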

eliphatfs commented 7 months ago

It may be possible to train another ControlNet on the conditioning reference attention branch, but the released version is not trained that way.

kpister commented 7 months ago

I'm confused about this. Isn't the purpose of this project to generate the 6 view images? If I already had depth estimates of the 6 views, why would I need to synthesize the views again?

Perhaps I am misunderstanding. Thank you for your work and patience.

eliphatfs commented 7 months ago

Yes; but suppose, for example, you have a 'white model' (untextured geometry). Then the ControlNet can help you generate textures on the model. That's the point of the released ControlNet.
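
To make that workflow concrete, here is a hedged sketch of rendering the linear depth of an untextured mesh for one of the six fixed viewpoints, using trimesh and pyrender; the yfov value is a placeholder, and the actual azimuth/elevation values for Zero123++'s views should be taken from the repo README:

import numpy as np
import trimesh
import pyrender

def render_linear_depth(mesh_path, camera_pose, size=320):
    # Render z-depth (distance to the camera plane) of an untextured mesh.
    mesh = pyrender.Mesh.from_trimesh(trimesh.load(mesh_path))
    scene = pyrender.Scene()
    scene.add(mesh)
    camera = pyrender.PerspectiveCamera(yfov=np.deg2rad(30.0))  # placeholder FOV
    scene.add(camera, pose=camera_pose)  # 4x4 camera-to-world matrix
    renderer = pyrender.OffscreenRenderer(size, size)
    _, depth = renderer.render(scene)    # depth is 0 where no geometry was hit
    renderer.delete()
    return depth  # feed into a process_depth-style normalization

Repeating this for each of the six camera poses and tiling the results into the grid gives the depth conditioning image expected by the released ControlNet.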