staymylove closed this issue 6 months ago
I think in general you will need to know the geometry beforehand. After that, you can render the linear depth of the geometry (defined as the distance to the camera plane) and normalize it:
```python
def process_depth(self, x):
    # x: (H, W) tensor of linear depth (distance to the camera plane),
    # with 0 for background pixels
    mask = x > 0.001           # foreground (valid-depth) pixels
    alpha = mask.float()       # alpha channel: 1 = foreground
    if not mask.any():
        n = x * 0              # no geometry visible: all-zero depth
    else:
        v = x[mask]
        # min-max normalize over foreground depths only
        n = (x - v.min()) / (v.max() - v.min() + 1e-9)
    # replicate normalized depth to RGB and append alpha -> (H, W, 4)
    return torch.stack([n, n, n, alpha], dim=-1)
```
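To make the behavior concrete, here is a standalone NumPy re-expression of the same normalization on a toy depth map (the function body mirrors the snippet above; the toy depth values are mine):

```python
import numpy as np

def process_depth(x):
    """Normalize a linear depth map to [0, 1] and build an RGBA image.

    x: (H, W) array of linear depth (distance to the camera plane),
       with 0 for background pixels.
    """
    mask = x > 0.001                 # foreground (valid-depth) pixels
    alpha = mask.astype(np.float32)  # alpha channel: 1 = foreground
    if not mask.any():
        n = x * 0                    # no geometry visible: all-zero depth
    else:
        v = x[mask]
        # min-max normalize over foreground depths only
        n = (x - v.min()) / (v.max() - v.min() + 1e-9)
    # replicate normalized depth to RGB, append alpha -> (H, W, 4)
    return np.stack([n, n, n, alpha], axis=-1)

depth = np.array([[0.0, 1.0],
                  [2.0, 3.0]], dtype=np.float32)
rgba = process_depth(depth)
```

Note that background pixels can end up with a negative "depth" value after normalization, but their alpha is 0, so they are masked out downstream.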
Thanks for your reply. Actually, I want to ask: when I use the depth ControlNet, should my input be 6 multiview depth images or just a single depth image?
An example is:

But I want to ask: can a single image plus a single depth image (like the depth image below) produce the same output?
No, you'll have to provide 6 images.
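For context, the six views in this kind of multiview setup are typically spaced evenly in azimuth around the object. Below is a minimal sketch of building the corresponding look-at extrinsics from known geometry; the angles, radius, and helper name are illustrative assumptions on my part, not this repo's actual camera configuration:

```python
import numpy as np

def look_at(eye, target=np.zeros(3), up=np.array([0.0, 1.0, 0.0])):
    """Build a 4x4 world-to-camera extrinsic looking from `eye` at `target`."""
    fwd = target - eye
    fwd = fwd / np.linalg.norm(fwd)
    right = np.cross(fwd, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, fwd)
    # rows are the camera axes in world space (OpenGL convention: look down -z)
    R = np.stack([right, true_up, -fwd])
    E = np.eye(4)
    E[:3, :3] = R
    E[:3, 3] = -R @ eye
    return E

# six cameras on a ring, evenly spaced in azimuth (radius is illustrative)
radius = 2.0
poses = []
for k in range(6):
    az = 2 * np.pi * k / 6
    eye = np.array([radius * np.cos(az), 0.0, radius * np.sin(az)])
    poses.append(look_at(eye))
```

Rendering linear depth of the known geometry from each of these poses (and normalizing as in the snippet earlier in the thread) would yield the six conditioning images.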
It may be possible to train another ControlNet on the conditioning reference attention branch, but the released version does not do that.
I'm confused about this. Isn't the purpose of this project to generate the 6 image views? If I already had depth estimates for the 6 views, why would I need to synthesize the views again?
Perhaps I am misunderstanding. Thank you for your work and patience.
Yes; but suppose, for example, you have a "white model" without any textures: the ControlNet can then help you generate textures on the model. That's the point of the released ControlNet.
Dear author: I want to ask whether the input to the depth ControlNet must be 6 sub-images. I think it is hard to obtain multiview depth images.