SUDO-AI-3D / zero123plus

Code repository for Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model.
Apache License 2.0
1.56k stars, 108 forks

Depth map size ratios for depth ControlNet #78

Open zaiisao opened 1 month ago

zaiisao commented 1 month ago

Hello, congratulations on this research; I have been having a very good time experimenting with this pipeline. However, I have a question.

I loaded a mesh and created six depth maps to arrange in a 3x2 grid similar to the one shown in the paper. Because the rendered depth maps usually differ in size, I padded each one to match the size of the largest of the six images. However, this results in an image grid where the size of each depth map varies quite noticeably; please see the image below. The padded areas are in black, and the depth maps from -20 degree elevations are considerably smaller than the depth maps from 30 degree elevations.

[Image: 0000_cropped_depth_grid]

However, the depth maps shown in the Zero123++ paper all seem to be very similar in size to each other. This made me want to know how the size of each depth map was determined during training. Were the depth maps simply resized to match the size of the largest depth map image (as opposed to filling the empty space with padding)?

Thanks in advance.

eliphatfs commented 1 month ago

The size should be the resolution of your render output, which can be fixed to the same value for every view. I am not sure why you have different-sized depth maps.
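
To illustrate the suggestion above, here is a minimal sketch of tiling six depth maps that were all rendered at one fixed resolution, so no cropping or padding is needed. It is not taken from the repository: the function name `make_depth_grid`, the 320-pixel tile size, the reading of "3x2" as 3 rows by 2 columns, and the row-major view order are all assumptions for illustration, and the constant-valued arrays only stand in for real renderer output.

```python
import numpy as np

TILE = 320          # assumed per-view render resolution (hypothetical value)
ROWS, COLS = 3, 2   # assumed layout: 3 rows by 2 columns


def make_depth_grid(depth_maps, rows=ROWS, cols=COLS):
    """Tile equally sized 2D depth maps into a rows x cols grid, row-major."""
    if len(depth_maps) != rows * cols:
        raise ValueError(f"expected {rows * cols} depth maps, got {len(depth_maps)}")
    shape = depth_maps[0].shape
    if any(d.shape != shape for d in depth_maps):
        # Mismatched sizes usually mean the renders were cropped to the object;
        # render every view at the same fixed resolution instead.
        raise ValueError("all depth maps must share one resolution")
    # Build each row by placing `cols` views side by side, then stack the rows.
    grid_rows = [np.hstack(depth_maps[r * cols:(r + 1) * cols]) for r in range(rows)]
    return np.vstack(grid_rows)


# Stand-in for renderer output: six TILE x TILE maps, each filled with its index.
views = [np.full((TILE, TILE), i, dtype=np.float32) for i in range(ROWS * COLS)]
grid = make_depth_grid(views)
print(grid.shape)  # (960, 640)
```

Because every view shares the same canvas size, the object keeps the scale the camera gave it in each render, rather than being rescaled by a per-view crop.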