TencentARC / InstantMesh

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
Apache License 2.0

Question on depth scaling #71

Closed: Kev1MSL closed this issue 4 months ago

Kev1MSL commented 4 months ago

Hi! I am wondering why you scale the depth in the data preparation code, but then, in the training code, divide both the ground-truth and the output depths by the max depth from the ground truth. Could you please explain this? As far as I can tell, the output depth from the model is neither normalized nor scaled by any value. I was also wondering whether you expect a reversed depth map for the training dataset (i.e. darker colors mean closer to the camera, and lighter colors mean farther away).

Kev1MSL commented 4 months ago

With this little modification of the NeuralRender render_mesh function, I was able to get a positive depth map that gave me satisfying results:

depth = gb_feat[..., -2:-1]
# Transforming the negative depth into a positive one
depth -= depth.min()
depth = torch.lerp(
    torch.zeros_like(depth), depth / depth.max(), hard_mask.float()
)

I am not sure whether removing the - (the sign negation) at this line would be a breaking change.

bluestyle97 commented 4 months ago

We scale the depth because the depth maps are rendered as PNG images with values in the range [0, 255], representing the depth range [MIN_DEPTH, MAX_DEPTH], where MIN_DEPTH=0 and MAX_DEPTH=depth_scale. Since 8-bit PNG files can only store integer values between 0 and 255, we must scale the depth to fit this range during rendering and scale it back during data loading. If we rendered the depths in OPEN_EXR format instead, we could store absolute depth values directly, and the depth scaling would no longer be required. However, storing depth maps in OPEN_EXR format would significantly increase the storage overhead.
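The round trip described above can be sketched roughly as follows. This is only an illustration of the idea, not the repository's actual rendering or data-loading code: the function names `encode_depth` and `decode_depth` and the value `depth_scale = 4.0` are hypothetical, and a real pipeline would operate on whole image arrays rather than single floats.

```python
def encode_depth(depth: float, depth_scale: float) -> int:
    """Map a depth in [0, depth_scale] to an 8-bit pixel value in [0, 255]."""
    # Clamp so depths outside the representable range do not overflow.
    normalized = min(max(depth / depth_scale, 0.0), 1.0)
    return round(normalized * 255)


def decode_depth(pixel: int, depth_scale: float) -> float:
    """Invert encode_depth: map an 8-bit pixel value back to a depth."""
    return pixel / 255 * depth_scale


depth_scale = 4.0  # hypothetical MAX_DEPTH, chosen only for this example
depths = [0.0, 1.3, 2.71, 4.0]

pixels = [encode_depth(d, depth_scale) for d in depths]
restored = [decode_depth(p, depth_scale) for p in pixels]

# Quantization to 256 levels loses at most half a step of precision.
max_error = max(abs(d - r) for d, r in zip(depths, restored))
print(pixels, restored, max_error)
```

Under these assumptions the recovered depth differs from the original by at most half a quantization step, `depth_scale / 510`, which is the precision given up in exchange for the smaller PNG files.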

The depth maps rendered by NeRF during training are absolute values; we scale them to [0, 1] by dividing by the max depth for visualization.
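As a rough sketch of that normalization, assuming (as the question describes) that both the ground-truth and the predicted depth maps are divided by the ground-truth maximum. The helper `normalize_by_max` and the toy 2x2 depth maps are hypothetical and are not taken from the training code.

```python
def normalize_by_max(depth_map, max_depth):
    """Divide every depth by max_depth so that values are relative to it."""
    if max_depth <= 0:
        # Degenerate case (empty or all-zero map): return zeros.
        return [[0.0 for _ in row] for row in depth_map]
    return [[d / max_depth for d in row] for row in depth_map]


# Toy absolute depth maps (hypothetical values).
gt_depth = [[0.0, 1.0], [2.0, 4.0]]
pred_depth = [[0.0, 1.2], [1.8, 3.6]]

# Use the ground-truth maximum for both maps so they share one scale.
gt_max = max(max(row) for row in gt_depth)
gt_vis = normalize_by_max(gt_depth, gt_max)
pred_vis = normalize_by_max(pred_depth, gt_max)
print(gt_vis, pred_vis)
```

Dividing both maps by the same value keeps them directly comparable when displayed side by side; normalizing each map by its own maximum would hide any global scale difference between prediction and ground truth.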

Your modification should be identical to our original code; the only difference is that you reverse the sign of the negative depth at a different place.