TRAILab / CaDDN

Categorical Depth Distribution Network for Monocular 3D Object Detection (CVPR 2021 Oral)
Apache License 2.0

Question about the transformation from grid coordinates to image coordinates? #51

Closed rockywind closed 3 years ago

rockywind commented 3 years ago

Thanks for your help on the previous issue. The depth map is downsampled by 4x, but the intrinsics are the same as for the original image. I think the transformation should be as shown below. [image]

codyreading commented 3 years ago

When performing the transformation, we use normalized coordinates, i.e. coordinates rescaled to the range [-1, 1]. That way, the transformation works regardless of scale.

What happens exactly is this: we generate a point at the center of each voxel, and we call this set of points the sampling grid. We project each point in the sampling grid from 3D space into the camera frustum space at full size (not downsampled by 4). Then we normalize each point to [-1, 1], forming our frustum sampling grid. We then pass this grid to the grid_sample function, which accepts normalized coordinates, to sample from the frustum features. Since the coordinates are normalized to [-1, 1], the behavior is the same regardless of the scale of the image features/depth map.
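To make the scale argument concrete, here is a minimal pure-Python sketch of the image-plane part of that pipeline. It is an illustration, not the repository's code: the image size, intrinsics, voxel center, and helper names (`project_to_pixels`, `normalize`, `unnormalize`) are all hypothetical, and it assumes a convention where -1 and +1 map to the outer edges of the image (the actual code may use a different convention, and it also handles the depth axis, which is omitted here).

```python
import math

def project_to_pixels(point_xyz, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates."""
    x, y, z = point_xyz
    return fx * x / z + cx, fy * y / z + cy

def normalize(coord, size):
    """Map a continuous coordinate in [0, size] to [-1, 1] (edge convention, assumed)."""
    return 2.0 * coord / size - 1.0

def unnormalize(norm, size):
    """Inverse of normalize: map [-1, 1] back to [0, size]."""
    return (norm + 1.0) / 2.0 * size

# Hypothetical full-resolution image size and intrinsics.
FULL_W, FULL_H = 1280, 384
FX, FY, CX, CY = 720.0, 720.0, 640.0, 192.0
STRIDE = 4  # feature / depth map is downsampled by 4

# Hypothetical voxel center in the camera frame (x right, y down, z forward).
voxel_center = (2.0, 1.0, 20.0)

# 1) Project with the ORIGINAL intrinsics into the full-size image.
u, v = project_to_pixels(voxel_center, FX, FY, CX, CY)

# 2) Normalize by the full-size image dimensions.
u_norm, v_norm = normalize(u, FULL_W), normalize(v, FULL_H)

# 3) A sampler that takes normalized coordinates resolves them against
#    whatever resolution the sampled tensor has, here the 4x smaller map.
feat_w, feat_h = FULL_W // STRIDE, FULL_H // STRIDE
u_feat, v_feat = unnormalize(u_norm, feat_w), unnormalize(v_norm, feat_h)

# Alternative: scale the intrinsics by 1/STRIDE and project directly.
u_alt, v_alt = project_to_pixels(
    voxel_center, FX / STRIDE, FY / STRIDE, CX / STRIDE, CY / STRIDE
)

# Both routes land on the same location in the downsampled map.
assert math.isclose(u_feat, u_alt) and math.isclose(v_feat, v_alt)
print(f"full-res pixel: ({u}, {v}), normalized: ({u_norm}, {v_norm})")
print(f"downsampled location: ({u_feat}, {v_feat})")
```

Under these assumptions, projecting with the original intrinsics and then normalizing reaches the same location in the downsampled map as projecting with intrinsics scaled by 1/4, which is why the intrinsics do not need to be adjusted.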

rockywind commented 3 years ago

Thanks a lot! @codyreading