GAP-LAB-CUHK-SZ / Total3DUnderstanding

Implementation of CVPR'20 Oral: Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image
MIT License

Issue: Interpreting the attributes of 3D Bounding box #20

Open Suraj520 opened 3 years ago

Suraj520 commented 3 years ago

Hi @yinyunie, thanks for your research and contribution to the spatial understanding domain. I am having trouble relating the output of the source code to the output described in the paper. I have summarized my question below and would appreciate a response.

As far as I can see, the output of the 3D object detection network is saved as bdb_3d.mat, which appears to be a dictionary with the following keys for each object instance detected by the 2D object detection network:
1. 'basis'
2. 'coeffs'
3. 'centroid'
4. 'classid'

The basis seems to be the rotation matrix of the bounding box (R, 3×3), from which we can get the Euler angles in the closed interval [-pi, pi]. What do coeffs and centroid in the .mat file signify?

Please refer to the cropped excerpt of Section 3.1 of the paper attached below, which says that any 3D bounding box in the world coordinate system is defined by C, s and theta.

Which of the aforementioned keys in bdb_3d.mat correspond to C (the 3D center) and s (the spatial size)?

Thanks, anticipating a response.

3DObjectDetection

alando46 commented 3 years ago

coeffs = the extent of the box measured from its 3D center (centroid) along each basis axis. Each detected object, and the layout box as a whole, has one coefficient vector in R^3. The coefficients therefore represent the spatial size s of the bounding box.

centroid = the predicted <i, j, k> center of each detection, so centroid corresponds to C, the 3D center.
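To make the roles of basis, coeffs and centroid concrete, here is a minimal sketch that rebuilds the eight box corners from the three fields. It rests on two assumptions that the thread does not confirm: each row of basis is a unit axis of the box, and coeffs holds the half-extent along each axis. The function name `box_corners` and the sample values are made up for illustration and are not taken from the repository.

```python
from itertools import product


def box_corners(basis, coeffs, centroid):
    """Return the 8 corners of an oriented 3D box.

    Assumptions (not confirmed by the thread): each row of `basis` is a
    unit axis of the box, and `coeffs` is the half-extent along that axis.
    """
    corners = []
    # Every corner is the centroid plus or minus one half-extent per axis.
    for signs in product((-1.0, 1.0), repeat=3):
        corner = list(centroid)
        for axis, half, sign in zip(basis, coeffs, signs):
            for k in range(3):
                corner[k] += sign * half * axis[k]
        corners.append(tuple(corner))
    return corners


# Made-up example values: an axis-aligned box centered at (1, 2, 3).
basis = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
coeffs = (0.5, 1.0, 1.5)
centroid = (1.0, 2.0, 3.0)

corners = box_corners(basis, coeffs, centroid)
# Full edge lengths, valid only under the half-extent assumption.
size = tuple(2.0 * c for c in coeffs)
```

If coeffs turned out to store full edge lengths instead, the corners would come out twice as far from the centroid as expected, which is easy to spot by drawing the box over the input image.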

As you mentioned, θ represents the rotation angle (of the world coordinate system) used to create the basis.
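The link between θ and basis can be sketched as a rotation about the vertical axis. The example below assumes the vertical axis is y and uses the standard right-handed rotation matrix; the repository may use a transposed or sign-flipped convention, so treat the helper names `basis_from_yaw` and `yaw_from_basis` as hypothetical and check them against a known detection before relying on them.

```python
import math


def basis_from_yaw(theta):
    """Build a 3x3 rotation about the y axis (assumed to be vertical)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def yaw_from_basis(basis):
    """Recover the angle in [-pi, pi] under the same assumed convention."""
    return math.atan2(basis[0][2], basis[0][0])


# Round trip with a made-up angle of 30 degrees.
theta = math.radians(30.0)
recovered = yaw_from_basis(basis_from_yaw(theta))
```

A round trip like this only shows that the two helpers agree with each other. Whether they agree with the stored basis depends on the axis and sign convention used when the .mat file was written.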