About the dataset properties

ZhanYang-nwpu / Mono3DVG

[AAAI 2024] Mono3DVG: 3D Visual Grounding in Monocular Images, AAAI, 2024


I noticed that the Box_2D and Box_3D attributes have a different number of decimal places than the labelled values in label_2. Why not simply use the labels provided by KITTI for the Box_2D and Box_3D attributes? Furthermore, how were these exact values for the Box_2D and Box_3D properties obtained?
"Box_2D": "[548.49609375, 171.26625061035156, 572.6758422851562, 194.19029235839844]",
"Box_3D": "[-2.660233497619629, 0.07851047068834305, 48.222618103027344, 1.559999942779541, 1.4800000190734863, 3.619999885559082]",
"label_2": "['Car', 0.0, 2.0, -1.55, 548.0, 171.33, 572.4, 194.42, 1.48, 1.56, 3.62, -2.72, 0.82, 48.22, -1.62]"

The values differ because Box_2D and Box_3D are obtained through transformations between different coordinate systems, such as the camera coordinate system, the image coordinate system, and the world coordinate system.
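To make this more concrete, here is a rough sketch of how such values could arise. It is not the repository's actual preprocessing code. It assumes that the dimensions are stored as 32-bit floats, that the 3D centre is shifted from KITTI's bottom-centre convention to the geometric centre, and that a pinhole camera model is used. The helper names (`to_float32`, `project_to_image`) and the intrinsics `fx`, `fy`, `cx`, `cy` are made up for illustration and are not taken from the dataset's calibration files.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


def project_to_image(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (x, y, z) to a pixel (u, v)."""
    x, y, z = point_cam
    return (fx * x / z + cx, fy * y / z + cy)


# Dimensions (h, w, l) and location (x, y, z) copied from label_2 in the question.
h, w, l = 1.48, 1.56, 3.62
x, y, z = -2.72, 0.82, 48.22

# Storing two-decimal values as float32 produces the long decimal expansions
# seen in the last three entries of Box_3D.
dims32 = [to_float32(v) for v in (w, h, l)]
print(dims32)  # [1.559999942779541, 1.4800000190734863, 3.619999885559082]

# KITTI's location is the bottom centre of the box in camera coordinates
# (y points down). ASSUMPTION: Box_3D stores the geometric centre instead,
# which would move y up by half the height.
center = (x, y - h / 2, z)

# ASSUMPTION: illustrative intrinsics, not the real calibration of this frame,
# so (u, v) is not expected to reproduce the Box_2D values from the dataset.
u, v = project_to_image(center, fx=720.0, fy=720.0, cx=610.0, cy=173.0)
print(center, (u, v))
```

The float32 round trip reproduces the three dimension values in Box_3D exactly, and the shifted y (about 0.08) is close to the stored 0.0785. The remaining small offsets in x and y would have to come from the actual calibration, which this sketch does not model.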
