OpenRobotLab / EmbodiedScan

[CVPR 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
https://tai-wang.github.io/embodiedscan/
Apache License 2.0

How can I find the corresponding image and camera parameters according to the 3d bbox? #33

Closed: Hoyyyaard closed this issue 3 months ago

Hoyyyaard commented 3 months ago

Branch

main branch https://mmdetection3d.readthedocs.io/en/latest/

📚 The doc issue

How can I find the corresponding image and camera parameters according to the 3d bbox? Thanks.

Suggest a potential alternative/fix

No response

Tai-Wang commented 3 months ago

We provide the corresponding image information (image paths and intrinsic/extrinsic parameters) in the pkl info files, available once you have applied for access to our dataset. Please feel free to clarify your question if I have not understood it accurately.
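For reference, a minimal sketch of inspecting those fields. The key names (`data_list`, `images`, `img_path`, `cam2img`, `cam2global`) follow the mmdetection3d-style info convention and may differ in a given release; print the keys of one entry to confirm against your pkl:

```python
import pickle

# Load the annotation info file (filename is an assumption; use your own).
with open('embodiedscan_infos_train.pkl', 'rb') as f:
    infos = pickle.load(f)

scene = infos['data_list'][0]        # one scan/scene entry
for img in scene['images']:          # per-frame image records
    print(img['img_path'])           # RGB image path
    print(img.get('cam2img'))        # intrinsic matrix (may be stored scene-level)
    print(img['cam2global'])         # extrinsic camera-to-global pose
```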

mxh1999 commented 3 months ago

In fact, there is no single "corresponding image" for an arbitrary 3D box, because a box can appear in more than one image. If you want to know which ground-truth boxes are visible in each image, see 'visible_instance_ids' in the pkl info.
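A hedged sketch of that lookup, assuming the same info layout as above, where each per-frame record's `visible_instance_ids` lists the indices into the scene's instances that are visible in that frame:

```python
def frames_containing(scene, instance_idx):
    """Return the image paths of all frames whose visible_instance_ids
    include the given instance index. Key names are assumptions; verify
    against your pkl."""
    return [
        img['img_path']
        for img in scene['images']
        if instance_idx in img.get('visible_instance_ids', [])
    ]

# Usage: list every frame in which instance 5 of the first scene appears.
print(frames_containing(infos['data_list'][0], instance_idx=5))
```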

Hoyyyaard commented 3 months ago

I found an image corresponding to a certain bounding box in a scene and used the depth map to back-project the image into a point cloud. However, due to inaccuracy in the depth map, some small objects in the image cannot be accurately projected into the point cloud. How can I solve this problem?

Example:
- scan: scannet/scene0204_00
- rgb: scannet/posed_images/scene0204_00/00660.jpg
- bbox object: socket
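For context, a minimal back-projection sketch of the step described above, under assumptions not taken from the repo itself: a 3x3 pinhole intrinsic matrix `K`, a 4x4 `cam2global` pose from the pkl info, and ScanNet-style uint16 depth stored in millimeters:

```python
import numpy as np
import cv2

def depth_to_points(depth_path, K, cam2global, depth_scale=1000.0):
    """Back-project a depth map into a global-frame point cloud.
    depth_scale=1000.0 assumes ScanNet-style millimeter uint16 depth."""
    depth = cv2.imread(depth_path, cv2.IMREAD_UNCHANGED).astype(np.float32)
    depth /= depth_scale                      # convert to meters
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0                         # skip missing depth readings
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]    # (u - cx) * z / fx
    y = (v[valid] - K[1, 2]) * z / K[1, 1]    # (v - cy) * z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=0)
    pts_global = cam2global @ pts_cam         # 4x4 pose @ 4xN homogeneous
    return pts_global[:3].T                   # N x 3 points in global frame
```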

Tai-Wang commented 3 months ago

It's hard to avoid such problems completely because the poses provided by the original datasets are themselves estimated via SLAM algorithms. There can be errors for specific frames, and the projection can make the 2D and 3D annotations inconsistent. This is also a problem in practice, so we typically just acknowledge it as a kind of noise when doing ego-centric perception.