OpenRobotLab / EmbodiedScan

[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
https://tai-wang.github.io/embodiedscan/
Apache License 2.0

[Docs] Thanks for your awesome work! After filling out the questionnaire, how long does it take to receive the data download link? #13

Closed Hoyyyaard closed 6 months ago

Hoyyyaard commented 6 months ago

Branch

main branch https://mmdetection3d.readthedocs.io/en/latest/

📚 The doc issue

Thanks for your awesome work! After filling out the questionnaire, how long does it take to receive the data download link?

Suggest a potential alternative/fix

No response

RuiyuanLyu commented 6 months ago

We typically check once every 1-2 days. Thank you for your patience.

Hoyyyaard commented 6 months ago

Can you provide a detailed description of each key in the json file embodiedscan_val_mini_vg.json? And where can I find the corresponding bbox of the target object? Thanks for your reply!

ZCMax commented 6 months ago

The vg json file consists of a list of dicts; each dict represents a target object description and related information. For example: {"scan_id": "scene0329_00", "target_id": 24, "distractor_ids": [6, 40, 42], "text": "find the door that is far away from the computer", "target": "door", "anchors": ["computer"], "anchor_ids": [36], "tokens_positive": [[9, 13]]}

'target_id' represents the target object's id in the scan.

'distractor_ids' represents the ids of objects in the scan that have the same category as the target object.

'text' represents the target object description.

'tokens_positive' represents the position of the target object category ('door' in the example) in the text prompt.

Each info dict includes the 'target_id', which can be associated with the 3D box information in the corresponding pkl file used in the detection / occupancy task, as shown in https://github.com/OpenRobotLab/EmbodiedScan/blob/3cc91cb87903137ad4eb46b01b7ddd60549e4f99/embodiedscan/datasets/mv_3dvg_dataset.py#L339.
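
A minimal lookup sketch of that association, assuming the info pkl exposes a 'data_list' of per-scan dicts whose 'instances' carry 'bbox_id' and a 9-DoF 'bbox_3d'. The file names and key names below are illustrative, not guaranteed by the repo; verify them against the actual schema in mv_3dvg_dataset.py:

```python
import json
import pickle

# Illustrative paths -- substitute your local data layout.
with open('embodiedscan_val_mini_vg.json') as f:
    vg_items = json.load(f)
with open('embodiedscan_infos_val.pkl', 'rb') as f:        # detection/occupancy annotation file
    det_infos = pickle.load(f)

item = vg_items[0]                                          # one grounding sample
scan_id, target_id = item['scan_id'], item['target_id']

# Assumed pkl layout: 'data_list' holds one dict per scan, each with an
# 'instances' list whose entries carry 'bbox_id' and a 9-DoF 'bbox_3d'.
for scan_info in det_infos['data_list']:
    if scan_id in scan_info['sample_idx']:
        for inst in scan_info['instances']:
            if inst['bbox_id'] == target_id:
                print('target 9-DoF box:', inst['bbox_3d'])
```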

Hoyyyaard commented 6 months ago

How can I get the corresponding instance point clouds and raw point clouds of an episode in MultiView3DGroundingDataset?

ZCMax commented 6 months ago

> How can I get the corresponding instance point clouds and raw point clouds of an episode in MultiView3DGroundingDataset?

Actually, we can obtain the point clouds by back-projecting the image pixels to 3D space, and one possible way to obtain the corresponding instance point clouds is to select the points that fall inside the target instance's 3D bboxes.
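
A rough NumPy sketch of that back-projection and cropping; the function signatures are illustrative rather than the repository's API, and the crop ignores the box rotation for brevity (EmbodiedScan boxes are 9-DoF, so a tight crop would first undo the rotation):

```python
import numpy as np

def depth_to_points(depth, intrinsic, cam2global):
    """Back-project a depth map (H, W), in metres, to (N, 3) points in global coordinates.

    intrinsic: 3x3 camera matrix; cam2global: 4x4 camera-to-global pose,
    both taken from the per-frame annotations.
    """
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing='ij')
    z = depth.reshape(-1)
    valid = z > 0
    uv1 = np.stack([u.reshape(-1), v.reshape(-1), np.ones_like(z)])
    pts_cam = np.linalg.inv(intrinsic) @ (uv1 * z)           # (3, N) in camera frame
    pts_cam = np.vstack([pts_cam, np.ones((1, z.size))])     # homogeneous coordinates
    pts_global = (cam2global @ pts_cam)[:3].T                # (N, 3) in global frame
    return pts_global[valid]

def crop_instance(points, box):
    """Keep the points inside a box given as (cx, cy, cz, dx, dy, dz, ...);
    the rotation terms are ignored here for brevity."""
    center, dims = np.asarray(box[:3]), np.asarray(box[3:6])
    mask = np.all(np.abs(points - center) <= dims / 2, axis=1)
    return points[mask]
```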

Hoyyyaard commented 6 months ago

Sorry, I might not have made myself clear! What I'm asking is how to find the scene point clouds corresponding to an episode, such as the scene point clouds you collected from ScanNet or MP3D. Are they in the folder named embodiedscan_occupancy?

Tai-Wang commented 6 months ago

You can find the scan name by printing the scan_id or related keys. We keep the original scan names from these three datasets and you can distinguish them easily.

Hoyyyaard commented 6 months ago

So the point clouds you used come from the original datasets, and you didn't use the scene point cloud ply files you collected? For example, many objects annotated in your dataset have no instance labels in the raw scene point clouds. How can I obtain the point cloud data of these objects? Thanks.

Tai-Wang commented 6 months ago
  1. Apart from the annotation file, you need to download the raw data from the official websites of the three mentioned datasets, following our data preparation documentation.
  2. Objects that are not annotated in the original datasets can still have their point clouds obtained: simply crop the points that fall inside our annotated 3D boxes. However, this can sometimes be inaccurate due to imprecise camera poses, so there can be a gap between the converted depth point clouds and the reconstructed point clouds provided by the ply files.
  3. You can also concatenate the point clouds converted from the multi-view depth maps and then extract the objects' point clouds using our annotated boxes (although this can also be noisy due to inaccurate camera poses and depth measurements); see the sketch after this list.
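
Putting points 2 and 3 together, a hedged end-to-end sketch that fuses the multi-view depth clouds and crops the target box. It assumes the depth_to_points and crop_instance helpers sketched earlier in this thread, and that the per-frame depths, intrinsics, and camera-to-global poses have been read from the raw data and annotations:

```python
import numpy as np

def instance_cloud_from_views(depths, intrinsics, poses, target_box, stride=20):
    """Fuse multi-view depth maps into one scene cloud and crop the target box.

    depths: list of (H, W) arrays; intrinsics: list of 3x3 matrices;
    poses: list of 4x4 camera-to-global matrices. 'stride' subsamples the
    back-projected points to keep memory manageable.
    """
    clouds = []
    for depth, K, pose in zip(depths, intrinsics, poses):
        pts = depth_to_points(depth, K, pose)   # helper from the earlier sketch
        clouds.append(pts[::stride])            # subsample points, not pixels
    scene_cloud = np.concatenate(clouds, axis=0)
    return scene_cloud, crop_instance(scene_cloud, target_box)
```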