Open3DA / LL3DA

[CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.
https://ll3da.github.io/
MIT License

How to get grounding data? #25


Germany321 commented 1 month ago

Thanks for sharing the work. I notice that the model can output 3D bounding box coordinates as numerical values. How can I access the data related to 3D grounding tasks?

ch3cook-fdu commented 1 month ago

In our paper, we evaluate LL3DA on several tasks, including captioning, question answering, dialogue, embodied planning, and even open-vocabulary detection. However, our method does not officially support 3D visual grounding.

You are welcome to modify the code to add this feature. You can use data from ScanRefer or Nr3D for 3D visual grounding tasks; a rough conversion sketch follows.
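As a starting point, here is a minimal sketch of turning ScanRefer annotations into grounding-style instruction samples. The field names (`scene_id`, `object_id`, `description`) follow the public ScanRefer JSON; the box lookup (`boxes_by_object`) and the answer format are assumptions for illustration, not LL3DA's official data pipeline — match the box-to-text convention you fine-tune with.

```python
# Sketch: build grounding instruction samples from ScanRefer annotations.
# ScanRefer itself does not ship box coordinates, so `boxes_by_object` is
# assumed to be built separately (e.g. axis-aligned boxes derived from
# ScanNet instance meshes), keyed by (scene_id, object_id).
import json


def load_scanrefer(path):
    """Load ScanRefer annotations (a JSON list of dicts) from disk."""
    with open(path) as f:
        return json.load(f)


def make_grounding_sample(ann, boxes_by_object):
    """Pair a ScanRefer description with its target box as numeric text."""
    key = (ann["scene_id"], int(ann["object_id"]))
    cx, cy, cz, dx, dy, dz = boxes_by_object[key]  # center + size, in meters
    return {
        "scene_id": ann["scene_id"],
        "instruction": f"Locate the object: {ann['description']}",
        # Hypothetical answer format: the box encoded as plain numerical
        # values, mirroring how the model emits coordinates as text.
        "answer": f"{cx:.2f}, {cy:.2f}, {cz:.2f}, {dx:.2f}, {dy:.2f}, {dz:.2f}",
    }


def build_dataset(scanrefer_path, boxes_by_object):
    """Convert every annotation with a known box into a training sample."""
    anns = load_scanrefer(scanrefer_path)
    return [
        make_grounding_sample(ann, boxes_by_object)
        for ann in anns
        if (ann["scene_id"], int(ann["object_id"])) in boxes_by_object
    ]
```

Nr3D annotations can be handled the same way once mapped to per-object boxes from the underlying ScanNet scenes.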