WeitaiKang / Robin3D


Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning

We present Robin3D, a state-of-the-art 3D Large Language Model (3D LLM) trained on large-scale instruction-following data generated by our novel Robust Instruction Generation (RIG) data engine. To handle the complex data produced by RIG, Robin3D further enhances its spatial understanding with a Relation-Augmented Projector and improves its object referring and grounding ability through ID-Feature Bonding.
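This README does not describe how ID-Feature Bonding works internally, so the snippet below is only a toy illustration of the general idea it builds on: pairing each object's identifier token with that object's features in the LLM input sequence, as identifier-based 3D LLMs such as Chat-Scene do. It is not Robin3D's implementation. The function `bond_ids_with_features`, the linear projector, the ID embedding table, and all dimensions are hypothetical, and random arrays stand in for real detector features.

```python
import numpy as np


def bond_ids_with_features(id_embeddings: np.ndarray, obj_features: np.ndarray) -> np.ndarray:
    """Hypothetical helper: interleave ID-token embeddings with object features.

    Returns a sequence ordered [id_0, feat_0, id_1, feat_1, ...] so that each
    object's feature sits directly after the token that names it.
    """
    if id_embeddings.shape != obj_features.shape:
        raise ValueError("expected one ID embedding per object feature, with the same width")
    n, d = obj_features.shape
    seq = np.empty((2 * n, d), dtype=obj_features.dtype)
    seq[0::2] = id_embeddings  # even positions: object identifier tokens
    seq[1::2] = obj_features   # odd positions: the matching object features
    return seq


rng = np.random.default_rng(0)
num_objects, feat_dim, llm_dim = 3, 8, 4

# Stand-in for per-object 3D features from a detector (random, for illustration only).
raw_features = rng.standard_normal((num_objects, feat_dim))

# Hypothetical linear projector mapping 3D features into the LLM embedding space.
projector = rng.standard_normal((feat_dim, llm_dim))
obj_features = raw_features @ projector

# Hypothetical embedding table for identifier tokens such as <OBJ000> ... <OBJ099>.
id_table = rng.standard_normal((100, llm_dim))
id_embeddings = id_table[np.arange(num_objects)]

seq = bond_ids_with_features(id_embeddings, obj_features)
print(seq.shape)
```

Placing each feature immediately after its identifier lets the model attend to an explicit ID-to-feature pairing, which is what makes answers such as "the chair is <OBJ002>" groundable. Robin3D's actual bonding scheme may differ.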

News

[2024.09] We release Robin3D [paper] [code], a new state-of-the-art (SOTA) 3D LLM for 3D scenes.

🔥 Robin3D vs Previous Methods

[Figure: performance comparison between Robin3D and previous methods]

🔨 Preparation

🤖 Training and Inference

📄 Citation

Stay tuned for our project. 🔥

If you have any questions or suggestions, feel free to drop us an email (wkang11@hawk.iit.edu) or open an issue.

😊 Acknowledgement

We thank the following open-source projects:

LLMs: LLaMA, Vicuna

3D Datasets: ScanNet, ScanRefer, ReferIt3D, Scan2Cap, ScanQA, SQA3D, Multi3dRefer, Grounded-3DLLM, Chat-Scene

Detectors: Mask3D

Representations: Uni3D, DINOv2

3D Models: OpenScene