We present Robin3D, a state-of-the-art 3D Large Language Model trained on large-scale instruction-following data generated by our novel Robust Instruction Generation (RIG) data engine. To handle the complex data produced by RIG, Robin3D further strengthens spatial understanding with a Relation-Augmented Projector and improves object referring and grounding with ID-Feature Bonding.
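As a toy illustration of the ID-Feature Bonding idea, one can imagine wrapping each object's feature tokens with its ID token so the language model can tie the identifier to the right feature span. The function name, token layout, and string tokens below are illustrative assumptions, not Robin3D's actual implementation:

```python
# Toy sketch of ID-Feature Bonding (illustrative only; names and the
# token layout are assumptions, not the paper's code).

def bond_id_features(object_features):
    """Interleave per-object ID tokens with their feature tokens.

    object_features: list of per-object feature-token lists.
    Returns a flat token sequence where each object's features are
    bracketed by its ID token on both sides.
    """
    sequence = []
    for obj_id, feats in enumerate(object_features):
        id_token = f"<OBJ{obj_id}>"
        # Bond the ID to the features by placing the ID token before
        # and after the feature span.
        sequence += [id_token, *feats, id_token]
    return sequence

print(bond_id_features([["f0a", "f0b"], ["f1a"]]))
# -> ['<OBJ0>', 'f0a', 'f0b', '<OBJ0>', '<OBJ1>', 'f1a', '<OBJ1>']
```

Bracketing the features on both sides (rather than only prefixing) keeps the identifier adjacent to the span from either direction in the token stream.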
[2024.09] We release Robin3D [paper][code], a new SOTA 3D LLM for 3D scenes.
Prepare the environment:

```shell
conda create -n robin3d python=3.9.17
conda activate robin3d
conda install pytorch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
```
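After installation, a quick sanity check that the core packages are importable can save debugging time later. This is a generic helper, not part of the Robin3D codebase:

```python
# Generic post-install sanity check (not part of the Robin3D repo).
import importlib.util


def check_env(required=("torch", "torchvision", "torchaudio")):
    """Return the subset of required packages that are importable
    in the current environment, without actually importing them."""
    return [m for m in required if importlib.util.find_spec(m) is not None]


# Prints the installed subset of the PyTorch packages listed above;
# all three should appear if the conda install succeeded.
print(check_env())
```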
Download the LLM backbone:
Annotations and extracted features:
Please follow the instructions in Chat-Scene's Preparation.
Stay tuned for our project. 🔥
If you have any questions or suggestions, feel free to drop us an email (wkang11@hawk.iit.edu) or open an issue.
Thanks to the following open-source projects:
3D Datasets: ScanNet, ScanRefer, ReferIt3D, Scan2Cap, ScanQA, SQA3D, Multi3dRefer, Grounded-3DLLM, Chat-Scene
Detectors: Mask3D
Representations: Uni3D, DINOv2
3D Models: OpenScene