# Chat-Scene

Code for "Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers" (NeurIPS 2024)

We build a multi-modal large language model for 3D scene understanding, excelling in tasks such as 3D grounding, captioning, and question answering.

🔥 Ranked 1st on the ScanRefer Benchmark (Sept. 2024) ([leaderboard link](https://kaldir.vc.in.tum.de/scanrefer_benchmark/benchmark_localization))

![ScanRefer benchmark results](assets/scanrefer_benchmark_results.png)

🔥 Ranked 1st on the Scan2Cap Benchmark (Sept. 2024) ([leaderboard link](https://kaldir.vc.in.tum.de/scanrefer_benchmark/benchmark_captioning))

![Scan2Cap benchmark results](assets/scan2cap_benchmark_results.png)

## News

[2024.09] 🔥 Chat-Scene has been accepted by NeurIPS 2024! [paper]

[2024.08] 🔥 We release Chat-Scene, which processes both 3D point clouds and 2D multi-view images for improved 3D scene understanding, yielding substantial gains in grounding and captioning performance (see the sketch after the news items).

[2024.04] We release a refined implementation (v2.1), which achieves better performance on grounding, captioning, and QA tasks. The code is available in branch v2.1.

[2023.12] We release Chat-3D v2 [paper], introducing object identifiers for enhanced object referencing and grounding in 3D scenes. The original code is available in branch v2.0.

[2023.08] We release Chat-3D [paper] [code], an LLM-based dialogue system for 3D scenes.
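
To make the object-identifier idea above concrete: each detected object in a scene is tagged with an identifier token and represented by both a 3D (point-cloud) feature and a 2D (multi-view image) feature before the sequence is passed to the LLM. The PyTorch sketch below is a minimal illustration of how such an object-centric input could be assembled; the class name, feature dimensions, and projection layers are assumptions for illustration, not the repository's actual implementation.

```python
import torch
import torch.nn as nn


class ObjectIdentifierPrompt(nn.Module):
    """Illustrative sketch (not the repo's API): map each detected object to
    [<OBJ_i> identifier token, projected 3D feature, projected 2D feature]
    in the LLM embedding space."""

    def __init__(self, num_objects: int = 100, d_3d: int = 1024, d_2d: int = 1024, d_llm: int = 4096):
        super().__init__()
        # one learnable embedding per identifier token <OBJ000> ... <OBJ099>
        self.id_embed = nn.Embedding(num_objects, d_llm)
        # projectors from the (assumed) 3D/2D encoder feature spaces into the LLM token space
        self.proj_3d = nn.Linear(d_3d, d_llm)  # per-object point-cloud features
        self.proj_2d = nn.Linear(d_2d, d_llm)  # per-object multi-view image features

    def forward(self, feats_3d: torch.Tensor, feats_2d: torch.Tensor) -> torch.Tensor:
        # feats_3d: (num_detected, d_3d), feats_2d: (num_detected, d_2d)
        n = feats_3d.size(0)
        ids = self.id_embed(torch.arange(n, device=feats_3d.device))
        # interleave per object: (n, 3, d_llm) -> flatten to (n * 3, d_llm)
        tokens = torch.stack([ids, self.proj_3d(feats_3d), self.proj_2d(feats_2d)], dim=1)
        return tokens.flatten(0, 1)  # prepend this sequence to the text prompt embeddings


# Toy usage: random features standing in for 8 detected objects.
prompt_builder = ObjectIdentifierPrompt()
object_tokens = prompt_builder(torch.randn(8, 1024), torch.randn(8, 1024))
print(object_tokens.shape)  # torch.Size([24, 4096])
```

Referring to an object in a grounding query or a generated caption then reduces to emitting its identifier token, which is the core idea behind the enhanced object referencing and grounding described above.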

## 🔥 Chat-Scene vs Chat-3D v2

## 🔨 Preparation

## 🤖 Training and Inference

## 📄 Citation

If you find this project useful in your research, please consider citing:

@article{huang2023chat,
  title={Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers},
  author={Huang, Haifeng and Wang, Zehan and Huang, Rongjie and Liu, Luping and Cheng, Xize and Zhao, Yang and Jin, Tao and Zhao, Zhou},
  journal={arXiv preprint arXiv:2312.08168},
  year={2023}
}
@article{wang2023chat,
  title={Chat-3d: Data-efficiently tuning large language model for universal dialogue of 3d scenes},
  author={Wang, Zehan and Huang, Haifeng and Zhao, Yang and Zhang, Ziang and Zhao, Zhou},
  journal={arXiv preprint arXiv:2308.08769},
  year={2023}
}

Stay tuned for updates to our project. 🔥

If you have any questions or suggestions, feel free to drop us an email (huanghaifeng@zju.edu.cn, wangzehan01@zju.edu.cn) or open an issue.

## 😊 Acknowledgement

Thanks to the following open-source projects:

(Multi-modal) LLMs: LLaMA, Vicuna, VideoChat, LEO

3D Datasets: ScanNet, ScanRefer, ReferIt3D, Scan2Cap, ScanQA, SQA3D, Multi3dRefer

Detectors: PointGroup, Mask3D, DEVA

Representations: ULIP, Uni3D, DINOv2

3D Models: vil3dref, OpenScene