[CVPR'24 Highlight] GPT4Point: A Unified Framework for Point-Language Understanding and Generation

Project page: https://gpt4point.github.io/

🔥 News

🔥 2024/04/27: We have updated the point encoder module; evaluation now works properly, though the training code still needs updates.

🔥 2024/04/13: We release GPT4Point v1.0, including the training and 3D captioning evaluation code.

🔥 2024/04/05: Our paper GPT4Point is selected as a CVPR'24 Highlight (top 2.84%, 324/11532)!

🔥 2024/02/27: Our paper GPT4Point is accepted by CVPR'24!

🔥 2024/01/19: We release the download and extraction instructions for Objaverse-XL in point cloud format.

🔥 2023/12/05: The paper GPT4Point (arXiv) is released, unifying point-language understanding and generation.

🔥 2023/08/13: The two-stage pre-training code of PointBLIP has been released.

🔥 2023/08/13: Part of the datasets and result files have been uploaded.

🏠 Overview

This project presents GPT4Point, a 3D multi-modality model that aligns 3D point clouds with language. More details are available on the project page.

🔧 Installation

  1. (Optional) Create and activate a conda environment:

    conda create -n gpt4point python=3.8
    conda activate gpt4point

  2. Install from PyPI:

    pip install salesforce-lavis

  3. Or, for development, build from source:

    git clone https://github.com/salesforce/LAVIS.git
    cd LAVIS
    pip install -e .
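
To sanity-check the installation, a minimal import test (a sketch; it only confirms that LAVIS and PyTorch load and reports whether CUDA is visible):

    # quick import check: a clean import suggests LAVIS installed correctly
    import lavis
    import torch

    print(torch.__version__, torch.cuda.is_available())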

📦 Data Preparation

  1. Annotations: All annotations will be downloaded automatically through Hugging Face.

  2. Point Cloud: Download the Cap3D point cloud dataset through the Google Drive link. Unzip the 10 tar.gz files and merge their contents; a loading sketch follows the folder tree below. The resulting folder structure is:

GPT4Point
├── data
│   ├── cap3d
│   │   ├── points
│   │   │    ├── Cap3D_pcs_8192_xyz_w_color
│   │   │    │    ├── <point cloud id>.pkl
│   │   │    │    ├── ...
│   │   │    │    ├── <point cloud id>.pkl
│   │   ├── annotations
│   │   │    ├── cap3d_caption_train.json
│   │   │    ├── cap3d_caption_val.json
│   │   │    ├── cap3d_real_and_chatgpt_caption_test.json
│   │   │    ├── cap3d_real_and_chatgpt_caption_test_gt.json (for evaluation)
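
To verify the extracted data, here is a minimal loading sketch. It assumes each .pkl file stores an 8192x6 array of XYZ coordinates plus RGB colors, as the folder name Cap3D_pcs_8192_xyz_w_color suggests; the file name below is a placeholder.

    import pickle
    import numpy as np

    # placeholder path: substitute a real <point cloud id>.pkl
    pc_path = "data/cap3d/points/Cap3D_pcs_8192_xyz_w_color/<point cloud id>.pkl"

    with open(pc_path, "rb") as f:
        pc = np.asarray(pickle.load(f))  # assumed shape: (8192, 6)

    xyz, rgb = pc[:, :3], pc[:, 3:]  # split coordinates and colors
    print(xyz.shape, rgb.shape)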

🚆 Training

  1. For stage 1 training:

    python -m torch.distributed.run --master_port=32339 --nproc_per_node=4 train.py --cfg-path lavis/projects/gpt4point/train/pretrain_stage1_cap3d.yaml
  2. For stage 2 training:

    python -m torch.distributed.run --master_port=32339 --nproc_per_node=4 train.py --cfg-path lavis/projects/gpt4point/train/pretrain_stage2_cap3d_opt2.7b.yaml
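
The commands above assume 4 GPUs: --nproc_per_node should match the number of available GPUs, and --master_port can be any free port. For example, a single-GPU stage 1 run would use the same launcher:

    python -m torch.distributed.run --master_port=32339 --nproc_per_node=1 train.py --cfg-path lavis/projects/gpt4point/train/pretrain_stage1_cap3d.yaml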

🏁 Evaluation

python -m torch.distributed.run --master_port=32239 --nproc_per_node=1 evaluate.py --cfg-path lavis/projects/gpt4point/eval/captioning3d_cap3d_opt2.7b_eval.yaml
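
Captioning metrics can also be recomputed offline from a prediction file. A minimal sketch, assuming the ground-truth file follows the COCO caption format (as LAVIS captioning tasks typically expect) and that predictions are saved as a COCO-style result list; the prediction path below is a placeholder:

    from pycocotools.coco import COCO
    from pycocoevalcap.eval import COCOEvalCap

    # assumption: both files use COCO caption format; prediction path is a placeholder
    coco_gt = COCO("data/cap3d/annotations/cap3d_real_and_chatgpt_caption_test_gt.json")
    coco_res = coco_gt.loadRes("path/to/predicted_captions.json")

    coco_eval = COCOEvalCap(coco_gt, coco_res)
    coco_eval.evaluate()
    for metric, score in coco_eval.eval.items():
        print(metric, score)  # BLEU, METEOR, CIDEr, etc.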

📦 Point Dataset and Data Annotation Engine (Optional)

Objaverse-XL Point Dataset Download

First, cd into the Objaverse-xl_Download directory:

cd ./Objaverse-xl_Download

Then see the Objaverse-xl_Download folder for details.

Objaverse-XL Point Cloud Data Generation

Please see the Extract_Pointcloud folder for details.

📝 TODO List

Dataset and Data Engine

🔗 Citation

If you find our work helpful, please cite:

@inproceedings{GPT4Point,
  title={GPT4Point: A Unified Framework for Point-Language Understanding and Generation},
  author={Zhangyang Qi and Ye Fang and Zeyi Sun and Xiaoyang Wu and Tong Wu and Jiaqi Wang and Dahua Lin and Hengshuang Zhao},
  booktitle={CVPR},
  year={2024},
}

📄 License

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

📚 Related Work

Together, let's make LLMs for 3D great!