
CLIP-goes-3D

Official code for the paper "CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition"

[arXiv](https://arxiv.org/abs/2303.11313) | [Project website](https://jeya-maria-jose.github.io/cg3d-web/)


This repository includes the pre-trained models and the evaluation and training code for the pre-training, zero-shot, and fine-tuning experiments of CG3D. It is built on the Point-BERT codebase. Please see the end of this document for a full list of code references.

Environment set-up

The known working environment configuration is:

- Python 3.9
- PyTorch 1.12
- CUDA 11.6

  1. Install the conda virtual environment using the provided .yml file.

    conda env create -f environment.yml 

    (OR)

  2. Install dependencies manually.

    conda create -n cg3d
    conda activate cg3d
    pip install -r requirements.txt
    
    conda install -c anaconda scikit-image scikit-learn scipy
    pip install git+https://github.com/openai/CLIP.git
    pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl
    cd ./extensions/chamfer_dist
    python setup.py develop
  3. Build the modified timm from source

    cd ./models/SLIP/pytorch-image-models
    pip install -e .
  4. Install PointNet ops

    cd third_party/Pointnet2_PyTorch
    pip install -e .
    pip install pointnet2_ops_lib/.
  5. Install PyGeM

    cd third_party/PyGeM
    python setup.py install
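
After completing the steps above, a quick import check helps confirm that the compiled dependencies built correctly. This is a minimal sketch; the module names are assumptions based on the upstream packages, so adjust them if your build registers them differently.

```python
# Sanity check that the compiled dependencies import cleanly.
# Module names are assumptions based on the upstream packages.
import torch
import clip                                  # openai/CLIP
import pygem                                 # PyGeM
from knn_cuda import KNN                     # unlimblue/KNN_CUDA
from pointnet2_ops import pointnet2_utils    # Pointnet2_PyTorch

print("CUDA available:", torch.cuda.is_available())
print("CLIP backbones:", clip.available_models())
```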

Dataset set-up

  1. Download point cloud datasets for pre-training and fine-tuning.

    Save and unzip the above datasets.

    1. Render views of textured CAD models of ShapeNet using this repository. We use a scale of 0.7 and 5 total views.

    2. The data should be organized as

    ├── data (this may be wherever you choose)
    │   ├── modelnet40_normal_resampled
    │   │   ├── modelnet10/40_shape_names.txt
    │   │   ├── modelnet10/40_train/test.txt
    │   │   ├── airplane
    │   │   ├── ...
    │   │   ├── laptop
    │   ├── ShapeNet55
    │   │   ├── train.txt
    │   │   ├── test.txt
    │   │   ├── shapenet_pc
    │   │   │   ├── 03211117-62ac1e4559205e24f9702e673573a443.npy
    │   │   │   ├── ...
    │   ├── shapenet_render
    │   │   ├── train_img.txt
    │   │   ├── val_img.txt
    │   │   ├── shape_names.txt
    │   │   ├── taxonomy.json
    │   │   ├── camera
    │   │   ├── img
    │   │   │   ├── 02691156
    │   │   │   ├── ...
    │   ├── ScanObjectNN
    │   │   ├── main_split
    │   │   ├── ...
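
A quick way to verify this layout is a small path check. This is a minimal sketch; DATA_ROOT and the checked sub-folders simply mirror the tree above.

```python
# Verify the expected dataset layout before launching training.
from pathlib import Path

DATA_ROOT = Path("./data")  # adjust to wherever you stored the data
expected = [
    "modelnet40_normal_resampled",
    "ShapeNet55/shapenet_pc",
    "shapenet_render/img",
    "ScanObjectNN/main_split",
]
for rel in expected:
    status = "ok" if (DATA_ROOT / rel).exists() else "MISSING"
    print(f"{status:7s} {DATA_ROOT / rel}")
```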
    

1) Model weights

a) Pre-trained CG3D weights

Download SLIP model weights from here.

PointTransformer


| No. of points | Model file | Task | Configuration file |
| --- | --- | --- | --- |
| 1024 | download | Pre-training | link |
| 8192 | download | Pre-training | link |

PointMLP


| No. of points | Model file | Task | Configuration file |
| --- | --- | --- | --- |
| 1024 | download | Pre-training | link |
| 8192 | download | Pre-training | link |

Test Zero-Shot performance

    python eval.py --config cfgs/ShapeNet55_models/{CONFIG} --exp_name {NAME FOR EXPT}  --ckpts {CKPT PATH} --slip_model {PATH TO SLIP MODEL} --zshot --npoints {1024,8192}
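
Conceptually, the zero-shot evaluation follows the standard CLIP recipe: class names are wrapped in a text prompt, embedded with the frozen text encoder, and each point cloud is assigned the class whose text embedding is nearest in cosine similarity. The sketch below illustrates this; the encoder objects, tokenize function, and prompt template are placeholders, not the repository's actual API.

```python
import torch

@torch.no_grad()
def zero_shot_classify(point_encoder, text_encoder, tokenize, points, class_names):
    """points: (B, N, 3) point clouds; class_names: list of category strings."""
    prompts = tokenize([f"a point cloud of a {c}" for c in class_names])
    text_feat = text_encoder(prompts)        # (C, D): one embedding per class
    pc_feat = point_encoder(points)          # (B, D): one embedding per cloud
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    pc_feat = pc_feat / pc_feat.norm(dim=-1, keepdim=True)
    logits = pc_feat @ text_feat.t()         # (B, C) cosine similarities
    return logits.argmax(dim=-1)             # predicted class index per cloud
```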

b) Fine-tuning model weights

PointTransformer

| Dataset | Model Weights | TFBoard |
| --- | --- | --- |
| ScanObjectNN | download | link |
| ModelNet | download | link |

PointMLP

| Dataset | Model Weights | TFBoard |
| --- | --- | --- |
| ScanObjectNN | download | link |
| ModelNet | download | link |

2) Training CG3D

a) Pre-training
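
Pre-training aligns the 3D encoder with CLIP's image and text embedding spaces on rendered ShapeNet data. The sketch below shows the general idea of such a symmetric InfoNCE alignment; the function and variable names are illustrative, not the repository's code.

```python
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE over paired embeddings a, b of shape (B, D)."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def pretrain_loss(pc_feat, img_feat, txt_feat):
    """Pull 3D features toward image and text features of the same shape."""
    return info_nce(pc_feat, img_feat) + info_nce(pc_feat, txt_feat)
```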

Zero-Shot Inference

  python eval.py --config cfgs/ShapeNet55_models/{CONFIG} --exp_name {NAME FOR EXPT}  --ckpts {CKPT PATH} --slip_model {PATH TO SLIP MODEL} --zshot --npoints {1024,8192}

Fine-tuning Inference

 python eval.py --config  cfgs/{ModelNet_models,ScanObjectNN_models}/{CONFIG} --exp_name {NAME FOR EXPT}  --ckpts {CKPT PATH} --npoints {1024,8192}

b) Fine-tuning

Fine-tuning PointTransformer (the CUBLAS_WORKSPACE_CONFIG variable makes cuBLAS behave deterministically):

 CUBLAS_WORKSPACE_CONFIG=:4096:8 CUDA_VISIBLE_DEVICES=0 python finetune_cg3d.py --config cfgs/ModelNet_models/PointTransformer.yaml --exp_name {NAME OF EXPT} --finetune_model --ckpts {PATH OF PRETRAINED MODEL WEIGHTS} --dataset_root {PATH OF DATA STORAGE}

Fine-tuning PointMLP:

 CUBLAS_WORKSPACE_CONFIG=:4096:8 CUDA_VISIBLE_DEVICES=0 python finetune_cg3d.py --config cfgs/ModelNet_models/PointMLP.yaml --exp_name {NAME OF EXPT} --finetune_model --ckpts {PATH OF PRETRAINED MODEL WEIGHTS} --dataset_root {PATH OF DATA STORAGE}

To switch the fine-tuning dataset from ModelNet to ScanObjectNN, etc., edit the corresponding .yaml configuration files.

References

Citation

 @article{hegde2023clip,
   title={CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition},
   author={Hegde, Deepti and Valanarasu, Jeya Maria Jose and Patel, Vishal M},
   journal={arXiv preprint arXiv:2303.11313},
   year={2023}
 }