
CLIP-goes-3D

Official code for the paper "CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition"

[arXiv](https://arxiv.org/abs/2303.11313) | [Project website](https://jeya-maria-jose.github.io/cg3d-web/)


This repository includes the pre-trained models and the evaluation and training code for the pre-training, zero-shot, and fine-tuning experiments of CG3D. It is built on the Point-BERT codebase. Please see the end of this document for a full list of code references.

Environment set-up

The known working environment configuration is:

- Python 3.9
- PyTorch 1.12
- CUDA 11.6

  1. Install the conda virtual environment using the provided .yml file.

    conda env create -f environment.yml 

    (OR)

  2. Install dependencies manually.

    conda create -n cg3d
    conda activate cg3d
    pip install -r requirements.txt
    
    conda install -c anaconda scikit-image scikit-learn scipy
    pip install git+https://github.com/openai/CLIP.git
    pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl
    cd ./extensions/chamfer_dist
    python setup.py develop
  3. Build the modified timm from source

    cd ./models/SLIP/pytorch-image-models
    pip install -e .
  4. Install PointNet ops

    cd third_party/Pointnet2_PyTorch
    pip install -e .
    pip install pointnet2_ops_lib/.
  5. Install PyGeM

    cd third_party/PyGeM
    python setup.py install
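
After completing the steps above, a quick import check helps confirm that the compiled dependencies built correctly. This is a minimal sketch; the module names are assumptions based on the upstream packages, so adjust them if your build registers them differently.

```python
# Sanity check that the compiled dependencies import cleanly.
# Module names are assumptions based on the upstream packages.
import torch
import clip                                  # openai/CLIP
import pygem                                 # PyGeM
from knn_cuda import KNN                     # unlimblue/KNN_CUDA
from pointnet2_ops import pointnet2_utils    # Pointnet2_PyTorch

print("CUDA available:", torch.cuda.is_available())
print("CLIP backbones:", clip.available_models())
```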

Dataset set-up

  1. Download point cloud datasets for pre-training and fine-tuning.

    Save and unzip the above datasets.

    1. Render views of textured CAD models of ShapeNet using this repository. We use a scale of 0.7 and 5 total views.

    2. The data should be organized as

    ├── data (this may be wherever you choose)
    │   ├── modelnet40_normal_resampled
    │   │   ├── modelnet10/40_shape_names.txt
    │   │   ├── modelnet10/40_train/test.txt
    │   │   ├── airplane
    │   │   ├── ...
    │   │   ├── laptop
    │   ├── ShapeNet55
    │   │   ├── train.txt
    │   │   ├── test.txt
    │   │   ├── shapenet_pc
    │   │   │   ├── 03211117-62ac1e4559205e24f9702e673573a443.npy
    │   │   │   ├── ...
    │   ├── shapenet_render
    │   │   ├── train_img.txt
    │   │   ├── val_img.txt
    │   │   ├── shape_names.txt
    │   │   ├── taxonomy.json
    │   │   ├── camera
    │   │   ├── img
    │   │   │   ├── 02691156
    │   │   │   ├── ...
    │   ├── ScanObjectNN
    │   │   ├── main_split
    │   │   ├── ...
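
A quick way to verify this layout is a small path check. This is a minimal sketch; DATA_ROOT and the checked sub-folders simply mirror the tree above.

```python
# Verify the expected dataset layout before launching training.
from pathlib import Path

DATA_ROOT = Path("./data")  # adjust to wherever you stored the data
expected = [
    "modelnet40_normal_resampled",
    "ShapeNet55/shapenet_pc",
    "shapenet_render/img",
    "ScanObjectNN/main_split",
]
for rel in expected:
    status = "ok" if (DATA_ROOT / rel).exists() else "MISSING"
    print(f"{status:7s} {DATA_ROOT / rel}")
```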
    

1) Model weights

a) Pre-trained CG3D weights

Download SLIP model weights from here.

PointTransformer


| No. of points | Model file | Task | Configuration file |
| --- | --- | --- | --- |
| 1024 | download | Pre-training | link |
| 8192 | download | Pre-training | link |

PointMLP


| No. of points | Model file | Task | Configuration file |
| --- | --- | --- | --- |
| 1024 | download | Pre-training | link |
| 8192 | download | Pre-training | link |

Test Zero-Shot performance

    python eval.py --config cfgs/ShapeNet55_models/{CONFIG} --exp_name {NAME FOR EXPT}  --ckpts {CKPT PATH} --slip_model {PATH TO SLIP MODEL} --zshot --npoints {1024,8192}
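
Conceptually, the zero-shot evaluation follows the standard CLIP recipe: class names are wrapped in a text prompt, embedded with the frozen text encoder, and each point cloud is assigned the class whose text embedding is nearest in cosine similarity. The sketch below illustrates this; the encoder objects, tokenize function, and prompt template are placeholders, not the repository's actual API.

```python
import torch

@torch.no_grad()
def zero_shot_classify(point_encoder, text_encoder, tokenize, points, class_names):
    """points: (B, N, 3) point clouds; class_names: list of category strings."""
    prompts = tokenize([f"a point cloud of a {c}" for c in class_names])
    text_feat = text_encoder(prompts)        # (C, D): one embedding per class
    pc_feat = point_encoder(points)          # (B, D): one embedding per cloud
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    pc_feat = pc_feat / pc_feat.norm(dim=-1, keepdim=True)
    logits = pc_feat @ text_feat.t()         # (B, C) cosine similarities
    return logits.argmax(dim=-1)             # predicted class index per cloud
```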

b) Fine-tuning model weights

PointTransformer

| Dataset | Model Weights | TFBoard |
| --- | --- | --- |
| ScanObjectNN | download | link |
| ModelNet | download | link |

PointMLP

| Dataset | Model Weights | TFBoard |
| --- | --- | --- |
| ScanObjectNN | download | link |
| ModelNet | download | link |

2) Training CG3D

a) Pre-training
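
Pre-training aligns the 3D encoder with CLIP's image and text embedding spaces on rendered ShapeNet data. The sketch below shows the general idea of such a symmetric InfoNCE alignment; the function and variable names are illustrative, not the repository's code.

```python
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE over paired embeddings a, b of shape (B, D)."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def pretrain_loss(pc_feat, img_feat, txt_feat):
    """Pull 3D features toward image and text features of the same shape."""
    return info_nce(pc_feat, img_feat) + info_nce(pc_feat, txt_feat)
```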

Zero-Shot Inference

  python eval.py --config cfgs/ShapeNet55_models/{CONFIG} --exp_name {NAME FOR EXPT}  --ckpts {CKPT PATH} --slip_model {PATH TO SLIP MODEL} --zshot --npoints {1024,8192}

Fine-tuning Inference

 python eval.py --config  cfgs/{ModelNet_models,ScanObjectNN_models}/{CONFIG} --exp_name {NAME FOR EXPT}  --ckpts {CKPT PATH} --npoints {1024,8192}

b) Fine-tuning

Fine-tuning PointTransformer (the CUBLAS_WORKSPACE_CONFIG variable makes cuBLAS behave deterministically):

 CUBLAS_WORKSPACE_CONFIG=:4096:8 CUDA_VISIBLE_DEVICES=0 python finetune_cg3d.py --config cfgs/ModelNet_models/PointTransformer.yaml --exp_name {NAME OF EXPT} --finetune_model --ckpts {PATH OF PRETRAINED MODEL WEIGHTS} --dataset_root {PATH OF DATA STORAGE}

Fine-tuning PointMLP:

 CUBLAS_WORKSPACE_CONFIG=:4096:8 CUDA_VISIBLE_DEVICES=0 python finetune_cg3d.py --config cfgs/ModelNet_models/PointMLP.yaml --exp_name {NAME OF EXPT} --finetune_model --ckpts {PATH OF PRETRAINED MODEL WEIGHTS} --dataset_root {PATH OF DATA STORAGE}

To switch the fine-tuning dataset from ModelNet to ScanObjectNN, etc., edit the corresponding .yaml configuration files.

References

Citation

 @article{hegde2023clip,
   title={CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition},
   author={Hegde, Deepti and Valanarasu, Jeya Maria Jose and Patel, Vishal M},
   journal={arXiv preprint arXiv:2303.11313},
   year={2023}
 }