RookieXwc / GPICTURE

Apache License 2.0

Mutual Information-Driven Self-supervised Point Cloud Pre-training


Weichen Xu, Tianhao Fu, Jian Cao, Xinyu Zhao, Xinxin Xu, Xixin Cao, Xing Zhang   
Peking University, Beijing 100871, China   

:books:Outline

:sparkles: Location of Key Codes

In pcdet/models/dense_heads/pretrain_head_3D_seal.py, we provide the implementation of the High-level Voxel Feature Generation Module, which covers data extraction, voxelization, and the computation of the target and loss.

In pcdet/models/backbones_3d/I2Mask.py and pcdet/models/backbones_3d/dsvt_backbone_mae.py, we provide the implementation of Inter-class and Intra-class Discrimination-guided Masking (I$^2$Mask).

In pcdet/utils/cka_alhpa.py and pcdet/models/dense_heads/pretrain_head_3D_seal.py L172, we provide the implementation of CKA-guided Hierarchical Reconstruction.

In pcdet/models/dense_heads/pretrain_head_3D_seal.py L167 (def differential_gated_progressive_learning), we provide the implementation of Differential-gated Progressive Learning.

The configuration files for all experiments in the paper and appendix are in tools/cfgs/gpicture_models/.
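As a rough illustration of the similarity measure behind CKA-guided Hierarchical Reconstruction, the sketch below computes linear centered kernel alignment (CKA) between two feature matrices with NumPy. That linear CKA is the relevant variant is an assumption, and the function name `linear_cka` is hypothetical; this is not the code in pcdet/utils/cka_alhpa.py.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between feature matrices of shape (n_samples, dim_x) and (n_samples, dim_y).

    Illustrative only; not taken from the GPICTURE code base.
    """
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of rows (samples)")
    # Center every feature dimension over the samples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))
```

Linear CKA is invariant to isotropic scaling and orthogonal transformations of the features, so comparing a feature matrix with a rotated and rescaled copy of itself gives a value of 1 up to floating-point error.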


:car: Main Results

Pre-training

Waymo

| Model | Pre-train Fraction | Pre-train model | Log |
|---|---|---|---|
| GPICTURE (DSVT) | 20% | ckpt | Log |
| GPICTURE (DSVT) | 100% | ckpt | Log |

nuScenes

| Model | Pre-train model | Log |
|---|---|---|
| GPICTURE (DSVT) | ckpt | Log |

Fine-tuning

3D Object Detection (on Waymo validation)

| Model | Pre-train Fraction | mAP/H_L2 | Veh_L2 | Ped_L2 | Cyc_L2 | ckpt | Log |
|---|---|---|---|---|---|---|---|
| DSVT (GPICTURE) | 20% | 73.84/71.75 | 71.55/71.22 | 75.99/70.61 | 73.98/73.42 | ckpt | Log |
| DSVT (GPICTURE) | 100% | 75.55/73.13 | 73.38/72.87 | 77.52/72.01 | 75.75/74.51 | ckpt | Log |

3D Object Detection (on nuScenes validation)

| Model | mAP | NDS | mATE | mASE | mAOE | mAVE | mAAE | ckpt | Log |
|---|---|---|---|---|---|---|---|---|---|
| DSVT (GPICTURE) | 68.6 | 73.0 | 25.5 | 23.8 | 25.8 | 20.7 | 17.4 | ckpt | Log |

3D Semantic Segmentation (on nuScenes validation)

| Model | mIoU | Bicycle | Bus | Car | Motorcycle | Pedestrian | Trailer | Truck | ckpt | Log |
|---|---|---|---|---|---|---|---|---|---|---|
| Cylinder3D-SST (GPICTURE) | 79.7 | 43.6 | 94.8 | 96.5 | 81.0 | 84.4 | 65.8 | 87.7 | ckpt | Log |

Occupancy Prediction (on nuScenes OpenOccupancy validation)

| Model | mIoU | Bicycle | Bus | Car | Motorcycle | Pedestrian | Trailer | Truck | ckpt | Log |
|---|---|---|---|---|---|---|---|---|---|---|
| DSVT (GPICTURE) | 18.8 | 8.4 | 16.2 | 21.1 | 7.9 | 12.8 | 15.9 | 16.3 | ckpt | Log |


🏃‍♂️ Getting Started

⬇️ 1. Download Weights of MinkUNet (Res16UNet34C) Pre-trained by Seal

[youquanl/Segment-Any-Point-Cloud: [NeurIPS'23 Spotlight] Segment Any Point Cloud Sequences by Distilling Vision Foundation Models](https://github.com/youquanl/Segment-Any-Point-Cloud)

After downloading, place the weights file in the project path.


⚒️ 2. Prepare Dataset

Waymo:

1. Download the Waymo dataset from the official Waymo website. Make sure to download version 1.2.0 of the Perception Dataset.

2. Prepare the directory as follows:

GPICTURE
├── data
│   ├── waymo
│   │   │── ImageSets
│   │   │── raw_data
│   │   │   │── segment-xxxxxxxx.tfrecord
│   │   │   │── ...
├── pcdet
├── tools

3. Prepare the environment and install waymo-open-dataset:

pip install waymo-open-dataset-tf-2-5-0

4. Generate the complete dataset. This requires approximately 1 TB of disk space and 100 GB of RAM.

# only for single-frame setting
python -m pcdet.datasets.waymo.waymo_dataset --func create_waymo_infos \
    --cfg_file tools/cfgs/dataset_configs/waymo_dataset.yaml

# for single-frame or multi-frame setting
python -m pcdet.datasets.waymo.waymo_dataset --func create_waymo_infos --cfg_file tools/cfgs/dataset_configs/waymo_dataset_multiframe.yaml
# Ignore 'CUDA_ERROR_NO_DEVICE' error as this process does not require GPU.
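Before launching the info generation, it can help to confirm that the directory layout from step 2 is in place and that enough disk space is free. The following is a minimal sketch using only the Python standard library; the helper names `missing_paths` and `has_free_disk` are hypothetical, and reading "1T" as 10^12 bytes is an assumption.

```python
import shutil
from pathlib import Path
from typing import List, Sequence

# Relative paths taken from the directory tree in step 2.
WAYMO_LAYOUT = (
    "data/waymo/ImageSets",
    "data/waymo/raw_data",
    "pcdet",
    "tools",
)

# Assumption: "approximately 1T disk" is interpreted as 10**12 bytes.
ONE_TB = 10 ** 12


def missing_paths(project_root: str, required: Sequence[str] = WAYMO_LAYOUT) -> List[str]:
    """Return the required relative paths that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in required if not (root / rel).exists()]


def has_free_disk(path: str, required_bytes: int = ONE_TB) -> bool:
    """Return True if the filesystem containing path has at least required_bytes free."""
    return shutil.disk_usage(path).free >= required_bytes
```

Run from the GPICTURE root, `missing_paths(".")` should return an empty list once the layout is complete, and `has_free_disk(".")` reports whether the free-space estimate above is met. The RAM requirement is not checked here.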

nuScenes:

1. Prepare the trainval dataset from nuScenes and prepare the directory as follows:

GPICTURE
├── data
│   ├── nuscenes
│   │   │── v1.0-trainval
│   │   │   │── samples
│   │   │   │── sweeps
│   │   │   │── maps
│   │   │   │── v1.0-trainval
├── pcdet
├── tools

2. Prepare the environment and install nuscenes-devkit:

pip install nuscenes-devkit==1.0.5

3. Generate the complete dataset.

# for lidar-only setting
python -m pcdet.datasets.nuscenes.nuscenes_dataset --func create_nuscenes_infos --cfg_file tools/cfgs/dataset_configs/nuscenes_dataset.yaml --version v1.0-trainval

nuScenes Lidarseg:

1. Download the annotation files from nuScenes and prepare the directory as follows:

GPICTURE
├── data
│   ├── nuscenes
│   │   │── v1.0-trainval
│   │   │   │── samples
│   │   │   │── sweeps
│   │   │   │── maps
│   │   │   │── v1.0-trainval
│   │   │   │   │── lidarseg.json
│   │   │   │   │── category.json
│   │   │── lidarseg
│   │   │   │── v1.0-trainval
├── pcdet
├── tools

nuScenes OpenOccupancy:

1. Download the annotation files from OpenOccupancy and prepare the directory as follows:

GPICTURE
├── data
│   ├── nuscenes
│   │   │── v1.0-trainval
│   │   │   │── samples
│   │   │   │── sweeps
│   │   │   │── maps
│   │   │   │── v1.0-trainval
│   │   │   │   │── lidarseg.json
│   │   │   │   │── category.json
│   │── nuScenes-Occupancy
├── pcdet
├── tools

2. Prepare the environment:

conda install -c omgarcia gcc-6 # gcc-6.2
pip install mmcv-full==1.4.0
pip install mmdet==2.14.0
pip install mmsegmentation==0.14.1

# Install mmdet3d from source code.
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
git checkout v0.17.1 # Other versions may not be compatible.
python setup.py install

# Install occupancy pooling.
git clone https://github.com/JeffWang987/OpenOccupancy.git
cd OpenOccupancy
export PYTHONPATH="."
python setup.py develop


⚒️ 3. Prepare the Environment

1. Create the environment and install PyTorch
conda create --name gpicture python=3.8
conda activate gpicture
# install pytorch
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.1 -c pytorch -c conda-forge
# Verify that PyTorch is installed (run the following lines in a Python interpreter)
import torch    # If normal, remains silent
print(torch.cuda.is_available())    # If normal, prints "True"
a = torch.Tensor([1.])    # If normal, remains silent
a.cuda()    # If normal, returns "tensor([1.], device='cuda:0')"
from torch.backends import cudnn    # If normal, remains silent
cudnn.is_acceptable(a.cuda())    # If normal, returns "True"

2. Install OpenPCDet

# install spconv
pip install spconv-cu111
# install requirements
pip install -r requirements.txt
# setup
python setup.py develop

3. Install other packages

# install other packages
pip install torch_scatter
pip install nuscenes-devkit==1.0.5
pip install open3d

# install the Python package for evaluating the Waymo dataset
pip install waymo-open-dataset-tf-2-5-0==1.4.1

# pay attention to specific package versions.
pip install pandas==1.4.3
pip install matplotlib==3.6.2
pip install scikit-image==0.19.3
pip install async-lru==1.0.3

# install CUDA extensions
cd common_ops
pip install .

4. Install MinkowskiEngine

# install MinkowskiEngine
pip install ninja
conda install openblas-devel -c anaconda
git clone https://github.com/NVIDIA/MinkowskiEngine.git
cd MinkowskiEngine
python setup.py install --blas_include_dirs=${CONDA_PREFIX}/include --blas=openblas
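Since step 3 asks for specific package versions, a quick check of what is actually installed can save debugging time later. The sketch below uses `importlib.metadata` from the standard library (Python 3.8+); the helper name `check_pins` and the report format are hypothetical, and the pins are copied from the `pip install` lines above.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Version pins copied from the "pay attention to specific package versions" commands.
PINNED = {
    "pandas": "1.4.3",
    "matplotlib": "3.6.2",
    "scikit-image": "0.19.3",
    "async-lru": "1.0.3",
}


def check_pins(pins: Dict[str, str] = PINNED) -> Dict[str, Tuple[str, Optional[str], bool]]:
    """Map each package to (expected version, installed version or None, versions match)."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (expected, installed, installed == expected)
    return report
```

Calling `check_pins()` inside the `gpicture` environment and printing the entries whose last field is `False` lists the packages that are missing or at a different version.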


⚒️ 4. Prepare the Seal Feature for the Entire Dataset Offline

1. Prepare the coords and feats inputs.

cd tools/
python train.py --cfg_file cfgs/gpicture_models/gpicture_waymo_ssl_seal_generate_input.yaml
# or
python train.py --cfg_file cfgs/gpicture_models/gpicture_nuscenes_ssl_seal_generate_input.yaml

2. Use the MinkUNet (Res16UNet34C) pre-trained by Seal to generate the Seal features.

cd tools/
python prepare_seal_output.py


:rocket: 5. Run the Code

The configuration files for the experiments in the paper and appendix are in tools/cfgs/gpicture_models/.

Pre-training

Waymo

cd tools/
python train.py --cfg_file cfgs/gpicture_models/gpicture_waymo_ssl_seal_decoder_mask.yaml

nuScenes

cd tools/
python train.py --cfg_file cfgs/gpicture_models/gpicture_nuscenes_ssl_seal_decoder_mask.yaml

Fine-tuning

3D Object Detection on Waymo:

cd tools/
python train.py --cfg_file cfgs/gpicture_models/gpicture_waymo_detection.yaml --pretrained_model /path/of/pretrain/model.pth

3D Object Detection on nuScenes:

cd tools/
python train.py --cfg_file cfgs/gpicture_models/gpicture_nuscenes_detection.yaml --pretrained_model /path/of/pretrain/model.pth

3D Semantic Segmentation:

cd tools/
python train.py --cfg_file cfgs/gpicture_models/gpicture_nuscenes_segmentation.yaml --pretrained_model /path/of/pretrain/model.pth

Occupancy Prediction:

cd tools/
python train.py --cfg_file cfgs/gpicture_models/gpicture_nuscenes_occupancy.yaml --pretrained_model /path/of/pretrain/model.pth
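The commands above all follow the same pattern, so they can be assembled programmatically, for example when scripting several runs. The sketch below is one possible way to do that; the task keys and the helper `build_train_command` are hypothetical, while the config paths and the `--cfg_file` / `--pretrained_model` flags are the ones shown above.

```python
from typing import Dict, List, Optional

# Task keys are illustrative; config paths are copied from the commands above.
CONFIGS: Dict[str, str] = {
    "pretrain_waymo": "cfgs/gpicture_models/gpicture_waymo_ssl_seal_decoder_mask.yaml",
    "pretrain_nuscenes": "cfgs/gpicture_models/gpicture_nuscenes_ssl_seal_decoder_mask.yaml",
    "det_waymo": "cfgs/gpicture_models/gpicture_waymo_detection.yaml",
    "det_nuscenes": "cfgs/gpicture_models/gpicture_nuscenes_detection.yaml",
    "seg_nuscenes": "cfgs/gpicture_models/gpicture_nuscenes_segmentation.yaml",
    "occ_nuscenes": "cfgs/gpicture_models/gpicture_nuscenes_occupancy.yaml",
}


def build_train_command(task: str, pretrained_model: Optional[str] = None) -> List[str]:
    """Build the argument list for train.py; intended to be run from the tools/ directory."""
    if task not in CONFIGS:
        raise KeyError(f"unknown task {task!r}; expected one of {sorted(CONFIGS)}")
    cmd = ["python", "train.py", "--cfg_file", CONFIGS[task]]
    if pretrained_model is not None:
        # Fine-tuning runs load the pre-trained checkpoint.
        cmd += ["--pretrained_model", pretrained_model]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, cwd="tools", check=True)`, which mirrors the `cd tools/` step in the commands above.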
