WeakM3D

Introduction

This is the PyTorch implementation of the paper WeakM3D: Towards Weakly Supervised Monocular 3D Object Detection, In ICLR'22, Liang Peng, Senbo Yan, Boxi Wu, Zheng Yang, Xiaofei He, and Deng Cai.

[paper]

Abstract

Monocular 3D object detection is one of the most challenging tasks in 3D scene understanding. Due to the ill-posed nature of monocular imagery, existing monocular 3D detection methods highly rely on training with the manually annotated 3D box labels on the LiDAR point clouds. This annotation process is very laborious and expensive. To dispense with the reliance on 3D box labels, in this paper we explore the weakly supervised monocular 3D detection. Specifically, we first detect 2D boxes on the image. Then, we adopt the generated 2D boxes to select corresponding RoI LiDAR points as the weak supervision. Eventually, we adopt a network to predict 3D boxes which can tightly align with associated RoI LiDAR points. This network is learned by minimizing our newly-proposed 3D alignment loss between the 3D box estimates and the corresponding RoI LiDAR points. We will illustrate the potential challenges of the above learning problem and resolve these challenges by introducing several effective designs into our method.

Installation

Installation Steps

a. Clone this repository.

git clone https://github.com/SPengLiang/WeakM3D

b. Install the dependent libraries as follows:

Install the dependent python libraries:

pip install torch==1.1.0 torchvision==0.3.0 loguru sklearn opencv tqdm

You can create a conda environment or use docker.

Getting Started

Dataset Preparation

Currently we provide the dataloader of KITTI dataset. You can download the entire raw KITTI dataset by running:

wget -i ./data/kitti/data_file/kitti_archives_to_download.txt -P kitti_data/

Then unzip with

cd kitti_data
unzip "*.zip"
cd ..
ln -s kitti_data ./data/kitti/raw_data

Warning: it weighs about 175GB, so make sure you have enough space to unzip.

Please download the official KITTI 3D object detection dataset and organize the downloaded files as follows:

WeakM3D_PATH
├── data
│   ├── kitti
│   │   │── raw_data
│   │   │── KITTI3D
|   │   │   │── training
|   │   │   │   ├──calib & velodyne & label_2 & image_2 & rgb_detections
|   │   │   │── testing
|   │   │   │   ├──calib & velodyne & image_2
├── config
├── ...

The raw_data refers to the soft link to the raw KITTI dataset and the 2D rgb_detections are offline 2D box predictions, generated from F-PointNet. Here we provide them at: Google Drive.

You can also choose to link your KITTI dataset path by

KITTI_DATA_PATH=~/data/kitti_object
ln -s $KITTI_DATA_PATH/training/ ./data/kitti/KITTI3D/
ln -s $KITTI_DATA_PATH/testing/ ./data/kitti/KITTI3D/

Prepare KITTI raw dataset split for training:
```
python utils/prepare_kitti_raw_datafile.py
```
Generate the LiDAR RoI points by running the following command, which will take a while:
```
python utils/seg_img.py
python utils/save_lidar_RoI_points.py
```
To ease the usage, we provide the pre-generated files at: Google Drive

Training & Testing

Test and evaluate the pretrained models

CUDA_VISIBLE_DEVICES=0 python scripts/eval_infer.py --config ./config/resnet34_eval.yaml

Train a model

CUDA_VISIBLE_DEVICES=0 python scripts/train.py --config ./config/resnet34_backbone.yaml

Pretrained Model

To ease the usage, we provide the pre-trained model at: Google Drive

Here we give the comparison.

under AP11 metrics:

Models	Backbone	Car@BEV IoU=0.7			Car@3D IoU=0.7
Models	Backbone	Easy	Mod	Hard	Easy	Mod	Hard
original paper	ResNet50	24.89	16.47	14.09	17.06	11.63	11.17
this repo	ResNet34	26.92	18.57	15.86	18.27	12.95	11.55

under AP40 metrics:

Models	Backbone	Car@BEV IoU=0.5			Car@3D IoU=0.5
Models	Backbone	Easy	Mod	Hard	Easy	Mod	Hard
original paper	ResNet50	58.20	38.02	30.17	50.16	29.94	23.11
this repo	ResNet34	60.72	40.32	31.34	53.28	33.30	25.76

Citation

@inproceedings{peng2021weakm3d,
  title={WeakM3D: Towards Weakly Supervised Monocular 3D Object Detection},
  author={Peng, Liang and Yan, Senbo and Wu, Boxi and Yang, Zheng and He, Xiaofei and Cai, Deng},
  booktitle={International Conference on Learning Representations},
  year={2022}
}

Acknowledgements

The code benefits from Monodepth2, F-PointNet and Second.

SPengLiang / WeakM3D

readme