HENet

Paper | Zhongyu Xia, Zhiwei Lin, Xinhao Wang, Yongtao Wang, Yun Xing, Shengxiang Qi, Nan Dong, Ming-Hsuan Yang

This is the official implementation of HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras.

Update

2024/07/07 - HENet has been accepted to ECCV 2024. We will release the code in this repository.
2024/05/09 - We combined SparseBEV with HENet and achieved 1st place on the nuScenes leaderboard (vision track).
2024/04/03 - We released our paper on arXiv.

Introduction

HENet is an end-to-end multi-task 3D perception framework. It reduces training costs through hybrid image encoding and mitigates multi-task conflicts through independent BEV feature encoding.

Demo

Visualization results of HENet and baselines on end-to-end multi-task perception. HENet better estimates occluded objects by using long-term information and makes more accurate predictions by using high-resolution information.

Main Results

Method   mAP    NDS    mIoU   config   model
HENet    49.9   59.9   58.0

Getting Started

Environment

The code has been tested in the following two environments:

cuda     12.1
pytorch  2.0.1+cu118
GPU      A800, A40
(This PyTorch build targets CUDA 11.8, so PyTorch's CUDA version check must be commented out manually before compiling the custom operators)
(For a detailed package list, please refer to envs_list_cu121.txt)

cuda     11.3
pytorch  1.12.1+cu113
GPU      RTX8000, RTX3090, V100, P40
(For a detailed package list, please refer to envs_list_cu113.txt)
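In the first environment, the PyTorch wheel is built for CUDA 11.8 while the installed toolkit is CUDA 12.1. As far as we know, PyTorch's `torch.utils.cpp_extension` compares the two when building C++/CUDA extensions and refuses to compile if their major versions differ, which is presumably why that check has to be commented out. The sketch below reproduces only that comparison with plain string parsing; the helper names `cuda_from_torch_version`, `parse_cuda`, and `cuda_major_mismatch` are hypothetical and belong to neither PyTorch nor this repository.

```python
import re
from typing import Optional, Tuple


def cuda_from_torch_version(torch_version: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) CUDA from a PyTorch version such as '2.0.1+cu118'.

    Returns None when the version string carries no '+cuXYZ' local tag.
    """
    match = re.search(r"\+cu(\d+)(\d)$", torch_version)
    if match is None:
        return None
    # The last digit is the minor version; everything before it is the major.
    return int(match.group(1)), int(match.group(2))


def parse_cuda(version: str) -> Tuple[int, int]:
    """Parse a toolkit version such as '12.1' into (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def cuda_major_mismatch(system_cuda: str, torch_version: str) -> bool:
    """Hypothetical helper: True if toolkit and PyTorch build differ in CUDA major version."""
    torch_cuda = cuda_from_torch_version(torch_version)
    if torch_cuda is None:
        return False  # no '+cuXYZ' tag, so there is nothing to compare
    return parse_cuda(system_cuda)[0] != torch_cuda[0]


# The two environments listed above.
print(cuda_major_mismatch("12.1", "2.0.1+cu118"))   # 12 vs 11
print(cuda_major_mismatch("11.3", "1.12.1+cu113"))  # 11 vs 11
```

On a real machine, the two inputs would come from `nvcc --version` and `torch.__version__`.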

The recommended installation steps are:

Create a Python environment and install the PyTorch build that matches your machine's CUDA version.
Install the mmcv build that matches your PyTorch and CUDA versions.
Install the dependencies of mmdet, then install mmdet.
Install the remaining dependencies of this project (change the spconv version in requirements.txt to match the CUDA version you are using), then set up this project:

python setup.py develop

Compile the custom operators manually:

cd mmdet3d/ops/csrc
python setup.py build_ext --inplace
cd ../deformattn
python setup.py build install

Install the dependencies of detectron2, then install detectron2 from the detr2 directory:

cd detr2
python setup.py develop
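The installation steps ask you to change the spconv version in requirements.txt to match your CUDA version. Assuming the file pins a spconv 2.x wheel named in the `spconv-cuXYZ` style (for example `spconv-cu113`), which is how spconv publishes its CUDA-specific packages on PyPI, that edit could be scripted as follows. The helpers `spconv_package_for` and `retarget_spconv` are hypothetical, and not every CUDA version has a matching spconv wheel, so check the spconv release list before installing.

```python
import re


def spconv_package_for(cuda_version: str) -> str:
    """Map a CUDA version such as '11.3' to a wheel name such as 'spconv-cu113'."""
    major, minor = cuda_version.split(".")[:2]
    return f"spconv-cu{major}{minor}"


def retarget_spconv(requirements: str, cuda_version: str) -> str:
    """Point every 'spconv-cuXYZ' entry at the wheel for cuda_version.

    Any version specifier after the name (e.g. '>=2.3') is kept unchanged.
    Assumes requirements.txt uses the 'spconv-cuXYZ' naming scheme.
    """
    return re.sub(r"spconv-cu\d+", spconv_package_for(cuda_version), requirements)


# Example: retarget a cu113 pin to CUDA 12.1.
print(retarget_spconv("numpy\nspconv-cu113>=2.3\n", "12.1"))

# To edit the real file (hypothetical usage, run from the repository root):
#   from pathlib import Path
#   req = Path("requirements.txt")
#   req.write_text(retarget_spconv(req.read_text(), "12.1"))
```

A regex substitution keeps any version specifier intact, so only the CUDA suffix of the package name changes.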

Data Preparation

Please download nuScenes-v1.0-trainval and nuScenes-map-expansion-v1.3 from nuScenes.org, and CVPR23-Occupancy/gts.tar.gz from CVPR2023-3D-Occupancy-Prediction.

If your folder structure is different from the following, you may need to change the corresponding paths in config files.

├── mmdet3d
├── tools
├── configs
├── data
│   ├── nuscenes
│   │   ├── maps
│   │   │   ├── basemap
│   │   │   ├── expansion
│   │   │   ├── prediction
│   │   │   ├── *.png
│   │   ├── samples
│   │   ├── sweeps
│   │   ├── v1.0-test
│   │   ├── v1.0-trainval
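Before running the data preparation script, it can help to confirm that the directories from the tree above are in place. This is a minimal sketch using only the standard library; the function name `missing_paths` is hypothetical, and the expected paths are copied from the tree, so extend the list if your configs reference other folders (for example, the extracted occupancy ground truth, whose location the tree does not show).

```python
from pathlib import Path
from typing import List

# Expected directories relative to the repository root, copied from the tree.
EXPECTED_DIRS = [
    "mmdet3d",
    "tools",
    "configs",
    "data/nuscenes/maps/basemap",
    "data/nuscenes/maps/expansion",
    "data/nuscenes/maps/prediction",
    "data/nuscenes/samples",
    "data/nuscenes/sweeps",
    "data/nuscenes/v1.0-test",
    "data/nuscenes/v1.0-trainval",
]


def missing_paths(root: str) -> List[str]:
    """Return the expected entries that do not exist under root (hypothetical helper)."""
    base = Path(root)
    missing = [rel for rel in EXPECTED_DIRS if not (base / rel).is_dir()]
    # The tree also lists map rasters (*.png) directly inside maps/.
    maps_dir = base / "data/nuscenes/maps"
    if maps_dir.is_dir() and not any(maps_dir.glob("*.png")):
        missing.append("data/nuscenes/maps/*.png")
    return missing


if __name__ == "__main__":
    # Run from the repository root.
    for rel in missing_paths("."):
        print("missing:", rel)
```

An empty result means the layout matches the tree; any reported entry points to a path you may need to create or change in the config files.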

Prepare nuScenes data by running:

python tools/create_data_nuscenes_C.py

Training

./tools/dist_train.sh $config_path $gpus

Testing

Testing on the validation set:

./tools/dist_test.sh $config_path $checkpoint_path $gpus --eval bbox

Testing on the test set:

./tools/dist_test.sh $config_path $checkpoint_path $gpus --format-only --eval-options 'jsonfile_prefix=work_dirs'
mv work_dirs/pts_bbox/results_nusc.json work_dirs/pts_bbox/${name}.json
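The renamed JSON file is presumably what gets uploaded to the nuScenes evaluation server, so a quick structural check before uploading can save a failed submission. The sketch below assumes the file follows the standard nuScenes detection submission format (a `meta` block of modality flags plus a `results` map from sample tokens to box lists, with at most 500 boxes per sample); `check_submission` and `load_and_check` are hypothetical names, and the check does not replace the official nuScenes devkit evaluation.

```python
import json
from typing import Any, Dict, List

# Field names from the nuScenes detection submission format.
META_KEYS = {"use_camera", "use_lidar", "use_radar", "use_map", "use_external"}
BOX_KEYS = {
    "sample_token", "translation", "size", "rotation",
    "velocity", "detection_name", "detection_score", "attribute_name",
}
MAX_BOXES_PER_SAMPLE = 500


def check_submission(sub: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a parsed submission (empty if none)."""
    problems: List[str] = []
    missing_meta = META_KEYS - set(sub.get("meta", {}))
    if missing_meta:
        problems.append(f"meta is missing {sorted(missing_meta)}")
    results = sub.get("results")
    if not isinstance(results, dict):
        problems.append("'results' must map sample tokens to lists of boxes")
        return problems
    for token, boxes in results.items():
        if len(boxes) > MAX_BOXES_PER_SAMPLE:
            problems.append(f"{token}: {len(boxes)} boxes exceeds {MAX_BOXES_PER_SAMPLE}")
        for box in boxes:
            missing = BOX_KEYS - set(box)
            if missing:
                problems.append(f"{token}: box missing {sorted(missing)}")
            elif box["sample_token"] != token:
                # Each box should repeat the token of the sample it belongs to.
                problems.append(f"{token}: box has sample_token {box['sample_token']}")
    return problems


def load_and_check(path: str) -> List[str]:
    """Load a results JSON file from disk and check its structure."""
    with open(path) as f:
        return check_submission(json.load(f))
```

For example, `load_and_check("work_dirs/pts_bbox/results_nusc.json")` returns an empty list when no structural problems are found.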

For other questions, please refer to the mmdet3d documentation.

Acknowledgements

We sincerely thank these excellent open-source projects:

Citation

If this work is helpful for your research, please consider citing our paper HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras

@inproceedings{xia2024henet,
  title={HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras},
  author={Xia, Zhongyu and Lin, Zhiwei and Wang, Xinhao and Wang, Yongtao and Xing, Yun and Qi, Shengxiang and Dong, Nan and Yang, Ming-Hsuan},
  booktitle={Proceedings of the European Conference on Computer Vision},
  year={2024}
}

License

This project is free for academic research use only; commercial use requires authorization. For commercial licensing, please contact wyt@pku.edu.cn.
