Perception for autonomous vehicles has to be efficient, robust, and cost-effective. However, cameras are not robust against severe weather conditions, lidar sensors are expensive, and radar-based perception still lags behind the other modalities in performance. Camera-radar fusion methods have been proposed to address this issue, but they are constrained by the typical sparsity of radar point clouds and are often designed for radars without elevation information. We propose a novel camera-radar fusion approach called Dual Perspective Fusion Transformer (DPFT) to overcome these limitations. Our method leverages lower-level radar data (the radar cube) instead of processed point clouds to preserve as much information as possible, and it employs projections onto both the camera and ground planes to effectively use radars with elevation information and to simplify the fusion with camera data. As a result, DPFT achieves state-of-the-art performance on the K-Radar dataset while showing remarkable robustness against adverse weather conditions and maintaining a low inference time.
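The core idea is to project the 4D radar cube onto two complementary 2D planes before fusing it with camera features. The sketch below is only a minimal illustration of that idea: the cube shape, the axis order (Doppler, range, azimuth, elevation), and the max reduction are assumptions for demonstration and do not reflect the actual implementation.

import numpy as np

# Hypothetical 4D radar cube with axes (Doppler, range, azimuth, elevation);
# the real K-Radar cube layout and the aggregation used by DPFT may differ.
cube = np.random.default_rng(0).random((64, 256, 107, 37)).astype(np.float32)

# Ground-plane (bird's-eye) view: collapse the Doppler and elevation axes,
# keeping a range-azimuth map.
ground_plane = cube.max(axis=(0, 3))   # shape: (range, azimuth)

# Camera-plane (front) view: collapse the Doppler and range axes,
# keeping an azimuth-elevation map that aligns with the image plane.
camera_plane = cube.max(axis=(0, 1))   # shape: (azimuth, elevation)

print(ground_plane.shape, camera_plane.shape)   # (256, 107) (107, 37)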
| Model | Modality | Total | Normal | Overcast | Fog | Rain | Sleet | Light Snow | Heavy Snow | Revision |
|---|---|---|---|---|---|---|---|---|---|---|
| DPFT | C + R | 56.1 | 55.7 | 59.4 | 63.1 | 49.0 | 51.6 | 50.5 | 50.5 | v1.0 |
| DPFT | C + R | 50.5 | 51.1 | 45.2 | 64.2 | 39.9 | 42.9 | 42.4 | 51.1 | v2.0 |
This project is based on the K-Radar dataset. To set it up correctly, you should follow these two steps:
1. Download the K-Radar dataset
2. Structure the dataset accordingly
We recommend a Docker-based installation to ensure a consistent development environment, but we also provide instructions for a local installation. Either way, please check our more detailed installation instructions.
# Build the DPRT Docker image
docker build -t dprt:0.0.1 .
# Start an interactive container with GPU access, X11 forwarding,
# and the repository and data directories mounted
docker run \
    --name dprt \
    -it \
    --gpus all \
    -e DISPLAY \
    -v /tmp/.X11-unix:/tmp/.X11-unix \
    -v <path to repository>:/app \
    -v <path to data>:/data \
    dprt:0.0.1 bash
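Once inside the container, you can optionally verify that the GPUs are accessible. This check assumes the image ships with PyTorch; adapt it to your environment if necessary.

python -c "import torch; print(torch.cuda.is_available())"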
The usage of our model consists of three major steps.
First, you have to prepare the training and evaluation data by pre-processing the raw dataset. This not only extracts the essential information from the original dataset but also reduces the data size from 16 TB to just 670 GB.
python -m dprt.prepare --src /data/kradar/raw/ --cfg /app/config/kradar.json --dst /data/kradar/processed/
python -m dprt.prepare
--src <Path to the raw dataset folder>
--cfg <Path to the configuration file>
--dst <Path to save the processed dataset>
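After pre-processing, a quick way to sanity-check the result is to inspect the size of the output folder (using the example path from above):

du -sh /data/kradar/processed/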
Second, train the DPFT model on the previously prepared data or resume a previous training run from a model checkpoint, as shown in the example below.
python -m dprt.train --src /data/kradar/processed/ --cfg /app/config/kradar.json
python -m dprt.train
--src <Path to the processed dataset folder>
--cfg <Path to the configuration file>
--dst <Path to save the training log>
--checkpoint <Path to a model checkpoint to resume training from>
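For example, to resume a previous training run from a saved checkpoint (the log path is only a placeholder, assuming checkpoints are stored in the training log directory):

python -m dprt.train --src /data/kradar/processed/ --cfg /app/config/kradar.json --checkpoint /app/log/<path to checkpoint>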
Third, evaluate the performance of a previously trained model checkpoint.
python -m dprt.evaluate --src /data/kradar/processed/ --cfg /app/config/kradar.json --checkpoint /app/log/<path to checkpoint>
python -m dprt.evaluate
--src <Path to the processed dataset folder>
--cfg <Path to the configuration file>
--dst <Path to save the evaluation log>
--checkpoint <Path to the model checkpoint to evaluate>
If DPFT is useful or relevant to your research, please acknowledge our contributions by citing our paper:
@article{fent2024dpft,
title={DPFT: Dual Perspective Fusion Transformer for Camera-Radar-based Object Detection},
author={Felix Fent and Andras Palffy and Holger Caesar},
journal={arXiv preprint arXiv:2404.03015},
year={2024}
}