Multi-Modal 3D Object Detection in Long Range and Low-Resolution Conditions of Sensors
This codebase (master's thesis) investigates the impact of temporal information on the 3D object detection accuracy on the TUMTraf-Intersection and OSDaR23 datasets.
Contributions:
- Temporal Fusion via ConvLSTM or ConvGRU
- Temporally-Aware Ground Truth Paste Data Augmentation
- Temporal Pipeline & Online Caching Mechanism
- Temporal Dataset Split Search Algorithm
Built on the repository of BEVFusion: Multi-Task Multi-Sensor Fusion with
Unified Bird's-Eye View Representation.
Installation
You can build the Docker image manually or use the docker.sh script to build it. You may need to adjust the arguments in docker.sh for your use case.
```bash
bash docker.sh build <dev/prod>
```
You can then start the container by running the following command:
```bash
bash docker.sh run <dev/prod>
```
You can access the container by running the following command:
```bash
bash docker.sh access <dev/prod>
```
If you built the dev image, you can use the following command to install the dependencies; otherwise, you can skip this step:
```bash
make
```
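For instance, assuming the dev target and that the defaults in docker.sh suit your machine, the full setup looks like this:
```bash
# Build, start, and enter the dev container
bash docker.sh build dev
bash docker.sh run dev
bash docker.sh access dev
# then, inside the container, install the dependencies
make
```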
Click to see additional built-in docker.sh commands
```bash
bash docker.sh stop
bash docker.sh remove-container
bash docker.sh remove-image
bash docker.sh remove-all
```
Dataset Preparation
TUMTraf-Intersection Dataset
Click to expand
> **If you have the dataset fully ready, you can skip to step 5.**
1 - Merge all the files into one folder, then tokenize them by running the following command (if not tokenized already):
```bash
python tools/preprocessing/tumtraf_tokenize.py --root-path ./data/tumtraf-i-no-split --out-path ./data/tumtraf-i-no-split --loglevel INFO
```
2 - Add difficulty labels to the dataset by running the following command:
```bash
python tools/preprocessing/tumtraf_add_difficulty_labels.py --root-path ./data/tumtraf-i-no-split --out-path ./data/tumtraf-i-no-split --loglevel INFO
```
3 - You can then run the following command to find an optimally balanced split and divide the dataset into training, validation, and test sets (reduce '--perm-limit' or increase '-p' if it takes too long to finish):
```bash
python tools/preprocessing/tumtraf_find_temporal_split.py --create --root-path ./data/tumtraf-i-no-split --out-path ./data/tumtraf-i --seed 42 --segment-size 30 --perm-limit 60000 --loglevel INFO -p 6 --include-all-classes --include-all-sequences --include-same-classes-in-difficulty --difficulty-th 1.0 --include-same-classes-in-distance --distance-th 1.0 --include-same-classes-in-num-points --num-points-th 1.0 --include-same-classes-in-occlusion --occlusion-th 0.75 --point-cloud-range -25.0 -64.0 -10.0 64.0 64.0 0.0 --splits train val test --split-ratios 0.8 0.1 0.1 --exclude-classes OTHER
```
4 - To turn the newly separated sequence segments into their own pseudo-sequences, run the following command to tokenize the dataset again:
```bash
python tools/preprocessing/tumtraf_tokenize.py --root-path ./data/tumtraf-i --out-path ./data/tumtraf-i --loglevel INFO
```
5 - Finally, run the following command to create the ready-to-go version of the dataset:
```bash
python tools/create_data.py tumtraf-i --root-path ./data/tumtraf-i --out-dir ./data/tumtraf-i-bevfusion --loglevel INFO
```
OSDAR23 Dataset
Click to expand
> **If you have the dataset fully ready, you can skip to step 3.**
1 - Put all the sequences into one folder, then create a separate lidar labels folder with additional fields by running the following command:
```bash
python tools/preprocessing/osdar23_prepare.py --root-path ./data/osdar23_original --add-num-points --add-distance --loglevel INFO
```
2 - You can then run the following command to find an optimally balanced split and divide the dataset into training and validation sets (reduce '--perm-limit' or increase '-p' if it takes too long to finish):
```bash
python tools/preprocessing/osdar23_find_temporal_split.py --create --root-path ./data/osdar23_original --out-path ./data/osdar23 --seed 1337 --segment-size 30 --perm-limit 60000 --loglevel INFO -p 6 --include-all-classes --include-same-classes-in-distance --distance-th 0.95 --include-same-classes-in-num-points --num-points-th 0.95 --include-same-classes-in-occlusion --occlusion-th 0.85 --point-cloud-range -6.0 -128.0 -3.0 250.0 128.0 13.0 --splits train val --split-ratios 0.8 0.2 --exclude-classes lidar__cuboid__train lidar__cuboid__buffer_stop lidar__cuboid__animal lidar__cuboid__switch lidar__cuboid__bicycle lidar__cuboid__crowd lidar__cuboid__wagons lidar__cuboid__signal_bridge
```
3 - To turn the newly separated sequence segments into their own pseudo-sequences, run the following command to tokenize the dataset again:
```bash
python tools/preprocessing/osdar23_tokenize.py --root-path data/osdar23 --log INFO
```
4 - Finally, run the following command to create the ready-to-go version of the dataset:
```bash
python tools/create_data.py osdar23 --root-path ./data/osdar23 --out-dir ./data/osdar23-bevfusion --use-highres --loglevel INFO
```
Training
LiDAR-only
```bash
torchpack dist-run -np <number_of_gpus> python tools/train.py <config_path>
```
Click to see examples
TUMTraf-Intersection
```bash
torchpack dist-run -np 1 python tools/train.py configs/tumtraf-i/baseline/transfusion/lidar/voxelnet-1600g-0xy1-0z20-gtp15.yaml
```
OSDAR23
```bash
torchpack dist-run -np 1 python tools/train.py configs/osdar23/baseline/transfusion/lidar/voxelnet-1600g-0xy16-0z4-gtp15.yaml
```
Camera-only
```bash
torchpack dist-run -np <number_of_gpus> python tools/train.py <config_path> --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
```
Click to see examples
TUMTraf-Intersection
```bash
torchpack dist-run -np 1 python tools/train.py configs/tumtraf-i/baseline/centerhead/camera/swint-depthlss-256x704-1600g-0xy1-0z2-gm17-0p6.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
```
OSDAR23
```bash
torchpack dist-run -np 1 python tools/train.py configs/osdar23/baseline/centerhead/camera/swint-depthlss-256x704-1600g-0xy1-0z2-gm17-0p6.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
```
Multi-modal
```bash
torchpack dist-run -np <number_of_gpus> python tools/train.py <config_path> --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <lidar_checkpoint_path>
```
Click to see examples
TUMTraf-Intersection
```bash
torchpack dist-run -np 2 python tools/train.py configs/tumtraf-i/baseline/transfusion/fusion/convfuser-256x704-1600g-0xy1-0z2.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <lidar_checkpoint_path>
```
OSDAR23
```bash
torchpack dist-run -np 2 python tools/train.py configs/osdar23/baseline/transfusion/fusion/convfuser-256x704-1600g-0xy15-0z4.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <lidar_checkpoint_path>
```
Temporal
In order to train the model with temporal information, you need to load pre-trained weights of the model without temporal information.
```bash
torchpack dist-run -np <number_of_gpus> python tools/train.py <config_path> --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <pretrained_checkpoint_path>
```
Click to see examples
TUMTraf-Intersection - LiDAR-only
```bash
torchpack dist-run -np 2 python tools/train.py configs/tumtraf-i/temporal/transfusion/lidar/voxelnet-convlstm-1600g-0xy1-0z20-sameaugall-ql3-qrt1-gtp3-sameaug-rpd0p5-trans-rot-lfrz.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <pretrained_checkpoint_path>
```
TUMTraf-Intersection - Multi-modal
```bash
torchpack dist-run -np 2 python tools/train.py configs/tumtraf-i/temporal/transfusion/fusion/concatfuser-256x704-1600g-0xy1-0z2-lfrz.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <pretrained_checkpoint_path>
```
OSDAR23 - LiDAR-only
```bash
torchpack dist-run -np 2 python tools/train.py configs/osdar23/temporal/transfusion/lidar/voxelnet-convlstm-1600g-0xy1-0z20-sameaugall-ql3-qrt1-gtp3-sameaug-rpd0p5-trans-rot-lfrz.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <pretrained_checkpoint_path>
```
OSDAR23 - Multi-modal
```bash
torchpack dist-run -np 2 python tools/train.py configs/osdar23/temporal/transfusion/fusion/concatfuser-256x704-1600g-0xy1-0z2-lfrz.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <pretrained_checkpoint_path>
```
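Putting the two stages together, a typical LiDAR-only workflow on TUMTraf-Intersection might look like this; the run directory name passed to --load_from is illustrative and should point at your own baseline run:
```bash
# Stage 1: train the non-temporal LiDAR baseline
torchpack dist-run -np 2 python tools/train.py \
    configs/tumtraf-i/baseline/transfusion/lidar/voxelnet-1600g-0xy1-0z20-gtp15.yaml

# Stage 2: fine-tune the temporal model from the baseline checkpoint
torchpack dist-run -np 2 python tools/train.py \
    configs/tumtraf-i/temporal/transfusion/lidar/voxelnet-convlstm-1600g-0xy1-0z20-sameaugall-ql3-qrt1-gtp3-sameaug-rpd0p5-trans-rot-lfrz.yaml \
    --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth \
    --load_from checkpoints/tumtraf-i-lidar-baseline/latest.pth
```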
Testing
The following command evaluates the model on the test set and saves the results in the designated folder. In addition, if the corresponding arguments are provided, it also saves an evaluation summary and/or an extensive report of the evaluation.
```bash
torchpack dist-run -np 1 python tools/test.py <config_path> <checkpoint_path> --eval bbox
```
You can also pass the following options after --eval-options:
- extensive_report=True if you would like to have an extensive report of the evaluation
- save_summary_path=<path> if you would like to save the evaluation summary
Click to see an example
```bash
torchpack dist-run -np 1 python tools/test.py checkpoints/run/configs.yaml checkpoints/run/latest.pth --eval bbox --eval-options extensive_report=True save_summary_path=results/run/summary.json
```
Visualization
The following command visualizes the predictions of the model on the test set and saves the results in the designated folder. In addition, if the corresponding arguments are provided, it also saves the bounding boxes and/or the labels as npy files, as well as visuals containing both predictions and ground truths.
```bash
torchpack dist-run -np 1 python tools/visualize.py <config_path> --checkpoint <checkpoint_path> --mode pred --split test --out-dir <save_path>
```
You can also use the following optional arguments:
- --include-combined if you would like to save visuals containing both predictions and ground truths
- --save-bboxes if you would like to save the bounding boxes as npy files
- --save-scores if you would like to save the scores as npy files
- --save-labels if you would like to save the labels as npy files
- --max-samples N if you would like to visualize only a subset of the dataset, example 100
- --bbox-score X if you would like to visualize only the bounding boxes with a score higher than X, example: 0.1
Click to see an example
```bash
torchpack dist-run -np 1 python tools/visualize.py checkpoints/run/configs.yaml --checkpoint checkpoints/run/latest.pth --mode pred --split test --out-dir results/run/visuals --include-combined --save-bboxes --save-labels --max-samples 100 --bbox-score 0.1
```
Benchmarking
The following command benchmarks the model on the test set. In addition, if the corresponding argument is provided, it also saves the benchmark results to a file.
```bash
python tools/benchmark.py <config_path> <checkpoint_path>
```
You can also use the following optional arguments:
- --out if you would like to save the benchmark results in a file
Click to see an example
```bash
python tools/benchmark.py checkpoints/run/configs.yaml checkpoints/run/latest.pth --out results/run/benchmark.json
```
Compilation
The following command runs the other scripts, such as the evaluation, visualization, and benchmarking scripts, in one pass and compiles their outputs.
```bash
python tools/compile.py <dataset> -c <checkpoints_folder_path> -i <compilation_id> -t <target_path> --include-bboxes --include-labels --images-include-combined --images-cam-bbox-score 0.15 --loglevel INFO
```
You can also use the following optional arguments:
- --summary-foldername if you would like to change the name of the folder containing the evaluation summary, Default: summary
- --images-foldername if you would like to change the name of the folder containing the images, Default: images
- --videos-foldername if you would like to change the name of the folder containing the videos, Default: videos
- --override-testing if you would like to override the testing results, Default: False
- --override-images if you would like to override the images, Default: False
- --override-videos if you would like to override the videos, Default: False
- --override-benchmark if you would like to override the benchmark results, Default: False
- --images-include-combined if you would like to include the visuals containing both predictions and ground truths, Default: False
- --videos-include-bundled if you would like to include the bundled videos, Default: False
- --images-cam-bbox-score N if you would like to visualize only the bounding boxes with a score higher than N, example: 0.1, Default: 0.0
- --images-max-samples N if you would like to visualize only a subset of the dataset, example 100, Default: None
- --include-bboxes if you would like to save the bounding boxes as npy files, Default: False
- --include-scores if you would like to save the scores as npy files, Default: False
- --include-labels if you would like to save the labels as npy files, Default: False
- --skip-test if you would like to skip the testing, Default: False
- --skip-images if you would like to skip the images, Default: False
- --skip-videos if you would like to skip the videos, Default: False
- --skip-benchmark if you would like to skip the benchmarking, Default: False
Click to see examples
TUMTraf-Intersection
```bash
python tools/compile.py tumtraf-i -c checkpoints/tumtraf-i -i tumtraf-i -t results --include-bboxes --include-scores --include-labels --images-include-combined --images-cam-bbox-score 0.15 --videos-include-bundled --loglevel INFO
```
OSDAR23
```bash
python tools/compile.py osdar23 -c checkpoints/osdar23 -i osdar23 -t results --include-bboxes --include-scores --include-labels --images-include-combined --images-cam-bbox-score 0.15 --videos-include-bundled --loglevel INFO
```
Hyper-parameter Tuning
The current configurations are tuned for the best performance on the validation sets of the respective datasets. However, you can tune the hyper-parameters for your own use case. Sample commands are provided below.
```bash
python ./tools/gtp_tune.py <config_path> --run-dir "checkpoints/tune/tumtraf-i-ta-gtp-sampling" --n-epochs 20 --n-gpus 2 --n-trials 20 --CAR 8 15 --TRAILER 0 2 --TRUCK 0 4 --VAN 0 5 --PEDESTRIAN 0 8 --BUS 0 2 --MOTORCYCLE 0 4 --BICYCLE 0 4 --EMERGENCY_VEHICLE 0 2 --verbose --timeout 3 --enqueue 12 2 4 0 0 0 3 3 0
```
Then, you can use the following command to find optimal rotation and translation values for objects by loading the best checkpoint from the first tuning run and training with temporal information:
```bash
python tools/gtp_tune_temporal.py <config_path> --run-dir checkpoints/tune/tumtraf-i-ta-gtp-rt --load-from <lidar_checkpoint_path> --n-gpus 2 --n-epochs 4 --n-trials 25 --timeout 2 --verbose --CAR 0.0 2.5 0.0 0.2 --TRAILER 0.0 2.5 0.0 0.2 --TRUCK 0.0 2.5 0.0 0.2 --VAN 0.0 2.5 0.0 0.2 --PEDESTRIAN 0.0 2.5 0.0 0.3 --BUS 0.0 2.5 0.0 0.2 --MOTORCYCLE 0.0 2.5 0.0 0.25 --BICYCLE 0.0 2.5 0.0 0.25 --EMERGENCY_VEHICLE 0.0 2.5 0.0 0.2
```
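As a concrete sketch, the two tuning phases could be chained as follows; the config paths are illustrative choices from the configs shipped in this repository, and the phase-1 checkpoint path is a placeholder for your own best run:
```bash
# Phase 1: tune the ground-truth paste sampling counts
# (config path is an illustrative choice; use the config you intend to train)
python ./tools/gtp_tune.py configs/tumtraf-i/baseline/transfusion/lidar/voxelnet-1600g-0xy1-0z20-gtp15.yaml \
    --run-dir "checkpoints/tune/tumtraf-i-ta-gtp-sampling" --n-epochs 20 --n-gpus 2 --n-trials 20 \
    --CAR 8 15 --TRAILER 0 2 --TRUCK 0 4 --VAN 0 5 --PEDESTRIAN 0 8 --BUS 0 2 \
    --MOTORCYCLE 0 4 --BICYCLE 0 4 --EMERGENCY_VEHICLE 0 2 \
    --verbose --timeout 3 --enqueue 12 2 4 0 0 0 3 3 0

# Phase 2: tune per-class rotation/translation values,
# loading the best checkpoint from phase 1 (the path below is a placeholder)
python tools/gtp_tune_temporal.py configs/tumtraf-i/temporal/transfusion/lidar/voxelnet-convlstm-1600g-0xy1-0z20-sameaugall-ql3-qrt1-gtp3-sameaug-rpd0p5-trans-rot-lfrz.yaml \
    --run-dir checkpoints/tune/tumtraf-i-ta-gtp-rt --load-from checkpoints/tune/tumtraf-i-ta-gtp-sampling/best.pth \
    --n-gpus 2 --n-epochs 4 --n-trials 25 --timeout 2 --verbose \
    --CAR 0.0 2.5 0.0 0.2 --TRAILER 0.0 2.5 0.0 0.2 --TRUCK 0.0 2.5 0.0 0.2 --VAN 0.0 2.5 0.0 0.2 \
    --PEDESTRIAN 0.0 2.5 0.0 0.3 --BUS 0.0 2.5 0.0 0.2 --MOTORCYCLE 0.0 2.5 0.0 0.25 \
    --BICYCLE 0.0 2.5 0.0 0.25 --EMERGENCY_VEHICLE 0.0 2.5 0.0 0.2
```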