Multi-Modal 3D Object Detection in Long Range and Low-Resolution Conditions of Sensors
This codebase (master's thesis) investigates the impact of temporal information on the 3D object detection accuracy on the TUMTraf-Intersection and OSDaR23 datasets.
Contributions:
- Temporal Fusion via ConvLSTM or ConvGRU
- Temporally-Aware Ground Truth Paste Data Augmentation
- Temporal Pipeline & Online Caching Mechanism
- Temporal Dataset Split Search Algorithm
Built on the repository of BEVFusion: Multi-Task Multi-Sensor Fusion with
Unified Bird's-Eye View Representation.
Installation
You can build the Docker image manually or use the docker.sh script to build it. You may need to adjust the arguments in docker.sh for your use case.
```bash
bash docker.sh build <dev/prod>
```
You can then start the container by running the following command:
```bash
bash docker.sh run <dev/prod>
```
You can access the container by running the following command:
```bash
bash docker.sh access <dev/prod>
```
If you built the dev image, you can use the following command to install the dependencies; otherwise, you can skip this step:
```bash
make
```
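For instance, assuming the dev target and that the defaults in docker.sh suit your machine, the full setup looks like this:
```bash
# Build, start, and enter the dev container
bash docker.sh build dev
bash docker.sh run dev
bash docker.sh access dev
# then, inside the container, install the dependencies
make
```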
Click to see additional built-in docker.sh commands
```bash
bash docker.sh stop
bash docker.sh remove-container
bash docker.sh remove-image
bash docker.sh remove-all
```
Dataset Preparation
TUMTraf-Intersection Dataset
Click to expand
> **If you have the dataset fully ready, you can skip to step 5.**
1 - Merge all the files into one folder, then tokenize them by running the following command (if not tokenized already):
```bash
python tools/preprocessing/tumtraf_tokenize.py --root-path ./data/tumtraf-i-no-split --out-path ./data/tumtraf-i-no-split --loglevel INFO
```
2 - Add difficulty labels to the dataset by running the following command:
```bash
python tools/preprocessing/tumtraf_add_difficulty_labels.py --root-path ./data/tumtraf-i-no-split --out-path ./data/tumtraf-i-no-split --loglevel INFO
```
3 - You can then run the following command to find an optimally balanced split and divide the dataset into training, validation, and test sets (reduce '--perm-limit' or increase '-p' if it takes too long to finish):
```bash
python tools/preprocessing/tumtraf_find_temporal_split.py --create --root-path ./data/tumtraf-i-no-split --out-path ./data/tumtraf-i --seed 42 --segment-size 30 --perm-limit 60000 --loglevel INFO -p 6 --include-all-classes --include-all-sequences --include-same-classes-in-difficulty --difficulty-th 1.0 --include-same-classes-in-distance --distance-th 1.0 --include-same-classes-in-num-points --num-points-th 1.0 --include-same-classes-in-occlusion --occlusion-th 0.75 --point-cloud-range -25.0 -64.0 -10.0 64.0 64.0 0.0 --splits train val test --split-ratios 0.8 0.1 0.1 --exclude-classes OTHER
```
4 - To turn the newly separated sequence segments into their own pseudo-sequences, run the following command to tokenize the dataset again:
```bash
python tools/preprocessing/tumtraf_tokenize.py --root-path ./data/tumtraf-i --out-path ./data/tumtraf-i --loglevel INFO
```
5 - Finally, run the following command to create the ready-to-go version of the dataset:
```bash
python tools/create_data.py tumtraf-i --root-path ./data/tumtraf-i --out-dir ./data/tumtraf-i-bevfusion --loglevel INFO
```
OSDAR23 Dataset
Click to expand
> **If you have the dataset fully ready, you can skip to step 3.**
1 - Put all the sequences into one folder, then create a separate lidar labels folder with additional fields by running the following command:
```bash
python tools/preprocessing/osdar23_prepare.py --root-path ./data/osdar23_original --add-num-points --add-distance --loglevel INFO
```
2 - You can then run the following command to find an optimally balanced split and divide the dataset into training and validation sets (reduce '--perm-limit' or increase '-p' if it takes too long to finish):
```bash
python tools/preprocessing/osdar23_find_temporal_split.py --create --root-path ./data/osdar23_original --out-path ./data/osdar23 --seed 1337 --segment-size 30 --perm-limit 60000 --loglevel INFO -p 6 --include-all-classes --include-same-classes-in-distance --distance-th 0.95 --include-same-classes-in-num-points --num-points-th 0.95 --include-same-classes-in-occlusion --occlusion-th 0.85 --point-cloud-range -6.0 -128.0 -3.0 250.0 128.0 13.0 --splits train val --split-ratios 0.8 0.2 --exclude-classes lidar__cuboid__train lidar__cuboid__buffer_stop lidar__cuboid__animal lidar__cuboid__switch lidar__cuboid__bicycle lidar__cuboid__crowd lidar__cuboid__wagons lidar__cuboid__signal_bridge
```
3 - To turn the newly separated sequence segments into their own pseudo-sequences, run the following command to tokenize the dataset again:
```bash
python tools/preprocessing/osdar23_tokenize.py --root-path data/osdar23 --log INFO
```
4 - Finally, run the following command to create the ready-to-go version of the dataset:
```bash
python tools/create_data.py osdar23 --root-path ./data/osdar23 --out-dir ./data/osdar23-bevfusion --use-highres --loglevel INFO
```
Training
LiDAR-only
```bash
torchpack dist-run -np <number_of_gpus> python tools/train.py <config_path>
```
Click to see examples
TUMTraf-Intersection
```bash
torchpack dist-run -np 1 python tools/train.py configs/tumtraf-i/baseline/transfusion/lidar/voxelnet-1600g-0xy1-0z20-gtp15.yaml
```
OSDAR23
```bash
torchpack dist-run -np 1 python tools/train.py configs/osdar23/baseline/transfusion/lidar/voxelnet-1600g-0xy16-0z4-gtp15.yaml
```
Camera-only
```bash
torchpack dist-run -np <number_of_gpus> python tools/train.py <config_path> --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
```
Click to see examples
TUMTraf-Intersection
```bash
torchpack dist-run -np 1 python tools/train.py configs/tumtraf-i/baseline/centerhead/camera/swint-depthlss-256x704-1600g-0xy1-0z2-gm17-0p6.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
```
OSDAR23
```bash
torchpack dist-run -np 1 python tools/train.py configs/osdar23/baseline/centerhead/camera/swint-depthlss-256x704-1600g-0xy1-0z2-gm17-0p6.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
```
Multi-modal
```bash
torchpack dist-run -np <number_of_gpus> python tools/train.py <config_path> --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <lidar_checkpoint_path>
```
Click to see examples
TUMTraf-Intersection
```bash
torchpack dist-run -np 2 python tools/train.py configs/tumtraf-i/baseline/transfusion/fusion/convfuser-256x704-1600g-0xy1-0z2.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <lidar_checkpoint_path>
```
OSDAR23
```bash
torchpack dist-run -np 2 python tools/train.py configs/osdar23/baseline/transfusion/fusion/convfuser-256x704-1600g-0xy15-0z4.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <lidar_checkpoint_path>
```
Temporal
In order to train the model with temporal information, you need to load pre-trained weights of the model without temporal information.
```bash
torchpack dist-run -np <number_of_gpus> python tools/train.py <config_path> --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <pretrained_checkpoint_path>
```
Click to see examples
TUMTraf-Intersection - LiDAR-only
```bash
torchpack dist-run -np 2 python tools/train.py configs/tumtraf-i/temporal/transfusion/lidar/voxelnet-convlstm-1600g-0xy1-0z20-sameaugall-ql3-qrt1-gtp3-sameaug-rpd0p5-trans-rot-lfrz.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <pretrained_checkpoint_path>
```
TUMTraf-Intersection - Multi-modal
```bash
torchpack dist-run -np 2 python tools/train.py configs/tumtraf-i/temporal/transfusion/fusion/concatfuser-256x704-1600g-0xy1-0z2-lfrz.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <pretrained_checkpoint_path>
```
OSDAR23 - LiDAR-only
```bash
torchpack dist-run -np 2 python tools/train.py configs/osdar23/temporal/transfusion/lidar/voxelnet-convlstm-1600g-0xy1-0z20-sameaugall-ql3-qrt1-gtp3-sameaug-rpd0p5-trans-rot-lfrz.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <pretrained_checkpoint_path>
```
OSDAR23 - Multi-modal
```bash
torchpack dist-run -np 2 python tools/train.py configs/osdar23/temporal/transfusion/fusion/concatfuser-256x704-1600g-0xy1-0z2-lfrz.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <pretrained_checkpoint_path>
```
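Putting the two stages together, a typical LiDAR-only workflow on TUMTraf-Intersection might look like this; the run directory name passed to --load_from is illustrative and should point at your own baseline run:
```bash
# Stage 1: train the non-temporal LiDAR baseline
torchpack dist-run -np 2 python tools/train.py \
    configs/tumtraf-i/baseline/transfusion/lidar/voxelnet-1600g-0xy1-0z20-gtp15.yaml

# Stage 2: fine-tune the temporal model from the baseline checkpoint
torchpack dist-run -np 2 python tools/train.py \
    configs/tumtraf-i/temporal/transfusion/lidar/voxelnet-convlstm-1600g-0xy1-0z20-sameaugall-ql3-qrt1-gtp3-sameaug-rpd0p5-trans-rot-lfrz.yaml \
    --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth \
    --load_from checkpoints/tumtraf-i-lidar-baseline/latest.pth
```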
Testing
The following command evaluates the model on the test set and saves the results in the designated folder. In addition, if the corresponding arguments are provided, it also saves an evaluation summary and/or an extensive report of the evaluation.
```bash
torchpack dist-run -np 1 python tools/test.py <config_path> <checkpoint_path> --eval bbox
```
You can also pass the following options after --eval-options:
- extensive_report=True if you would like to have an extensive report of the evaluation
- save_summary_path=<path> if you would like to save the evaluation summary
Click to see an example
```bash
torchpack dist-run -np 1 python tools/test.py checkpoints/run/configs.yaml checkpoints/run/latest.pth --eval bbox --eval-options extensive_report=True save_summary_path=results/run/summary.json
```
Visualization
The following command visualizes the predictions of the model on the test set and saves the results in the designated folder. In addition, if the corresponding arguments are provided, it also saves the bounding boxes and/or the labels as npy files, as well as visuals containing both predictions and ground truths.
```bash
torchpack dist-run -np 1 python tools/visualize.py <config_path> --checkpoint <checkpoint_path> --mode pred --split test --out-dir <save_path>
```
You can also use the following optional arguments:
- --include-combined if you would like to save visuals containing both predictions and ground truths
- --save-bboxes if you would like to save the bounding boxes as npy files
- --save-scores if you would like to save the scores as npy files
- --save-labels if you would like to save the labels as npy files
- --max-samples N if you would like to visualize only a subset of the dataset, example 100
- --bbox-score X if you would like to visualize only the bounding boxes with a score higher than X, example: 0.1
Click to see an example
```bash
torchpack dist-run -np 1 python tools/visualize.py checkpoints/run/configs.yaml --checkpoint checkpoints/run/latest.pth --mode pred --split test --out-dir results/run/visuals --include-combined --save-bboxes --save-labels --max-samples 100 --bbox-score 0.1
```
Benchmarking
The following command benchmarks the model on the test set. In addition, if the corresponding argument is provided, it also saves the benchmark results to a file.
```bash
python tools/benchmark.py <config_path> <checkpoint_path>
```
You can also use the following optional arguments:
- --out if you would like to save the benchmark results in a file
Click to see an example
```bash
python tools/benchmark.py checkpoints/run/configs.yaml checkpoints/run/latest.pth --out results/run/benchmark.json
```
Compilation
The following command runs the other scripts, such as the evaluation, visualization, and benchmarking scripts, in one pass and compiles their outputs.
```bash
python tools/compile.py <dataset> -c <checkpoints_folder_path> -i <compilation_id> -t <target_path> --include-bboxes --include-labels --images-include-combined --images-cam-bbox-score 0.15 --loglevel INFO
```
You can also use the following optional arguments:
- --summary-foldername if you would like to change the name of the folder containing the evaluation summary, Default: summary
- --images-foldername if you would like to change the name of the folder containing the images, Default: images
- --videos-foldername if you would like to change the name of the folder containing the videos, Default: videos
- --override-testing if you would like to override the testing results, Default: False
- --override-images if you would like to override the images, Default: False
- --override-videos if you would like to override the videos, Default: False
- --override-benchmark if you would like to override the benchmark results, Default: False
- --images-include-combined if you would like to include the visuals containing both predictions and ground truths, Default: False
- --videos-include-bundled if you would like to include the bundled videos, Default: False
- --images-cam-bbox-score N if you would like to visualize only the bounding boxes with a score higher than N, example: 0.1, Default: 0.0
- --images-max-samples N if you would like to visualize only a subset of the dataset, example 100, Default: None
- --include-bboxes if you would like to save the bounding boxes as npy files, Default: False
- --include-scores if you would like to save the scores as npy files, Default: False
- --include-labels if you would like to save the labels as npy files, Default: False
- --skip-test if you would like to skip the testing, Default: False
- --skip-images if you would like to skip the images, Default: False
- --skip-videos if you would like to skip the videos, Default: False
- --skip-benchmark if you would like to skip the benchmarking, Default: False
Click to see examples
TUMTraf-Intersection
```bash
python tools/compile.py tumtraf-i -c checkpoints/tumtraf-i -i tumtraf-i -t results --include-bboxes --include-scores --include-labels --images-include-combined --images-cam-bbox-score 0.15 --videos-include-bundled --loglevel INFO
```
OSDAR23
```bash
python tools/compile.py osdar23 -c checkpoints/osdar23 -i osdar23 -t results --include-bboxes --include-scores --include-labels --images-include-combined --images-cam-bbox-score 0.15 --videos-include-bundled --loglevel INFO
```
Hyper-parameter Tuning
The current configurations are tuned for the best performance on the validation sets of the respective datasets. However, you can tune the hyper-parameters for your own use case. Sample commands are provided below.
```bash
python ./tools/gtp_tune.py <config_path> --run-dir "checkpoints/tune/tumtraf-i-ta-gtp-sampling" --n-epochs 20 --n-gpus 2 --n-trials 20 --CAR 8 15 --TRAILER 0 2 --TRUCK 0 4 --VAN 0 5 --PEDESTRIAN 0 8 --BUS 0 2 --MOTORCYCLE 0 4 --BICYCLE 0 4 --EMERGENCY_VEHICLE 0 2 --verbose --timeout 3 --enqueue 12 2 4 0 0 0 3 3 0
```
Then, you can use the following command to find optimal rotation and translation values for objects by loading the best checkpoint from the first tuning run and training with temporal information:
```bash
python tools/gtp_tune_temporal.py <config_path> --run-dir checkpoints/tune/tumtraf-i-ta-gtp-rt --load-from <lidar_checkpoint_path> --n-gpus 2 --n-epochs 4 --n-trials 25 --timeout 2 --verbose --CAR 0.0 2.5 0.0 0.2 --TRAILER 0.0 2.5 0.0 0.2 --TRUCK 0.0 2.5 0.0 0.2 --VAN 0.0 2.5 0.0 0.2 --PEDESTRIAN 0.0 2.5 0.0 0.3 --BUS 0.0 2.5 0.0 0.2 --MOTORCYCLE 0.0 2.5 0.0 0.25 --BICYCLE 0.0 2.5 0.0 0.25 --EMERGENCY_VEHICLE 0.0 2.5 0.0 0.2
```
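As a concrete sketch, the two tuning phases could be chained as follows; the config paths are illustrative choices from the configs shipped in this repository, and the phase-1 checkpoint path is a placeholder for your own best run:
```bash
# Phase 1: tune the ground-truth paste sampling counts
# (config path is an illustrative choice; use the config you intend to train)
python ./tools/gtp_tune.py configs/tumtraf-i/baseline/transfusion/lidar/voxelnet-1600g-0xy1-0z20-gtp15.yaml \
    --run-dir "checkpoints/tune/tumtraf-i-ta-gtp-sampling" --n-epochs 20 --n-gpus 2 --n-trials 20 \
    --CAR 8 15 --TRAILER 0 2 --TRUCK 0 4 --VAN 0 5 --PEDESTRIAN 0 8 --BUS 0 2 \
    --MOTORCYCLE 0 4 --BICYCLE 0 4 --EMERGENCY_VEHICLE 0 2 \
    --verbose --timeout 3 --enqueue 12 2 4 0 0 0 3 3 0

# Phase 2: tune per-class rotation/translation values,
# loading the best checkpoint from phase 1 (the path below is a placeholder)
python tools/gtp_tune_temporal.py configs/tumtraf-i/temporal/transfusion/lidar/voxelnet-convlstm-1600g-0xy1-0z20-sameaugall-ql3-qrt1-gtp3-sameaug-rpd0p5-trans-rot-lfrz.yaml \
    --run-dir checkpoints/tune/tumtraf-i-ta-gtp-rt --load-from checkpoints/tune/tumtraf-i-ta-gtp-sampling/best.pth \
    --n-gpus 2 --n-epochs 4 --n-trials 25 --timeout 2 --verbose \
    --CAR 0.0 2.5 0.0 0.2 --TRAILER 0.0 2.5 0.0 0.2 --TRUCK 0.0 2.5 0.0 0.2 --VAN 0.0 2.5 0.0 0.2 \
    --PEDESTRIAN 0.0 2.5 0.0 0.3 --BUS 0.0 2.5 0.0 0.2 --MOTORCYCLE 0.0 2.5 0.0 0.25 \
    --BICYCLE 0.0 2.5 0.0 0.25 --EMERGENCY_VEHICLE 0.0 2.5 0.0 0.2
```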