
Multi-Modal 3D Object Detection in Long Range and Low-Resolution Conditions of Sensors

Master's thesis research on 3D object detection using LiDAR and camera data for infrastructure and railway domains, emphasizing inference optimization and the utilization of temporal information for distant and occluded objects.

*(Teaser figure)*

Python 3.8 · PyTorch 1.10.1 · Black

The purpose of this codebase (master's thesis) is to investigate the impact of temporal information on 3D object detection accuracy on the TUMTraf-Intersection and OSDaR23 datasets.

This work builds on the codebase of BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation.

Table of Contents

- Installation
- Dataset Preparation
- Training
- Testing
- Visualization
- Benchmarking
- Compilation
- Hyper-parameter Tuning

Installation

You can build the Docker image manually or with the docker.sh script. Note that you may need to adjust the arguments in docker.sh for your use case.

```bash
bash docker.sh build <dev/prod>
```

You can then run the container with the following command:

```bash
bash docker.sh run <dev/prod>
```

You can access the running container with the following command:

```bash
bash docker.sh access <dev/prod>
```

If you built the dev image, use the following command to install the dependencies; otherwise, you can skip this step:

```bash
make
```
Click to see additional built-in docker.sh commands

```bash
bash docker.sh stop
bash docker.sh remove-container
bash docker.sh remove-image
bash docker.sh remove-all
```
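Putting these together, a typical development workflow (using the dev variant; this is just the commands above in sequence, not an additional script) looks like:

```bash
bash docker.sh build dev    # build the development image
bash docker.sh run dev      # start the container
bash docker.sh access dev   # open a shell inside the container
make                        # inside the container: install the dependencies (dev image only)
```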

Dataset Preparation

TUMTraf-Intersection Dataset

Click to expand
> **If you have the dataset fully prepared, you can skip to the 5th step.**

1 - Merge all the files into one folder, then tokenize them by running the following command (if not already tokenized):

```bash
python tools/preprocessing/tumtraf_tokenize.py --root-path ./data/tumtraf-i-no-split --out-path ./data/tumtraf-i-no-split --loglevel INFO
```

2 - Add difficulty labels to the dataset by running the following command:

```bash
python tools/preprocessing/tumtraf_add_difficulty_labels.py --root-path ./data/tumtraf-i-no-split --out-path ./data/tumtraf-i-no-split --loglevel INFO
```

3 - Run the following command to find an optimally balanced split and divide the dataset into training, validation, and test sets (reduce 'perm-limit' or increase 'p' if it takes too long to finish):

```bash
python tools/preprocessing/tumtraf_find_temporal_split.py --create --root-path ./data/tumtraf-i-no-split --out-path ./data/tumtraf-i --seed 42 --segment-size 30 --perm-limit 60000 --loglevel INFO -p 6 --include-all-classes --include-all-sequences --include-same-classes-in-difficulty --difficulty-th 1.0 --include-same-classes-in-distance --distance-th 1.0 --include-same-classes-in-num-points --num-points-th 1.0 --include-same-classes-in-occlusion --occlusion-th 0.75 --point-cloud-range -25.0 -64.0 -10.0 64.0 64.0 0.0 --splits train val test --split-ratios 0.8 0.1 0.1 --exclude-classes OTHER
```

4 - To turn the newly separated sequence segments into their own pseudo-sequences, tokenize the dataset again:

```bash
python tools/preprocessing/tumtraf_tokenize.py --root-path ./data/tumtraf-i --out-path ./data/tumtraf-i --loglevel INFO
```

5 - Finally, create the ready-to-go version of the dataset:

```bash
python tools/create_data.py tumtraf-i --root-path ./data/tumtraf-i --out-dir ./data/tumtraf-i-bevfusion --loglevel INFO
```

OSDaR23 Dataset

Click to expand
> **If you have the dataset fully prepared, you can skip to the 3rd step.**

1 - Put all the sequences into one folder, then create a separate lidar labels folder with additional fields by running the following command:

```bash
python tools/preprocessing/osdar23_prepare.py --root-path ./data/osdar23_original --add-num-points --add-distance --loglevel INFO
```

2 - Run the following command to find an optimally balanced split and divide the dataset into training and validation sets (reduce 'perm-limit' or increase 'p' if it takes too long to finish):

```bash
python tools/preprocessing/osdar23_find_temporal_split.py --create --root-path ./data/osdar23_original --out-path ./data/osdar23 --seed 1337 --segment-size 30 --perm-limit 60000 --loglevel INFO -p 6 --include-all-classes --include-same-classes-in-distance --distance-th 0.95 --include-same-classes-in-num-points --num-points-th 0.95 --include-same-classes-in-occlusion --occlusion-th 0.85 --point-cloud-range -6.0 -128.0 -3.0 250.0 128.0 13.0 --splits train val --split-ratios 0.8 0.2 --exclude-classes lidar__cuboid__train lidar__cuboid__buffer_stop lidar__cuboid__animal lidar__cuboid__switch lidar__cuboid__bicycle lidar__cuboid__crowd lidar__cuboid__wagons lidar__cuboid__signal_bridge
```

3 - To turn the newly separated sequence segments into their own pseudo-sequences, tokenize the dataset again:

```bash
python tools/preprocessing/osdar23_tokenize.py --root-path data/osdar23 --log INFO
```

4 - Finally, create the ready-to-go version of the dataset:

```bash
python tools/create_data.py osdar23 --root-path ./data/osdar23 --out-dir ./data/osdar23-bevfusion --use-highres --loglevel INFO
```

Training

LiDAR-only

```bash
torchpack dist-run -np <number_of_gpus> python tools/train.py <config_path>
```
Click to see examples
TUMTraf-Intersection

```bash
torchpack dist-run -np 1 python tools/train.py configs/tumtraf-i/baseline/transfusion/lidar/voxelnet-1600g-0xy1-0z20-gtp15.yaml
```

OSDaR23

```bash
torchpack dist-run -np 1 python tools/train.py configs/osdar23/baseline/transfusion/lidar/voxelnet-1600g-0xy16-0z4-gtp15.yaml
```

Camera-only

```bash
torchpack dist-run -np <number_of_gpus> python tools/train.py <config_path> --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
```
Click to see examples
TUMTraf-Intersection

```bash
torchpack dist-run -np 1 python tools/train.py configs/tumtraf-i/baseline/centerhead/camera/swint-depthlss-256x704-1600g-0xy1-0z2-gm17-0p6.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
```

OSDaR23

```bash
torchpack dist-run -np 1 python tools/train.py configs/osdar23/baseline/centerhead/camera/swint-depthlss-256x704-1600g-0xy1-0z2-gm17-0p6.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
```

Multi-modal

```bash
torchpack dist-run -np <number_of_gpus> python tools/train.py <config_path> --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <lidar_checkpoint_path>
```
Click to see examples
TUMTraf-Intersection

```bash
torchpack dist-run -np 2 python tools/train.py configs/tumtraf-i/baseline/transfusion/fusion/convfuser-256x704-1600g-0xy1-0z2.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <lidar_checkpoint_path>
```

OSDaR23

```bash
torchpack dist-run -np 2 python tools/train.py configs/osdar23/baseline/transfusion/fusion/convfuser-256x704-1600g-0xy15-0z4.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <lidar_checkpoint_path>
```

Temporal

To train a model with temporal information, you need to initialize it with the pre-trained weights of the corresponding model without temporal information.

```bash
torchpack dist-run -np <number_of_gpus> python tools/train.py <config_path> --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <pretrained_checkpoint_path>
```
Click to see examples
TUMTraf-Intersection - LiDAR-only

```bash
torchpack dist-run -np 2 python tools/train.py configs/tumtraf-i/temporal/transfusion/lidar/voxelnet-convlstm-1600g-0xy1-0z20-sameaugall-ql3-qrt1-gtp3-sameaug-rpd0p5-trans-rot-lfrz.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <pretrained_checkpoint_path>
```

TUMTraf-Intersection - Multi-modal

```bash
torchpack dist-run -np 2 python tools/train.py configs/tumtraf-i/temporal/transfusion/fusion/concatfuser-256x704-1600g-0xy1-0z2-lfrz.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <pretrained_checkpoint_path>
```

OSDaR23 - LiDAR-only

```bash
torchpack dist-run -np 2 python tools/train.py configs/osdar23/temporal/transfusion/lidar/voxelnet-convlstm-1600g-0xy1-0z20-sameaugall-ql3-qrt1-gtp3-sameaug-rpd0p5-trans-rot-lfrz.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <pretrained_checkpoint_path>
```

OSDaR23 - Multi-modal

```bash
torchpack dist-run -np 2 python tools/train.py configs/osdar23/temporal/transfusion/fusion/concatfuser-256x704-1600g-0xy1-0z2-lfrz.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from <pretrained_checkpoint_path>
```
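For instance, a full two-stage run for TUMTraf-Intersection LiDAR-only could look like the sketch below, which simply chains the baseline and temporal commands above (the checkpoint path is an assumed example; substitute your actual run directory):

```bash
# Stage 1: train the non-temporal LiDAR baseline
torchpack dist-run -np 2 python tools/train.py configs/tumtraf-i/baseline/transfusion/lidar/voxelnet-1600g-0xy1-0z20-gtp15.yaml

# Stage 2: initialize the temporal model from the baseline checkpoint
# (checkpoints/run/latest.pth is an assumed path, not a fixed repo path)
torchpack dist-run -np 2 python tools/train.py configs/tumtraf-i/temporal/transfusion/lidar/voxelnet-convlstm-1600g-0xy1-0z20-sameaugall-ql3-qrt1-gtp3-sameaug-rpd0p5-trans-rot-lfrz.yaml \
    --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth \
    --load_from checkpoints/run/latest.pth
```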

Testing

The following command evaluates the model on the test set and saves the results in the designated folder. If the corresponding arguments are provided, it also saves an evaluation summary and/or an extensive evaluation report.

```bash
torchpack dist-run -np 1 python tools/test.py <config_path> <checkpoint_path> --eval bbox
```

You can also pass optional arguments, as in the example below:

Click to see an example
```bash
torchpack dist-run -np 1 python tools/test.py checkpoints/run/configs.yaml checkpoints/run/latest.pth --eval bbox --eval-options extensive_report=True save_summary_path=results/run/summary.json
```

Visualization

The following command visualizes the model's predictions on the test set and saves the results in the designated folder. If the corresponding arguments are provided, it also saves the bounding boxes and/or labels as npy files, as well as visuals containing both predictions and ground truths.

```bash
torchpack dist-run -np 1 python tools/visualize.py <config_path> --checkpoint <checkpoint_path> --mode pred --split test --out-dir <save_path>
```

You can also pass optional arguments, as in the example below:

Click to see an example
```bash
torchpack dist-run -np 1 python tools/visualize.py checkpoints/run/configs.yaml --checkpoint checkpoints/run/latest.pth --mode pred --split test --out-dir results/run/visuals --include-combined --save-bboxes --save-labels --max-samples 100 --bbox-score 0.1
```

Benchmarking

The following command benchmarks the model on the test set. If the corresponding argument is provided, it also saves the benchmark results to a file.

```bash
python tools/benchmark.py <config_path> <checkpoint_path>
```

You can also pass optional arguments, as in the example below:

Click to see an example
```bash
python tools/benchmark.py checkpoints/run/configs.yaml checkpoints/run/latest.pth --out results/run/benchmark.json
```

Compilation

The following command combines the other scripts, such as the evaluation, visualization, and benchmarking scripts, into a single run.

```bash
python tools/compile.py <dataset> -c <checkpoints_folder_path> -i <compilation_id> -t <target_path> --include-bboxes --include-labels --images-include-combined --images-cam-bbox-score 0.15 --loglevel INFO
```

You can also pass optional arguments, as in the examples below:

Click to see examples
TUMTraf-Intersection

```bash
python tools/compile.py tumtraf-i -c checkpoints/tumtraf-i -i tumtraf-i -t results --include-bboxes --include-scores --include-labels --images-include-combined --images-cam-bbox-score 0.15 --videos-include-bundled --loglevel INFO
```

OSDaR23

```bash
python tools/compile.py osdar23 -c checkpoints/osdar23 -i osdar23 -t results --include-bboxes --include-scores --include-labels --images-include-combined --images-cam-bbox-score 0.15 --videos-include-bundled --loglevel INFO
```

Hyper-parameter Tuning

The current configurations are tuned for the best performance on the validation sets of the respective datasets. However, you can tune the hyper-parameters for your own use case; sample commands are provided below.

```bash
python ./tools/gtp_tune.py <config_path> --run-dir "checkpoints/tune/tumtraf-i-ta-gtp-sampling" --n-epochs 20 --n-gpus 2 --n-trials 20 --CAR 8 15 --TRAILER 0 2 --TRUCK 0 4 --VAN 0 5 --PEDESTRIAN 0 8 --BUS 0 2 --MOTORCYCLE 0 4 --BICYCLE 0 4 --EMERGENCY_VEHICLE 0 2 --verbose --timeout 3 --enqueue 12 2 4 0 0 0 3 3 0
```

Then, use the following command to find optimal per-class rotation and translation values, loading the best checkpoint from the first tuning run and training with temporal information:

```bash
python tools/gtp_tune_temporal.py <config_path> --run-dir checkpoints/tune/tumtraf-i-ta-gtp-rt --load-from <lidar_checkpoint_path> --n-gpus 2 --n-epochs 4 --n-trials 25 --timeout 2 --verbose --CAR 0.0 2.5 0.0 0.2 --TRAILER 0.0 2.5 0.0 0.2 --TRUCK 0.0 2.5 0.0 0.2 --VAN 0.0 2.5 0.0 0.2 --PEDESTRIAN 0.0 2.5 0.0 0.3 --BUS 0.0 2.5 0.0 0.2 --MOTORCYCLE 0.0 2.5 0.0 0.25 --BICYCLE 0.0 2.5 0.0 0.25 --EMERGENCY_VEHICLE 0.0 2.5 0.0 0.2
```
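Putting the two tuning stages together, a sketch of a full run might look like the following. The config paths reuse those from the Training section and the best-checkpoint path is an assumption, not a fixed repo path; adjust both to your setup:

```bash
# Stage 1: tune per-class GT-paste sampling counts on the baseline LiDAR config
python ./tools/gtp_tune.py configs/tumtraf-i/baseline/transfusion/lidar/voxelnet-1600g-0xy1-0z20-gtp15.yaml \
    --run-dir "checkpoints/tune/tumtraf-i-ta-gtp-sampling" \
    --n-epochs 20 --n-gpus 2 --n-trials 20 \
    --CAR 8 15 --TRAILER 0 2 --TRUCK 0 4 --VAN 0 5 --PEDESTRIAN 0 8 \
    --BUS 0 2 --MOTORCYCLE 0 4 --BICYCLE 0 4 --EMERGENCY_VEHICLE 0 2 \
    --verbose --timeout 3 --enqueue 12 2 4 0 0 0 3 3 0

# Stage 2: load the best checkpoint from stage 1 (the path below is assumed)
# and tune per-class translation/rotation values on the temporal config
python tools/gtp_tune_temporal.py configs/tumtraf-i/temporal/transfusion/lidar/voxelnet-convlstm-1600g-0xy1-0z20-sameaugall-ql3-qrt1-gtp3-sameaug-rpd0p5-trans-rot-lfrz.yaml \
    --run-dir checkpoints/tune/tumtraf-i-ta-gtp-rt \
    --load-from checkpoints/tune/tumtraf-i-ta-gtp-sampling/best.pth \
    --n-gpus 2 --n-epochs 4 --n-trials 25 --timeout 2 --verbose \
    --CAR 0.0 2.5 0.0 0.2 --TRAILER 0.0 2.5 0.0 0.2 --TRUCK 0.0 2.5 0.0 0.2 \
    --VAN 0.0 2.5 0.0 0.2 --PEDESTRIAN 0.0 2.5 0.0 0.3 --BUS 0.0 2.5 0.0 0.2 \
    --MOTORCYCLE 0.0 2.5 0.0 0.25 --BICYCLE 0.0 2.5 0.0 0.25 --EMERGENCY_VEHICLE 0.0 2.5 0.0 0.2
```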