FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin
* Please note that the FPS here is measured with RTX3090 TensorRT FP16.
# Panoptic-FlashOcc: An Efficient Baseline to Marry Semantic Occupancy with Panoptic via Instance Center
* Please note that the FPS here is measured with A100 GPU (PyTorch fp32 backend).
## News
- **2024.09.16** Technical Report: FlashOcc can be insert to Bevdet with 1.1ms consumption while facilitating each other.[![arXiv](https://img.shields.io/badge/arXiv-Paper-
.svg)](https://arxiv.org/abs/2409.11160)
- **2024.09.16** [Selected as reference algorithm for occupancy on horizon J6E/M](https://zhuanlan.zhihu.com/p/720461546)
- **2024.06.10** Release the code for Panoptic-FlashOCC
- **2024.04.17** Support for ray-iou metric
- **2024.03.22** Release the code for FlashOCCV2
- **2024.02.03** [Release the training code for FlashOcc on UniOcc](https://github.com/drilistbox/FlashOCC_on_UniOcc_and_RenderOCC)
- **2024.01.20** [TensorRT Implement Writen In C++ With Cuda Acceleration](https://github.com/drilistbox/TRT_FlashOcc)
- **2023.12.23** Release the quick testing code via TensorRT in MMDeploy.
- **2023.11.28** Release the training code for FlashOCC.
[![arXiv](https://img.shields.io/badge/arXiv-Paper-.svg)](https://arxiv.org/abs/2311.12058)
[![arXiv](https://img.shields.io/badge/arXiv-Paper-.svg)](https://arxiv.org/pdf/2406.10527)
[![arXiv](https://img.shields.io/badge/arXiv-Paper-.svg)](https://arxiv.org/abs/2409.11160)
This repository is an official implementation of [FlashOCC](https://arxiv.org/abs/2311.12058)
and [Panoptic-FlashOCC](https://arxiv.org/pdf/2406.10527)
## Main Results
### 1. FlashOCC
| Config | Backbone | Input
Size | mIoU | FPS
(Hz) | Flops
(G) | Params
(M) | Model | Log |
|:---------------------------------------------------------------------------------------------------|:--------:|:----------:|:-------:|:-----------------------------:|:------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:|
| [**BEVDetOCC (1f)**](projects/configs/bevdet_occ/bevdet-occ-r50.py) | R50 | 256x704 | 31.60 | 92.1 | [241.76](doc/mmdeploy_test.md) | [29.02](doc/mmdeploy_test.md) | [gdrive]() | [log]() |
| [**M0: FlashOCC (1f)**](projects/configs/flashocc/flashocc-r50.py) | R50 | 256x704 | 31.95 | [197.6](doc/mmdeploy_test.md) | [154.1](doc/mmdeploy_test.md) | [39.94](doc/mmdeploy_test.md) | [gdrive](https://drive.google.com/file/d/14my3jdqiIv6VIrkozQ6-ruEcBOPVlWGJ/view?usp=sharing) | [log](https://drive.google.com/file/d/1E-kaHxbTr6s3Qn70vhKpwJM8kejoNFxQ/view?usp=sharing) |
| [**M1: FlashOCC (1f)**](projects/configs/flashocc/flashocc-r50.py) | R50 | 256x704 | 32.08 | [152.7](doc/mmdeploy_test.md) | [248.57](doc/mmdeploy_test.md) | [44.74](doc/mmdeploy_test.md) | [gdrive](https://drive.google.com/file/d/1k9BzXB2nRyvXhqf7GQx3XNSej6Oq6I-B/view?usp=drive_link) | [log](https://drive.google.com/file/d/1NRm27wVZMSUylmZxsMedFSLr7729YEAV/view?usp=drive_link) |
| [**BEVDetOCC-4D-Stereo (2f)**](projects/configs/bevdet_occ/bevdet-occ-r50-4d-stereo.py) | R50 | 256x704 | 36.1 | - | - | - | [baidu](https://pan.baidu.com/s/1237QyV18zvRJ1pU3YzRItw?pwd=npe1) | [log](https://pan.baidu.com/s/1237QyV18zvRJ1pU3YzRItw?pwd=npe1) |
| [**M2:FlashOCC-4D-Stereo (2f)**](projects/configs/flashocc/flashocc-r50-4d-stereo.py) | R50 | 256x704 | 37.84 | - | - | - | [gdrive](https://drive.google.com/file/d/12WYaCdoZA8-A6_oh6vdLgOmqyEc3PNCe/view?usp=drive_link) | [log](https://drive.google.com/file/d/1eYvu9gUSQ7qk7w7lWPLrZMB0O2uKQUk3/view?usp=drive_link) |
| [**BEVDetOCC-4D-Stereo (2f)**](projects/configs/bevdet_occ/bevdet-occ-stbase-4d-stereo-512x1408.py) | Swin-B | 512x1408 | 42.0 | - | - | - | [baidu](https://pan.baidu.com/s/1237QyV18zvRJ1pU3YzRItw?pwd=npe1) | [log](https://pan.baidu.com/s/1237QyV18zvRJ1pU3YzRItw?pwd=npe1) |
| [**M3:FlashOCC-4D-Stereo (2f)**](projects/configs/flashocc/flashocc-stbase-4d-stereo-512x1408_4x4_2e-4.py) | Swin-B | 512x1408 | 43.52 | - | [1490.77](doc/mmdeploy_test.md) | [144.99](doc/mmdeploy_test.md) | [gdrive](https://drive.google.com/file/d/1f6E6Bm6enIJETSEbfXs57M0iOUU997kU/view?usp=drive_link) | [log](https://drive.google.com/file/d/1tch-YK4ROGDGNmDcN5FZnOAvsbHe-iSU/view?usp=drive_link) |
FPS are tested via TensorRT on 3090 with FP16 precision. Please refer to Tab.2 in paper for the detail model settings for M-number.
### 2. Panoptic-FlashOCC
**In Panoptic-FlashOCC, we have made the following 3 adjustments to FlashOCC**:
- Without using camera mask for training. This is because its use significantly improves the prediction performance in the visible region, but at the expense of prediction in the invisible region.
- Using category balancing.
- Using stronger loss settings.
- Introducing instance center for panoptic occupancy
**More results for different configurations will be released soon.**
| Config | Backbone | Input
Size | RayIou | RayPQ | mIoU | FPS
(Hz) | Flops
(G) | Params
(M) | Model | Log |
|:--------------------------------------------------------------------------------|:--------:|:-----------:|:-------:|:------:|:------:|:------------:|:------------------------------:|:-----------------------------:|:-----------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------:|
| [**M1: FlashOCC (1f)**](projects/configs/flashocc/flashocc-r50.py) | R50 | 256x704 | - | - | 15.41 | - | [248.57](doc/mmdeploy_test.md) | [44.74](doc/mmdeploy_test.md) | [gdrive](https://drive.google.com/file/d/14XsvjSwp-vLpy_eBZzvKo3MAh-YWRHcu/view?usp=drive_link) | [log](https://drive.google.com/file/d/1cTDoauEmjhK9fReLDcA2zPZx4a6X3U1-/view?usp=drive_link) |
| [**Panoptic-FlashOCC-Depth-tiny (1f)**](projects/configs/panoptic-flashocc/panoptic-flashocc-r50-depth-tiny.py) | R50 | 256x704 | 34.57 | - | 28.83 | 43.9 | [175.00](doc/mmdeploy_test.md) | [45.32](doc/mmdeploy_test.md) | [gdrive](https://drive.google.com/drive/folders/1cgCsbXgikoP10lj6DBC7Le9C-UOCIxlN?usp=sharing) | [log](https://drive.google.com/drive/folders/1cgCsbXgikoP10lj6DBC7Le9C-UOCIxlN?usp=sharing) |
| [**Panoptic-FlashOCC-Depth-tiny-Pano (1f)**](projects/configs/panoptic-flashocc/panoptic-flashocc-r50-depth-tiny-pano.py) | R50 | 256x704 | 34.81 | 12.9 | 29.14 | 39.8 | [175.00](doc/mmdeploy_test.md) | [45.32](doc/mmdeploy_test.md) | [gdrive](https://drive.google.com/drive/folders/1cgCsbXgikoP10lj6DBC7Le9C-UOCIxlN?usp=sharing) | [log](https://drive.google.com/drive/folders/1cgCsbXgikoP10lj6DBC7Le9C-UOCIxlN?usp=sharing) |
| [**Panoptic-FlashOCC-Depth (1f)**](projects/configs/panoptic-flashocc/panoptic-flashocc-r50-depth.py) | R50 | 256x704 | 34.93 | - | 28.91 | 38.7 | [269.47](doc/mmdeploy_test.md) | [50.12](doc/mmdeploy_test.md) | [gdrive](https://drive.google.com/drive/folders/1cgCsbXgikoP10lj6DBC7Le9C-UOCIxlN?usp=sharing) | [log](https://drive.google.com/drive/folders/1cgCsbXgikoP10lj6DBC7Le9C-UOCIxlN?usp=sharing) |
| [**Panoptic-FlashOCC-Depth-Pano (1f)**](projects/configs/panoptic-flashocc/panoptic-flashocc-r50-depth-pano.py) | R50 | 256x704 | 35.22 | 13.2 | 29.39 | 35.2 | [269.47](doc/mmdeploy_test.md) | [50.12](doc/mmdeploy_test.md) | [gdrive](https://drive.google.com/drive/folders/1cgCsbXgikoP10lj6DBC7Le9C-UOCIxlN?usp=sharing) | [log](https://drive.google.com/drive/folders/1cgCsbXgikoP10lj6DBC7Le9C-UOCIxlN?usp=sharing) |
| [**Panoptic-FlashOCC-4D-Depth (2f)**](projects/configs/panoptic-flashocc/panoptic-flashocc-r50-depth4d.py) | R50 | 256x704 | 35.99 | - | 29.57 | 35.9 | - | - | [gdrive](https://drive.google.com/drive/folders/1cgCsbXgikoP10lj6DBC7Le9C-UOCIxlN?usp=sharing) | [log](https://drive.google.com/drive/folders/1cgCsbXgikoP10lj6DBC7Le9C-UOCIxlN?usp=sharing) |
| [**Panoptic-FlashOCC-4D-Depth-Pano (2f)**](projects/configs/panoptic-flashocc/panoptic-flashocc-r50-depth4d-pano.py) | R50 | 256x704 | 36.76 | 14.5 | 30.31 | 30.4 | - | - | [gdrive](https://drive.google.com/drive/folders/1cgCsbXgikoP10lj6DBC7Le9C-UOCIxlN?usp=sharing) | [log](https://drive.google.com/drive/folders/1cgCsbXgikoP10lj6DBC7Le9C-UOCIxlN?usp=sharing) |
| [**Panoptic-FlashOCC-4DLongterm-Depth (8f)**](projects/configs/panoptic-flashocc/panoptic-flashocc-r50-depth4d-longterm8f.py) | R50 | 256x704 | 38.51 | - | 31.49 | 35.6 | - | - | [gdrive](https://drive.google.com/drive/folders/1cgCsbXgikoP10lj6DBC7Le9C-UOCIxlN?usp=sharing) | [log](https://drive.google.com/drive/folders/1cgCsbXgikoP10lj6DBC7Le9C-UOCIxlN?usp=sharing) |
| [**Panoptic-FlashOCC-4DLongterm-Depth-Pano (8f)**](projects/configs/panoptic-flashocc/panoptic-flashocc-r50-depth4d-longterm8f.py) | R50 | 256x704 | 38.50 | 16.0 | 31.57 | 30.2 | - | - | [gdrive](https://drive.google.com/drive/folders/1cgCsbXgikoP10lj6DBC7Le9C-UOCIxlN?usp=sharing) | [log](https://drive.google.com/drive/folders/1cgCsbXgikoP10lj6DBC7Le9C-UOCIxlN?usp=sharing) |
* Please note that the FPS here is measured with A100 GPU (PyTorch fp32 backend).
## Get Started
1. [Environment Setup](doc/install.md)
2. [Model Training](doc/model_training.md)
3. [Quick Test Via TensorRT In MMDeploy](doc/mmdeploy_test.md)
| Backend | mIOU | FPS(Hz) |
|----------|-------|---------|
| PyTorch-FP32 | 31.95 | - |
| TRT-FP32 | 30.78 | 96.2 |
| TRT-FP16 | 30.78 | 197.6 |
| TRT-FP16+INT8(PTQ) | 29.60 | 383.7 |
| TRT-INT8(PTQ) | 29.59 | 397.0 |
4. [Visualization](doc/visualization.md)
* [flashocc] : A detail video can be found at [baidu](https://pan.baidu.com/s/1xfnFsj5IclpjJxIaOlI6dA?pwd=gype)
* [panoptic-flashocc] : first row is our prediction and second row is gt.
5. [TensorRT Implement Writen In C++ With Cuda Acceleration](https://github.com/drilistbox/TRT_FlashOcc)
## Acknowledgement
Many thanks to the authors of [BEVDet](https://github.com/HuangJunJie2017/BEVDet), [FB-BEV](https://github.com/NVlabs/FB-BEV.git),
[RenderOcc](https://github.com/pmj110119/RenderOcc.git) and [SparseBEV](https://github.com/MCG-NJU/SparseBEV.git)
## Bibtex
If this work is helpful for your research, please consider citing the following BibTeX entry.
```
@article{yu2024ultimatedo,
title={UltimateDO: An Efficient Framework to Marry Occupancy Prediction with 3D Object Detection via Channel2height},
author={Yu, Zichen and Shu, Changyong},
journal={arXiv preprint arXiv:2409.11160},
year={2024}
}
@article{yu2024panoptic,
title={Panoptic-FlashOcc: An Efficient Baseline to Marry Semantic Occupancy with Panoptic via Instance Center},
author={Yu, Zichen and Shu, Changyong and Sun, Qianpu and Linghu, Junjie and Wei, Xiaobao and Yu, Jiangyong and Liu, Zongdai and Yang, Dawei and Li, Hui and Chen, Yan},
journal={arXiv preprint arXiv:2406.10527},
year={2024}
}
@article{yu2023flashocc,
title={FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin},
author={Zichen Yu and Changyong Shu and Jiajun Deng and Kangjie Lu and Zongdai Liu and Jiangyong Yu and Dawei Yang and Hui Li and Yan Chen},
year={2023},
eprint={2311.12058},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```