drilistbox / FlashOCC_on_UniOcc_and_RenderOCC

FlashOcc on UniOcc and RenderOcc
FlashOcc on UniOcc

### (We cleared our models from AutoDL ourselves, so we are now working to reproduce them.)

### nuScenes Occupancy

| Config | Train times | mIoU | FPS (Hz) | FLOPs (G) | Params (M) | Model | Log |
|--------|-------------|------|----------|-----------|------------|-------|-----|
| [**UniOcc-R50-256x704**](projects/configs/bevdet_occ/bevdet-occ-r50.py) | - | - | - | - | - | - | - |
| [**M4:FO(UniOcc)-R50-256x704**](projects/configs/flashocc/flashocc-r50.py) | - | - | - | - | - | - | - |
| [**UniOcc-R50-4D-Stereo-256x704**](configs/renderocc/renderocc-7frame-256x704.py) | - | 38.46 | - | - | - | [baidu](https://pan.baidu.com/s/1n9W6DhVm1m0t0kK9ZGOM4Q?pwd=3h10) | [baidu](https://pan.baidu.com/s/1n9W6DhVm1m0t0kK9ZGOM4Q?pwd=3h10) |
| [**M5:FO(UniOcc)-R50-4D-Stereo-256x704**](configs/renderocc/renderocc-7frame-256x704-2d.py) | - | 38.76 | - | - | - | [baidu](https://pan.baidu.com/s/1n9W6DhVm1m0t0kK9ZGOM4Q?pwd=3h10) | [baidu](https://pan.baidu.com/s/1n9W6DhVm1m0t0kK9ZGOM4Q?pwd=3h10) |
| [**Additional:FO(UniOcc)-R50-4D-Stereo-256x704(wo-nerfhead)**](configs/renderocc/renderocc-7frame-wonerfhead-256x704-2d.py) | - | 38.44 | - | - | - | [baidu](https://pan.baidu.com/s/1n9W6DhVm1m0t0kK9ZGOM4Q?pwd=3h10) | [baidu](https://pan.baidu.com/s/1n9W6DhVm1m0t0kK9ZGOM4Q?pwd=3h10) |
| [**UniOcc-STBase-4D-Stereo-512x1408**](projects/configs/bevdet_occ/bevdet-occ-stbase-4d-stereo-512x1408.py) | - | - | - | - | - | - | - |
| [**M6:FO(UniOcc)-STBase-4D-Stereo-512x1408**](projects/configs/flashocc/flashocc-stbase-4d-stereo-512x1408.py) | - | - | - | - | - | - | - |

FPS is measured with TensorRT on a 3090 GPU at FP16 precision. Please refer to Tab. 2 in the paper for the detailed model settings behind each M-number.
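The mIoU column reports mean intersection-over-union across semantic classes, expressed as a percentage. As a rough illustration of how such a score can be computed from per-voxel labels, here is a minimal pure-Python sketch. The function name `mean_iou`, the toy label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration; this is not the evaluation code used by this repository.

```python
def mean_iou(pred, gt, num_classes):
    """Mean IoU over flat label sequences (hypothetical helper, not repo code).

    pred, gt: equal-length sequences of integer class labels, one per voxel.
    Classes that appear in neither sequence are skipped (an assumption).
    """
    ious = []
    for c in range(num_classes):
        # Voxels where both prediction and ground truth are class c.
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        # Voxels where either prediction or ground truth is class c.
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union == 0:
            continue
        ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with 3 classes and 6 voxels.
gt = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, gt, num_classes=3)
print(f"mIoU = {100 * score:.2f}")
```

Multiplying the returned fraction by 100 gives a value on the same scale as the table entries.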
## Acknowledgement

Many thanks to these excellent open source projects:

- [BEVDet](https://github.com/HuangJunJie2017/BEVDet), [DVGO](https://github.com/sunset1995/DirectVoxGO), [Occ3D](https://github.com/Tsinghua-MARS-Lab/Occ3D), [SurroundDepth](https://github.com/JeffWang987/OpenOccupancy), [OpenOccupancy](https://github.com/JeffWang987/OpenOccupancy), [CVPR2023-Occ-Challenge](https://github.com/CVPR2023-3D-Occupancy-Prediction)

Related Projects:

- [FlashOcc](https://github.com/Yzichen/FlashOCC), [SurroundOcc](https://github.com/weiyithu/SurroundOcc), [TPVFormer](https://github.com/wzzheng/TPVFormer), [BEVFormer](https://github.com/fundamentalvision/BEVFormer), [VoxFormer](https://github.com/NVlabs/VoxFormer), [FB-Occ](https://github.com/NVlabs/FB-BEV), [SimpleOccupancy](https://github.com/GANWANSHUI/SimpleOccupancy), [OVO](https://github.com/dzcgaara/OVO-Open-Vocabulary-Occupancy)
FlashOcc on RenderOcc
Readme from original RenderOcc

# RenderOcc

### [paper](https://arxiv.org/abs/2309.09502) | [video](https://www.youtube.com/watch?v=UcdXM3FNLAc)

![demo](assets/demo.gif)

(Visualization of RenderOcc's prediction, which is supervised only with 2D labels.)

## INTRODUCTION

**RenderOcc** is a novel paradigm for training vision-centric 3D occupancy models **only with 2D labels**. Specifically, we extract a NeRF-style 3D volume representation from multi-view images, and employ volume rendering techniques to establish 2D renderings, thus enabling direct 3D supervision from 2D semantics and depth labels.

![demo](assets/method.png)

## Getting Started

- [Installation](docs/install.md)

- [Prepare Dataset](docs/prepare_datasets.md)

- Train

  ```
  # Train RenderOcc with 8 GPUs
  ./tools/dist_train.sh ./configs/renderocc/renderocc-7frame.py 8
  ```

- Evaluation

  ```
  # Eval RenderOcc with 8 GPUs
  ./tools/dist_test.sh ./configs/renderocc/renderocc-7frame.py ./path/to/ckpts.pth 8
  ```

- Visualization

  ```
  # TODO
  ```

## Model Zoo

| Method | Backbone | 2D-to-3D | Lr Schd | GT | mIoU | Config | Log | Download |
|:---------:|:---------:|:---------:|:-------:|:-------:|:-----:|:-----:|:-----:|:-----:|
| RenderOcc | Swin-Base | BEVStereo | 12ep | 2D | 24.46 | [config](configs/renderocc/renderocc-7frame.py) | [log](https://github.com/pmj110119/storage/releases/download/v1/20231006_000233.log) | [model](https://github.com/pmj110119/storage/releases/download/v1/renderocc-7frame-12e.pth) |

* More model weights will be released later.
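The introduction describes rendering 2D depth and semantics from a NeRF-style volume so that 2D labels can supervise the 3D prediction. The sketch below shows generic NeRF-style alpha compositing along a single ray, in pure Python. It is not RenderOcc's implementation: the function name `render_ray`, the uniform sample spacing, and the use of sample midpoints for depth are all assumptions chosen to keep the example short.

```python
import math


def render_ray(densities, step, semantics):
    """Composite samples along one ray (hypothetical helper, not repo code).

    densities: non-negative density for each sample, ordered near to far.
    step:      distance between consecutive samples (assumed uniform).
    semantics: per-sample class-probability lists, same length as densities.
    Returns (weights, depth, rendered_semantics).
    """
    transmittance = 1.0  # probability that the ray reaches the current sample
    weights = []
    for sigma in densities:
        alpha = 1.0 - math.exp(-sigma * step)  # opacity of this segment
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha

    # Expected depth: weighted sum of sample midpoints.
    depth = sum(w * (i + 0.5) * step for i, w in enumerate(weights))

    # Rendered semantics: weighted sum of per-sample class probabilities.
    num_classes = len(semantics[0])
    rendered = [
        sum(w * s[c] for w, s in zip(weights, semantics))
        for c in range(num_classes)
    ]
    return weights, depth, rendered


# Toy ray: empty space for two samples, then a dense surface of class 1.
densities = [0.0, 0.0, 50.0, 50.0]
semantics = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
weights, depth, rendered = render_ray(densities, step=0.4, semantics=semantics)
print(depth, rendered)
```

In a training setup of this kind, the rendered depth and semantics would be compared against the 2D depth and semantic labels for the pixel that the ray passes through, and the loss would be backpropagated into the 3D volume.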
## Acknowledgement

Many thanks to these excellent open source projects:

- [BEVDet](https://github.com/HuangJunJie2017/BEVDet), [DVGO](https://github.com/sunset1995/DirectVoxGO), [Occ3D](https://github.com/Tsinghua-MARS-Lab/Occ3D), [SurroundDepth](https://github.com/JeffWang987/OpenOccupancy), [OpenOccupancy](https://github.com/JeffWang987/OpenOccupancy), [CVPR2023-Occ-Challenge](https://github.com/CVPR2023-3D-Occupancy-Prediction)

Related Projects:

- [SurroundOcc](https://github.com/weiyithu/SurroundOcc), [TPVFormer](https://github.com/wzzheng/TPVFormer), [BEVFormer](https://github.com/fundamentalvision/BEVFormer), [VoxFormer](https://github.com/NVlabs/VoxFormer), [FB-Occ](https://github.com/NVlabs/FB-BEV), [SimpleOccupancy](https://github.com/GANWANSHUI/SimpleOccupancy), [OVO](https://github.com/dzcgaara/OVO-Open-Vocabulary-Occupancy)

## BibTeX

If this work is helpful for your research, please consider citing:

```
@article{pan2023renderocc,
  title={RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision},
  author={Pan, Mingjie and Liu, Jiaming and Zhang, Renrui and Huang, Peixiang and Li, Xiaoqi and Liu, Li and Zhang, Shanghang},
  journal={arXiv preprint arXiv:2309.09502},
  year={2023}
}
```