MonoScene: Monocular 3D Semantic Scene Completion\
Anh-Quan Cao,
Raoul de Charette
Inria, Paris, France.
CVPR 2022 \
If you find this work or code useful, please cite our paper and give this repo a star:
@inproceedings{cao2022monoscene,
title={MonoScene: Monocular 3D Semantic Scene Completion},
author={Anh-Quan Cao and Raoul de Charette},
booktitle={CVPR},
year={2022}
}
[Figure: results on SemanticKITTI, KITTI-360 (trained on SemanticKITTI), and NYUv2]
$ conda create -y -n monoscene python=3.7
$ conda activate monoscene
$ conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.2 -c pytorch
$ cd MonoScene/
$ pip install -r requirements.txt
$ conda install -c bioconda tbb=2020.2
Downgrade torchmetrics to 0.6.0:
$ pip install torchmetrics==0.6.0
Finally, install MonoScene:
$ pip install -e ./
You need to download the SemanticKITTI data and arrange it in the following layout:
/path/to/semantic_kitti/
└── dataset
    ├── poses
    └── sequences
Create a folder to store the preprocessed SemanticKITTI data at **/path/to/kitti/preprocess/folder**.
Store paths in environment variables for faster access (Note: folder 'dataset' is in /path/to/semantic_kitti):
$ export KITTI_PREPROCESS=/path/to/kitti/preprocess/folder
$ export KITTI_ROOT=/path/to/semantic_kitti
$ cd MonoScene/
$ python monoscene/data/semantic_kitti/preprocess.py kitti_root=$KITTI_ROOT kitti_preprocess_root=$KITTI_PREPROCESS
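A quick check of the folder layout can catch path mistakes before the preprocessing step runs. The following is a minimal sketch, not part of MonoScene: `missing_kitti_dirs` is a hypothetical helper, and it assumes only the `dataset/poses` and `dataset/sequences` folders shown in the layout above.

```python
from pathlib import Path


def missing_kitti_dirs(kitti_root):
    """Return the expected sub-folders that do not exist under kitti_root.

    Only the layout shown in this README is checked (an assumption):
    <kitti_root>/dataset/poses and <kitti_root>/dataset/sequences.
    """
    root = Path(kitti_root)
    expected = [root / "dataset" / "poses", root / "dataset" / "sequences"]
    # Keep only the paths that are not existing directories.
    return [str(path) for path in expected if not path.is_dir()]
```

Calling `missing_kitti_dirs` with the value of `KITTI_ROOT` returns an empty list when both folders are present, and the list of absent folders otherwise.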
Download the NYUv2 dataset.
Create a folder to store the preprocessed NYUv2 data at **/path/to/NYU/preprocess/folder**.
Store paths in environment variables for faster access:
$ export NYU_PREPROCESS=/path/to/NYU/preprocess/folder
$ export NYU_ROOT=/path/to/NYU/depthbin
$ cd MonoScene/
$ python monoscene/data/NYU/preprocess.py NYU_root=$NYU_ROOT NYU_preprocess_root=$NYU_PREPROCESS
We only perform inference on KITTI-360. You can download either the Perspective Images for Train & Val (128G) or the Perspective Images for Test (1.5G) at http://www.cvlibs.net/datasets/kitti-360/download.php.
Create a folder to store KITTI-360 data at **/path/to/KITTI-360/folder**.
Store paths in environment variables for faster access:
$ export KITTI_360_ROOT=/path/to/KITTI-360
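Since every later command reads these paths from the environment, a forgotten `export` tends to surface only as a confusing path error. The sketch below lists the variables that are still unset for a given dataset. It is illustrative only: `DATASET_ENV_VARS` and `unset_variables` are hypothetical names, and the grouping of variables per dataset is an assumption read off the export commands above.

```python
import os

# Variable names copied from the export commands in this README.
# The per-dataset grouping is an assumption made for illustration.
DATASET_ENV_VARS = {
    "kitti": ["KITTI_ROOT", "KITTI_PREPROCESS"],
    "NYU": ["NYU_ROOT", "NYU_PREPROCESS"],
    "kitti_360": ["KITTI_360_ROOT"],
}


def unset_variables(dataset, environ=None):
    """Return the variables for `dataset` that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in DATASET_ENV_VARS[dataset] if not environ.get(name)]
```

Passing a plain dictionary as `environ` makes the helper easy to try without touching the real environment.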
Download MonoScene pretrained models on SemanticKITTI and on NYUv2, then put them in the folder **/path/to/MonoScene/trained_models**.
To train MonoScene on SemanticKITTI, first create a folder to store training logs at **/path/to/kitti/logdir**, then store its path in an environment variable:
$ export KITTI_LOG=/path/to/kitti/logdir
$ cd MonoScene/
$ python monoscene/scripts/train_monoscene.py \
dataset=kitti \
enable_log=true \
kitti_root=$KITTI_ROOT \
kitti_preprocess_root=$KITTI_PREPROCESS \
kitti_logdir=$KITTI_LOG \
n_gpus=4 batch_size=4
To train MonoScene on NYUv2, create a folder to store training logs at **/path/to/NYU/logdir**, then store its path in an environment variable:
$ export NYU_LOG=/path/to/NYU/logdir
$ cd MonoScene/
$ python monoscene/scripts/train_monoscene.py \
dataset=NYU \
NYU_root=$NYU_ROOT \
NYU_preprocess_root=$NYU_PREPROCESS \
logdir=$NYU_LOG \
n_gpus=2 batch_size=4
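Both training commands share one pattern: a script path followed by `key=value` overrides. When several runs are launched from Python, the argument list can be assembled programmatically. This is a minimal sketch under assumptions: `build_command` and `nyu_training_command` are hypothetical helpers, not part of MonoScene, and the override names are simply copied from the NYUv2 command above.

```python
import os


def build_command(script, overrides):
    """Build an argument list of the form: python <script> key=value ..."""
    # Dictionaries preserve insertion order, so overrides keep their order.
    return ["python", script] + [f"{key}={value}" for key, value in overrides.items()]


def nyu_training_command(environ=None):
    """Recreate the NYUv2 training command from environment variables."""
    environ = os.environ if environ is None else environ
    return build_command(
        "monoscene/scripts/train_monoscene.py",
        {
            "dataset": "NYU",
            "NYU_root": environ["NYU_ROOT"],
            "NYU_preprocess_root": environ["NYU_PREPROCESS"],
            "logdir": environ["NYU_LOG"],
            "n_gpus": 2,
            "batch_size": 4,
        },
    )
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from inside the `MonoScene/` folder. Using a list instead of a single string avoids shell quoting issues when paths contain spaces.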
## Evaluating
### SemanticKITTI
To evaluate MonoScene on the SemanticKITTI validation set, type:
$ cd MonoScene/
$ python monoscene/scripts/eval_monoscene.py \
dataset=kitti \
kitti_root=$KITTI_ROOT \
kitti_preprocess_root=$KITTI_PREPROCESS \
n_gpus=1 batch_size=1
### NYUv2
To evaluate MonoScene on the NYUv2 test set, type:
$ cd MonoScene/
$ python monoscene/scripts/eval_monoscene.py \
dataset=NYU \
NYU_root=$NYU_ROOT \
NYU_preprocess_root=$NYU_PREPROCESS \
n_gpus=1 batch_size=1
# Inference & Visualization
## Inference
Create the folder **/path/to/monoscene/output** to store the MonoScene outputs, and store its path in an environment variable:
$ export MONOSCENE_OUTPUT=/path/to/monoscene/output
### NYUv2
To generate the predictions on the NYUv2 test set, type:
$ cd MonoScene/
$ python monoscene/scripts/generate_output.py \
+output_path=$MONOSCENE_OUTPUT \
dataset=NYU \
NYU_root=$NYU_ROOT \
NYU_preprocess_root=$NYU_PREPROCESS \
n_gpus=1 batch_size=1
### SemanticKITTI
To generate the predictions on the SemanticKITTI validation set, type:
$ cd MonoScene/
$ python monoscene/scripts/generate_output.py \
+output_path=$MONOSCENE_OUTPUT \
dataset=kitti \
kitti_root=$KITTI_ROOT \
kitti_preprocess_root=$KITTI_PREPROCESS \
n_gpus=1 batch_size=1
### KITTI-360
Here we use the sequence **2013_05_28_drive_0009_sync**; other sequences can be used as well. To generate the predictions on KITTI-360, type:
$ cd MonoScene/
$ python monoscene/scripts/generate_output.py \
+output_path=$MONOSCENE_OUTPUT \
dataset=kitti_360 \
+kitti_360_root=$KITTI_360_ROOT \
+kitti_360_sequence=2013_05_28_drive_0009_sync \
n_gpus=1 batch_size=1
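The visualization scripts take a single `.pkl` file through `+file=`, so it can be useful to see which files were generated and what one of them holds. The sketch below makes two assumptions: that the generated files end in `.pkl` somewhere under the output folder, and that they can be read with the standard `pickle` module. The sub-folder layout and the contents of each file are not documented here, so the code reports only the top-level type and, for dictionaries, the keys. `find_outputs` and `describe_output` are hypothetical helper names.

```python
import pickle
from pathlib import Path


def find_outputs(output_root):
    """Return all .pkl files found anywhere under output_root, sorted by path."""
    return sorted(Path(output_root).rglob("*.pkl"))


def describe_output(path):
    """Load one pickle file and summarize its top-level structure."""
    with open(path, "rb") as handle:
        data = pickle.load(handle)
    if isinstance(data, dict):
        # Report the keys only; the values may be large arrays.
        return {"type": "dict", "keys": sorted(str(key) for key in data)}
    return {"type": type(data).__name__}
```

Unpickling can execute arbitrary code, so this should only be run on files you generated yourself.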
## Visualization
**NOTE:** if you have trouble using mayavi, you can use an alternative [visualization code using Open3D](https://github.com/astra-vision/MonoScene/issues/68#issuecomment-1637623145).
We use mayavi to visualize the predictions. Please install mayavi following the [official installation instructions](https://docs.enthought.com/mayavi/mayavi/installation.html). Then use the following commands to visualize the outputs on the respective datasets.
If you have **trouble installing mayavi**, you can take a look at our [**mayavi installation guide**](https://anhquancao.github.io/blog/2022/how-to-install-mayavi-with-python-3-on-ubuntu-2004-using-pip-or-anaconda/).
If you have **trouble fixing mayavi viewpoint**, you can take a look at [**our tutorial**](https://anhquancao.github.io/blog/2022/how-to-define-viewpoint-programmatically-in-mayavi/).
You also need to install some packages used by the visualization scripts:
$ pip install tqdm
$ pip install omegaconf
$ pip install hydra-core
### NYUv2
$ cd MonoScene/
$ python monoscene/scripts/visualization/NYU_vis_pred.py +file=/path/to/output/file.pkl
### SemanticKITTI
$ cd MonoScene/
$ python monoscene/scripts/visualization/kitti_vis_pred.py +file=/path/to/output/file.pkl +dataset=kitti
### KITTI-360
$ cd MonoScene/
$ python monoscene/scripts/visualization/kitti_vis_pred.py +file=/path/to/output/file.pkl +dataset=kitti_360
# Related camera-only 3D occupancy prediction projects
- [NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space](https://github.com/Jiawei-Yao0812/NDCScene), ICCV 2023.
- [OG: Equip vision occupancy with instance segmentation and visual grounding](https://arxiv.org/abs/2307.05873), arXiv 2023.
- [FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation](https://github.com/NVlabs/FB-BEV), CVPRW 2023.
- [Symphonize 3D Semantic Scene Completion with Contextual Instance Queries](https://github.com/hustvl/Symphonies), arXiv 2023.
- [OVO: Open-Vocabulary Occupancy](https://arxiv.org/pdf/2305.16133.pdf), arXiv 2023.
- [OccNet: Scene as Occupancy](https://github.com/opendrivelab/occnet), ICCV 2023.
- [SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields](https://astra-vision.github.io/SceneRF/), ICCV 2023.
- [Behind the Scenes: Density Fields for Single View Reconstruction](https://fwmb.github.io/bts/), CVPR 2023.
- [VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion](https://github.com/NVlabs/VoxFormer), CVPR 2023.
- [OccDepth: A Depth-aware Method for 3D Semantic Occupancy Network](https://github.com/megvii-research/OccDepth), arXiv 2023.
- [StereoScene: BEV-Assisted Stereo Matching Empowers 3D Semantic Scene Completion](https://github.com/Arlo0o/StereoScene), arXiv 2023.
- [Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction](https://github.com/wzzheng/TPVFormer), CVPR 2023.
- [A Simple Attempt for 3D Occupancy Estimation in Autonomous Driving](https://github.com/GANWANSHUI/SimpleOccupancy), arXiv 2023.
- [OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction](https://github.com/zhangyp15/OccFormer), ICCV 2023.
- [SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving](https://github.com/weiyithu/SurroundOcc), ICCV 2023.
- [PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation](https://arxiv.org/abs/2306.10013), arXiv 2023.
- [PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction](https://github.com/wzzheng/PointOcc), arXiv 2023.
- [RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision](https://arxiv.org/abs/2309.09502), arXiv 2023.
## Datasets/Benchmarks
- [PointSSC: A Cooperative Vehicle-Infrastructure Point Cloud Benchmark for Semantic Scene Completion](https://arxiv.org/abs/2309.12708), arXiv 2023.
- [OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception](https://github.com/JeffWang987/OpenOccupancy), ICCV 2023.
- [Occupancy Dataset for nuScenes](https://github.com/FANG-MING/occupancy-for-nuscenes), GitHub 2023.
- [Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving](https://github.com/Tsinghua-MARS-Lab/Occ3D), arXiv 2023.
- [OccNet: Scene as Occupancy](https://github.com/opendrivelab/occnet), ICCV 2023.
- [SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving](https://github.com/ai4ce/SSCBench), arXiv 2023.
# License
MonoScene is released under the [Apache 2.0 license](./LICENSE).