# DeepViewAgg [![python](https://img.shields.io/badge/-Python_3.7.9-blue?logo=python&logoColor=white)](https://www.python.org/downloads/release/python-379/) [![pytorch](https://img.shields.io/badge/PyTorch_1.7.1-ee4c2c?logo=pytorch&logoColor=white)](https://pytorch.org/get-started/locally/) [![license](https://img.shields.io/badge/License-BSD+MIT-green.svg?labelColor=gray)](https://github.com/drprojects/DeepViewAgg/blob/release/LICENSE.md)

Official implementation for

[_Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation_](https://arxiv.org/abs/2204.07548)
([CVPR'22 Best Paper Finalist 🎉](https://twitter.com/CVPR/status/1539772091112857600))
[![arXiv](https://img.shields.io/badge/arxiv-2204.07548-b31b1b.svg)](https://arxiv.org/abs/2204.07548) [![Project page](https://img.shields.io/badge/Project_page-8A2BE2)](https://drprojects.github.io/deepviewagg) [![Video](https://img.shields.io/badge/Video-FFC300)](https://www.youtube.com/watch?v=SoMKwI863tw) [![Poster](https://img.shields.io/badge/Poster-e76f51)](https://drive.google.com/file/d/1vtOLLM4VNV5x57HT-60PbeR9QRiOfX7_/view?usp=sharing) [![CV News](https://img.shields.io/badge/CV_News-6a994e)](https://www.rsipvision.com/ComputerVisionNews-2022July/24)

**If you ❤️ or simply use this project, don't forget to give the repository a ⭐; it means a lot to us!**


## 📌 Description

We propose to exploit the synergy between images 🖼️ and 3D point clouds ☁️ by learning to select the most relevant views for each point. Our approach uses the viewing conditions 👀 of 3D points to merge features from images taken at arbitrary positions. We reach state-of-the-art results on S3DIS (74.7 mIoU 6-Fold) and KITTI-360 (58.3 mIoU) without requiring point colorization, meshing, or depth cameras: our full pipeline only requires raw, large-scale 3D point clouds and a set of images and poses.

| ✨ DeepViewAgg in short ✨ |
|:------------------------------------------------------------------------------------:|
| 🤖 Learns **2D+3D features** end-to-end |
| 👀 **Attentive multi-view aggregation** from **viewing conditions** |
| 🚫 No need for 3D colorization, meshing, depth sensor, synthetic views, or 2D labels |
| ✅ Only needs **raw point clouds, images, and poses** |

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/learning-multi-view-aggregation-in-the-wild/semantic-segmentation-on-s3dis)](https://paperswithcode.com/sota/semantic-segmentation-on-s3dis?p=learning-multi-view-aggregation-in-the-wild) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/learning-multi-view-aggregation-in-the-wild/3d-semantic-segmentation-on-kitti-360)](https://paperswithcode.com/sota/3d-semantic-segmentation-on-kitti-360?p=learning-multi-view-aggregation-in-the-wild)
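To make the viewing-condition idea concrete, here is a minimal, illustrative PyTorch sketch of attention-based view selection. It is **not** the repository's actual module: the class name `ViewConditionAttention`, the dimensions, and the content of the viewing-condition descriptor are all hypothetical; the real multimodal modules live inside `torch_points3d` (see the project structure below).

```python
# Minimal, illustrative sketch of attentive multi-view aggregation.
# NOT the repository's module: names, dimensions and the viewing-condition
# descriptor are hypothetical placeholders.
import torch
import torch.nn as nn


class ViewConditionAttention(nn.Module):
    """Scores each (point, view) pair from its viewing conditions, then pools
    the corresponding image features into a single per-point feature."""

    def __init__(self, dim_conditions: int, hidden: int = 32):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(dim_conditions, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, image_feats, view_conditions, mask):
        # image_feats:     (P, V, C) image features projected onto each point
        # view_conditions: (P, V, D) e.g. viewing distance, angle, occlusion cues
        # mask:            (P, V)    True where view v actually sees point p
        scores = self.scorer(view_conditions).squeeze(-1)        # (P, V)
        scores = scores.masked_fill(~mask, float('-inf'))
        weights = torch.softmax(scores, dim=1)                   # (P, V)
        return (weights.unsqueeze(-1) * image_feats).sum(dim=1)  # (P, C)


# Toy usage: 1000 points, up to 4 candidate views each, 64-dim image features.
pool = ViewConditionAttention(dim_conditions=8)
mask = torch.rand(1000, 4) > 0.3
mask[:, 0] = True  # assume every point is seen by at least one view
out = pool(torch.randn(1000, 4, 64), torch.randn(1000, 4, 8), mask)
print(out.shape)  # torch.Size([1000, 64])
```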


## 📰 Change log


## 📝 Requirements

The following must be installed before installing this project.

All remaining dependencies (PyTorch, PyTorch Geometric, etc.) should be installed using the provided installation script.

The code has been tested in the following environment:


## 🏗️ Installation

To install DeepViewAgg, simply run `./install.sh` from inside the repository.
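For reference, a typical installation session might look like the sketch below (the repository URL comes from the badges above; per the Requirements section, the script installs PyTorch, PyTorch Geometric, and the remaining dependencies):

```bash
# Hedged example: clone the repository and run the provided installation script.
git clone https://github.com/drprojects/DeepViewAgg.git
cd DeepViewAgg
./install.sh
```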


### Disclaimer

This is not the official Torch-Points3D framework. This work builds on and modifies a fixed version of the framework and has not yet been merged into the official repository. In particular, it introduces numerous features for multimodal learning on large-scale 3D point clouds. Some TP3D-specific files were removed for simplicity.


## 🔩 Project structure

The project follows the original Torch-Points3D framework structure.

```
├─ conf                    # All configurations live there
├─ notebooks               # Notebooks to get started with multimodal datasets and models
├─ eval.py                 # Eval script
├─ install.sh              # Installation script for DeepViewAgg
├─ scripts                 # Some scripts to help manage the project
├─ torch_points3d
│   ├─ core                # Core components
│   ├─ datasets            # All code related to datasets
│   ├─ metrics             # All metrics and trackers
│   ├─ models              # All models
│   ├─ modules             # Basic modules that can be used in a modular way
│   ├─ utils               # Various utils
│   └─ visualization       # Visualization
└─ train.py                # Main script to launch a training
```

Several changes were made to extend the original project to multimodal learning on point clouds with images. The most important ones can be found in the following:


## 🚀 Getting started

Notebook to create a synthetic toy dataset and get familiar with 2D-3D mapping construction:

Notebooks to create datasets, get familiar with dataset configuration, and produce interactive visualizations. You can also run inference from a checkpoint and visualize predictions:

Notebooks to create multimodal models, get familiar with model configuration and run forward and backward passes for debugging:

Notebooks to run full inference on multimodal datasets from a model checkpoint. These should allow you to reproduce our results using the pretrained models listed in the Models section:

Scripts to replicate our paper's best experiments 📈 for each dataset:
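For orientation, launching a run from the command line follows the usual Torch-Points3D pattern of `train.py` with Hydra-style overrides. The sketch below is hypothetical: the `<...>` placeholders stand for configurations defined under `conf/`, and only the model name comes from the Models section; the dedicated scripts set these values for you.

```bash
# Hypothetical invocation following the usual Torch-Points3D Hydra CLI.
# The <...> placeholders are not real config names; see conf/ and the scripts.
python train.py \
    task=segmentation \
    data=<multimodal_dataset_config> \
    models=<multimodal_model_config> \
    model_name=Res16UNet34-L4-early
```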

If you need to go deeper into this project, see the Documentation section.

If you have trouble using these or need to reproduce other results from our paper, create an issue or leave me a message 💬!


## 🤖 Models

| Model name | Dataset | mIoU | 💾 Size | 👇 Download |
|:---|:---|:---|:---|:---|
| Res16UNet34-L4-early | S3DIS 6-Fold | 74.7 | 2.0G | link |
| Res16UNet34-PointPyramid-early-cityscapes-interpolate | KITTI-360 | 61.7 Val / 58.3 Test | 339M | link |
| Res16UNet34-L4-early | ScanNet | 71.0 Val | 341M | link |
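Before plugging a downloaded checkpoint into the inference notebooks above, a quick sanity check with plain PyTorch can confirm the file loads (the file name below is hypothetical, and this bypasses the project's own loading utilities):

```python
# Hypothetical file name; just verifies the checkpoint opens and lists its keys.
import torch

ckpt = torch.load("Res16UNet34-L4-early.pt", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))
```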


## 📚 Documentation

The official documentation for PyTorch Geometric and Torch-Points3D is a good starting point, since this project largely builds on these frameworks. For DeepViewAgg-specific features (i.e. everything related to multimodal learning), the code is commented as much as possible, but hit me up 💬 if some parts need clarification.


## 🔭 Visualization of multimodal data

We provide code to produce interactive and shareable HTML visualizations of multimodal data and point-image mappings:

Examples of such HTML visualizations produced on S3DIS Fold 5 are zipped here and can be opened in your browser.


## 👩‍🔧 Troubleshooting & known issues


## 💳 Credits


## Citing our work

If you use all or part of the present code, please include the following citation:

```bibtex
@article{robert2022dva,
  title={Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation},
  author={Robert, Damien and Vallet, Bruno and Landrieu, Loic},
  journal={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2022}
}
```

You can find our DeepViewAgg paper 📄 on arXiv.

Also, if you ❤️ or simply use this project, don't forget to give the repository a ⭐; it means a lot to us!