
MVTrans: Multi-view Perception to See Transparent Objects (ICRA2023)


Paper | Project | Video

This repo contains the official implementation of the paper "MVTrans: Multi-view Perception to See Transparent Objects".


Transparent object perception is a crucial skill for applications such as robot manipulation in household and laboratory settings. Existing methods utilize RGB-D or stereo inputs to handle a subset of perception tasks, including depth and pose estimation. However, transparent object perception remains an open problem. In this paper, we forgo the unreliable depth map from RGB-D sensors and extend the stereo-based method. Our proposed method, MVTrans, is an end-to-end multi-view architecture with multiple perception capabilities, including depth estimation, segmentation, and pose estimation. Additionally, we establish a novel procedural photo-realistic dataset generation pipeline and create a large-scale transparent object detection dataset, Syn-TODD, which is suitable for training networks with all three modalities: RGB-D, stereo, and multi-view RGB.


Set up a conda environment, install the required packages, and download the repo:

conda create -y --prefix ./env python=3.8
./env/bin/python -m pip install -r requirements.txt
git clone

Weights & Biases (wandb) is used to log and visualize training results. Please follow the instructions to set up wandb. To log results to the cloud, provide your wandb login key. Otherwise, to log results locally, run the following command and access results at localhost:

wandb offline
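
The choice between cloud and local logging can also be made from Python before training starts. The sketch below is illustrative and not part of the repository: `configure_wandb` is a hypothetical helper, and it relies only on the `WANDB_API_KEY` and `WANDB_MODE` environment variables that wandb reads at start-up.

```python
import os
from typing import Optional


def configure_wandb(api_key: Optional[str] = None) -> str:
    """Select online or offline wandb logging through environment variables.

    Hypothetical helper for illustration; not part of the MVTrans code base.
    """
    if api_key:
        # A key is available: let wandb sync runs to the cloud.
        os.environ["WANDB_API_KEY"] = api_key
        os.environ["WANDB_MODE"] = "online"
    else:
        # No key: keep runs on local disk, comparable to `wandb offline`.
        os.environ["WANDB_MODE"] = "offline"
    return os.environ["WANDB_MODE"]
```

Calling `configure_wandb()` with no argument before the training script imports wandb keeps the run local; passing a key switches to cloud logging.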


Our synthetic transparent object detection dataset (Syn-TODD) can be downloaded here.

Pre-trained Model

We provide pre-trained model weights for MVTrans trained on the Syn-TODD dataset.

Model views    Link
2 views        here
3 views        here
5 views        here


To train MVTrans from scratch, modify the data path and output directory in configuration files under config/, and then run:

./ @config/net_config_blender_multiview_{NUM_OF_VIEW}_train.txt
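
The `@config/...` argument resembles a convention supported by Python's `argparse`, where an argument starting with a chosen prefix character is replaced by the arguments read from that file, one per line. Whether MVTrans parses its configuration exactly this way is an assumption; the flag names `--data_path`, `--output`, and `--num_views` and the file contents below are hypothetical and only illustrate the mechanism.

```python
import argparse
import os
import tempfile


def build_parser() -> argparse.ArgumentParser:
    # fromfile_prefix_chars="@" makes argparse expand "@file" into the
    # arguments stored in that file (one argument per line by default).
    parser = argparse.ArgumentParser(fromfile_prefix_chars="@")
    parser.add_argument("--data_path", type=str, required=True)  # hypothetical flag
    parser.add_argument("--output", type=str, required=True)     # hypothetical flag
    parser.add_argument("--num_views", type=int, default=2)      # hypothetical flag
    return parser


def parse_config_text(config_text: str) -> argparse.Namespace:
    """Write config_text to a temporary file and parse it via '@file'."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example_config.txt")
        with open(path, "w") as f:
            f.write(config_text)
        return build_parser().parse_args(["@" + path])


example = "--data_path=/path/to/syn-todd\n--output=/path/to/results\n--num_views=3\n"
args = parse_config_text(example)
print(args.data_path, args.output, args.num_views)
```

With this style of parser, editing the data path and output directory means changing the corresponding lines in the text file rather than the command line.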


To run the evaluation, modify the data path and output directory in the configuration files under config/, and then run:

./ @config/net_config_blender_multiview_{NUM_OF_VIEW}_eval.txt
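
Training and evaluation configuration files follow the same naming pattern and differ only in the view count and the `_train` or `_eval` suffix. As a small sketch, the hypothetical helper `config_path` below expands that pattern. Restricting the view count to 2, 3, and 5 mirrors the pre-trained models listed above and is an assumption about which configuration files exist.

```python
from pathlib import PurePosixPath

# View counts for which pre-trained weights are listed; assumed to match
# the configuration files available under config/.
SUPPORTED_VIEWS = (2, 3, 5)


def config_path(num_views: int, mode: str) -> str:
    """Return the config file path for a view count and mode ('train' or 'eval')."""
    if num_views not in SUPPORTED_VIEWS:
        raise ValueError(f"num_views must be one of {SUPPORTED_VIEWS}, got {num_views}")
    if mode not in ("train", "eval"):
        raise ValueError(f"mode must be 'train' or 'eval', got {mode!r}")
    name = f"net_config_blender_multiview_{num_views}_{mode}.txt"
    return str(PurePosixPath("config") / name)


print("@" + config_path(3, "eval"))  # prints @config/net_config_blender_multiview_3_eval.txt
```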


To run inference, launch Jupyter Notebook and run inference.ipynb.


Please cite our paper:

      title={MVTrans: Multi-View Perception of Transparent Objects}, 
      author={Yi Ru Wang and Yuchi Zhao and Haoping Xu and Saggi Eppel and Alan Aspuru-Guzik and Florian Shkurti and Animesh Garg},


Our MVTrans architecture builds on SimNet and ESTDepth.