This repo contains the official implementation of the paper "MVTrans: Multi-View Perception of Transparent Objects".
Transparent object perception is a crucial skill for applications such as robot manipulation in household and laboratory settings. Existing methods utilize RGB-D or stereo inputs to handle a subset of perception tasks, including depth and pose estimation. However, transparent object perception remains an open problem. In this paper, we forgo the unreliable depth map from RGB-D sensors and extend the stereo-based method. Our proposed method, MVTrans, is an end-to-end multi-view architecture with multiple perception capabilities, including depth estimation, segmentation, and pose estimation. Additionally, we establish a novel procedural photo-realistic dataset generation pipeline and create a large-scale transparent object detection dataset, Syn-TODD, which is suitable for training networks with all three modalities: RGB-D, stereo, and multi-view RGB.
Clone the repo, set up a conda environment, and install the required packages:
git clone https://github.com/ac-rad/MVTrans.git
cd MVTrans
conda create -y --prefix ./env python=3.8
./env/bin/python -m pip install -r requirements.txt
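To sanity-check the installation, you can verify that the environment's Python imports the core deep-learning dependency (this assumes PyTorch is listed in requirements.txt):

```bash
# Quick check that the environment was created and packages installed correctly
# (assumes torch is installed via requirements.txt).
./env/bin/python -c "import torch; print(torch.__version__)"
```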
Weights & Biases (wandb) is used to log and visualize training results. Please follow the instruction to setup wandb. To appropriately log results to cloud, insert your wandb login key in net_train_multiview.py
. Otherwise, to log results locally, run the following command and access results at localhost:
wandb offline
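Alternatively, the API key can be configured once through the wandb CLI instead of editing the script (the key value below is a placeholder):

```bash
# Store your Weights & Biases API key locally so runs can sync to the cloud.
wandb login YOUR_WANDB_API_KEY
```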
Our synthetic transparent object detection dataset (Syn-TODD) can be downloaded here.
We provide pre-trained model weights for MVTrans trained on the Syn-TODD dataset.
| Model views | Link |
|---|---|
| 2 views | here |
| 3 views | here |
| 5 views | here |
To train MVTrans from scratch, modify the data path and output directory in the configuration files under config/, and then run:
./runner.sh net_train_multiview.py @config/net_config_blender_multiview_{NUM_OF_VIEW}_train.txt
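For example, training the 2-view variant (matching the 2-view pre-trained weights above, and assuming the corresponding config file ships with the repo) would look like:

```bash
# NUM_OF_VIEW replaced with 2; adjust to 3 or 5 for the other variants.
./runner.sh net_train_multiview.py @config/net_config_blender_multiview_2_train.txt
```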
To run evaluation, modify the data path and output directory in the configuration files under config/, and then run:
./runner.sh net_train_multiview.py @config/net_config_blender_multiview_{NUM_OF_VIEW}_eval.txt
To run inference, launch Jupyter Notebook and run inference.ipynb.
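For example, assuming Jupyter is installed in your environment, the notebook can be opened directly from the repository root:

```bash
# Launch Jupyter and open the inference notebook.
jupyter notebook inference.ipynb
```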
Please cite our paper:
@misc{wang2023mvtrans,
  title={MVTrans: Multi-View Perception of Transparent Objects},
  author={Yi Ru Wang and Yuchi Zhao and Haoping Xu and Saggi Eppel and Alan Aspuru-Guzik and Florian Shkurti and Animesh Garg},
  year={2023},
  eprint={2302.11683},
  archivePrefix={arXiv},
  primaryClass={cs.RO}
}
Our MVTrans architecture builds upon SimNet and ESTDepth.