
# OPDMulti: Openable Part Detection for Multiple Objects

Xiaohao Sun*, Hanxiao Jiang*, Manolis Savva, Angel Xuan Chang

## Overview

This repository contains the implementation of the OPDFormer-based methods for the newly proposed OPDMulti task and its corresponding dataset. The code is based on Detectron2 and OPD, and the OPDFormer models are built on Mask2Former.

arXiv  Website  Demo


## Setup

The implementation has been tested on Ubuntu 20.04 with Python 3.7, PyTorch 1.10.1, CUDA 11.1.1, and cuDNN 8.2.0.

Install the Python dependencies:

```sh
pip install -r requirements.txt
```

Then build the custom CUDA ops used by the pixel decoder:

```sh
cd opdformer/mask2former/modeling/pixel_decoder/ops
python setup.py build install
```
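If the build succeeds, the compiled extension should be importable. A quick sanity check, assuming the ops package keeps the default name from Mask2Former's `setup.py`:

```sh
# Sanity-check that the deformable-attention CUDA extension built correctly
# (module name assumed from Mask2Former's ops setup.py).
python -c "import MultiScaleDeformableAttention; print('MSDeformAttn ops OK')"
```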


## Dataset
Download our [OPDMulti](https://docs.google.com/forms/d/e/1FAIpQLSeG1Jafcy9P_OFBJ8WffYt6WJsJszXPqKIgQz0tGTYYuhm4SA/viewform?vc=0&c=0&w=1&flr=0) dataset (7.2G) and extract it into the `./dataset/` folder. Make sure the data follows [this](https://github.com/3dlg-hcvc/OPDMulti/blob/master/data/README.md#downloaded-data-organization) format. You can follow [these](https://github.com/3dlg-hcvc/OPDMulti/blob/master/data/README.md#data-processing-procedure) steps to convert your own data to the OPDMulti format. To try our model on the OPDSynth and OPDReal datasets, download the data from the [OPD](https://github.com/3dlg-hcvc/OPD#dataset) repository.
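A minimal sketch of the extraction step, assuming the download arrives as a single `OPDMulti.zip` archive (the actual file name may differ):

```sh
# Extract the downloaded archive into ./dataset/
# (archive name is an assumption; adjust to the actual download).
mkdir -p dataset
unzip OPDMulti.zip -d dataset/
```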

## Training
To train from scratch, you can use the command below. The output will include evaluation results on the validation set.

```sh
cd opdformer
python train.py \
--config-file <MODEL_CONFIG> \
--output-dir <OUTPUT_DIR> \
--data-path <PATH_TO_DATASET> \
--input-format <RGB/depth/RGBD> \
--model_attr_path <PATH_TO_ATTR>
```
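As a concrete illustration, a run of the OPDFormer-P variant on RGB input might look like the following; the config, dataset, and attribute file paths are placeholders, so substitute the actual files from this repository and your dataset copy.

```sh
# Illustrative invocation, run from the opdformer/ directory.
# All paths below are assumed placeholders, not the repo's exact file names.
python train.py \
--config-file configs/opdformer_p.yaml \
--output-dir output/opdformer-p-rgb \
--data-path ../dataset/OPDMulti \
--input-format RGB \
--model_attr_path ../dataset/OPDMulti/obj_info.json
```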

## Evaluation

To evaluate, use the following command:

```sh
python evaluate_on_log.py \
--config-file <MODEL_CONFIG> \
--output-dir <OUTPUT_DIR> \
--data-path <PATH_TO_DATASET> \
--input-format <RGB/depth/RGBD> \
--model_attr_path <PATH_TO_ATTR> \
--opts MODEL.WEIGHTS <PRETRAINED_MODEL>
```
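For example, evaluating a downloaded OPDMulti checkpoint of the same variant might look like this (again, the checkpoint and config paths are illustrative):

```sh
# Illustrative paths; use the config that matches the downloaded checkpoint.
python evaluate_on_log.py \
--config-file configs/opdformer_p.yaml \
--output-dir output/eval-opdformer-p-rgb \
--data-path ../dataset/OPDMulti \
--input-format RGB \
--model_attr_path ../dataset/OPDMulti/obj_info.json \
--opts MODEL.WEIGHTS checkpoints/opdformer_p_rgb.pth
```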

## Pretrained Models

You can download our pretrained model weights (trained on both OPDReal and OPDMulti) for the different input formats (RGB, depth, RGBD) from the table below.

For model evaluation, download the pretrained weights from the "OPDMulti Model" column. To finetune on custom data, start from the weights in the "OPDReal Model" column, which were also used to produce the OPDMulti results.

### How to read the table

The "Model Name" column contains a link to the config file. "PSeg" is the part segmentation score, "+M" adds motion type prediction, "+MA" includes axis prediction, and "+MAO" further incorporates origin prediction.

To train or evaluate a different model variant, change `--config-file /path/to/config/name.yaml` in the training/evaluation command accordingly.

### OPDMulti

| Model Name | Input | PSeg | +M | +MA | +MAO | OPDMulti Model | OPDReal Model |
|:---|:---|---:|---:|---:|---:|:---|:---|
| OPDFormer-C | RGB | 29.1 | 28.0 | 13.5 | 12.3 | model (169M) | model (169M) |
| OPDFormer-O | RGB | 27.8 | 26.3 | 5.0 | 1.5 | model (175M) | model (175M) |
| OPDFormer-P | RGB | 31.4 | 30.4 | 18.9 | 15.1 | model (169M) | model (169M) |
| OPDFormer-C | depth | 20.9 | 18.9 | 11.4 | 10.1 | model (169M) | model (169M) |
| OPDFormer-O | depth | 23.4 | 21.5 | 5.9 | 1.9 | model (175M) | model (175M) |
| OPDFormer-P | depth | 21.7 | 19.8 | 15.4 | 13.5 | model (169M) | model (169M) |
| OPDFormer-C | RGBD | 24.2 | 22.7 | 14.1 | 13.4 | model (169M) | model (169M) |
| OPDFormer-O | RGBD | 23.1 | 21.2 | 6.7 | 2.6 | model (175M) | model (175M) |
| OPDFormer-P | RGBD | 27.4 | 25.5 | 18.1 | 16.7 | model (169M) | model (169M) |

## Visualization

The visualization code is based on the OPD repository. We only support visualization based on the raw dataset format (download link (5.0G)).

The visualization uses the inference file, which is produced by the evaluation step.

## Citation

If you find this code useful, please consider citing:

```bibtex
@article{sun2023opdmulti,
  title={OPDMulti: Openable Part Detection for Multiple Objects},
  author={Sun, Xiaohao and Jiang, Hanxiao and Savva, Manolis and Chang, Angel Xuan},
  journal={arXiv preprint arXiv:2303.14087},
  year={2023}
}

@article{mao2022multiscan,
  title={MultiScan: Scalable RGBD scanning for 3D environments with articulated objects},
  author={Mao, Yongsen and Zhang, Yiming and Jiang, Hanxiao and Chang, Angel and Savva, Manolis},
  journal={Advances in Neural Information Processing Systems},
  volume={35},
  pages={9058--9071},
  year={2022}
}

@inproceedings{jiang2022opd,
  title={OPD: Single-view 3D openable part detection},
  author={Jiang, Hanxiao and Mao, Yongsen and Savva, Manolis and Chang, Angel X},
  booktitle={Computer Vision--ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23--27, 2022, Proceedings, Part XXXIX},
  pages={410--426},
  year={2022},
  organization={Springer}
}

@inproceedings{cheng2022masked,
  title={Masked-attention mask transformer for universal image segmentation},
  author={Cheng, Bowen and Misra, Ishan and Schwing, Alexander G and Kirillov, Alexander and Girdhar, Rohit},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={1290--1299},
  year={2022}
}
```

## Acknowledgement

This work was funded in part by a Canada CIFAR AI Chair, a Canada Research Chair, and an NSERC Discovery Grant, and enabled in part by support from WestGrid and Compute Canada. We thank Yongsen Mao for helping us with the data processing procedure. We also thank Jiayi Liu, Sonia Raychaudhuri, Ning Wang, and Yiming Zhang for feedback on paper drafts.