Xiaohao Sun*, Hanxiao Jiang*, Manolis Savva, Angel Xuan Chang
This repository contains the implementation of the OPDFormer-based methods for our newly proposed OPDMulti task and the corresponding dataset. The code is based on Detectron2 and OPD, and the OPDFormer models are built on Mask2Former.
## Setup
The implementation has been tested on Ubuntu 20.04, with Python 3.7, PyTorch 1.10.1, CUDA 11.1.1 and cuDNN 8.2.0.

Clone the repository:
```sh
git clone git@github.com:3dlg-hcvc/OPDMulti.git
```
Set up the Python environment to train the model:
```sh
conda create -n opdmulti python=3.7
conda activate opdmulti
pip install -r requirements.txt
cd opdformer/mask2former/modeling/pixel_decoder/ops
python setup.py build install
```
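As a quick, optional sanity check (assuming the `opdmulti` environment is active and the custom CUDA ops built without errors), you can confirm that the core dependencies import and that a GPU is visible:

```sh
# Optional sanity check, run inside the opdmulti conda environment.
# Verifies that PyTorch and Detectron2 import and that CUDA is available.
python -c "import torch; print('torch', torch.__version__, 'cuda available:', torch.cuda.is_available())"
python -c "import detectron2; print('detectron2', detectron2.__version__)"
```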
## Dataset
Download our [OPDMulti](https://docs.google.com/forms/d/e/1FAIpQLSeG1Jafcy9P_OFBJ8WffYt6WJsJszXPqKIgQz0tGTYYuhm4SA/viewform?vc=0&c=0&w=1&flr=0) dataset (7.2G) and extract it inside the `./dataset/` folder. Make sure the data is in [this](https://github.com/3dlg-hcvc/OPDMulti/blob/master/data/README.md#downloaded-data-organization) format. You can follow [these](https://github.com/3dlg-hcvc/OPDMulti/blob/master/data/README.md#data-processing-procedure) steps to convert your own data to the OPDMulti format. To try our model on the OPDSynth and OPDReal datasets, download the data from the [OPD](https://github.com/3dlg-hcvc/OPD#dataset) repository.
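The commands later in this README refer to the extracted data roughly as sketched below; this layout is our reading of the paths used in the training section, so treat the data README linked above as authoritative:

```sh
# Assumed layout after extracting the OPDMulti download into ./dataset/ (illustrative):
# dataset/
# └── OPDMulti/
#     ├── MotionDataset_h5/   # processed data, passed via --data-path
#     └── obj_info.json       # object attribute file, passed via --model_attr_path
ls dataset/OPDMulti
```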
## Training
To train from scratch, use the commands below. The output will include evaluation results on the val set.
```sh
cd opdformer
python train.py \
--config-file <MODEL_CONFIG> \
--output-dir <OUTPUT_DIR> \
--data-path <PATH_TO_DATASET> \
--input-format <RGB/depth/RGBD> \
--model_attr_path <PATH_TO_ATTR>
```
* `<MODEL_CONFIG>`: the config file path; the config for each model variant is linked in the "Model Name" column of the table below.
* `<PATH_TO_DATASET>`: the OPDMulti dataset, `OPDMulti/MotionDataset_h5`.
* `<PATH_TO_ATTR>`: the object attribute file, `OPDMulti/obj_info.json`.
To start from model weights pretrained on the OPDReal dataset (we finetune this model on OPDMulti), append `--opts MODEL.WEIGHTS <PRETRAINED_MODEL>` to the training command.
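For example, a complete training invocation might look like the sketch below. The config filename, output directory, dataset paths, and checkpoint path are placeholders of our choosing, not files guaranteed to ship with the repository; substitute the config linked in the table below and your own paths:

```sh
# Illustrative training run; all paths and filenames are assumptions.
cd opdformer
python train.py \
    --config-file /path/to/config/opdformer_p.yaml \
    --output-dir output/opdformer_p_rgb \
    --data-path ../dataset/OPDMulti/MotionDataset_h5 \
    --input-format RGB \
    --model_attr_path ../dataset/OPDMulti/obj_info.json \
    --opts MODEL.WEIGHTS /path/to/opdreal_pretrained.pth
```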
## Evaluation
To evaluate, use the following command:
```sh
python evaluate_on_log.py \
--config-file <MODEL_CONFIG> \
--output-dir <OUTPUT_DIR> \
--data-path <PATH_TO_DATASET> \
--input-format <RGB/depth/RGBD> \
--model_attr_path <PATH_TO_ATTR> \
--opts MODEL.WEIGHTS <PRETRAINED_MODEL>
```
* To evaluate on the test set, use `--opts MODEL.WEIGHTS <PRETRAINED_MODEL> DATASETS.TEST "('MotionNet_test',)"` instead.
* To evaluate from an existing inference file (produced by a previous run), add `--inference-file <PATH_TO_INFERENCE_FILE>`.
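As a concrete sketch, evaluating a downloaded OPDMulti checkpoint on the test set could look like the following; the config, dataset, and checkpoint paths are placeholders:

```sh
# Illustrative test-set evaluation; all paths are assumptions.
cd opdformer
python evaluate_on_log.py \
    --config-file /path/to/config/opdformer_p.yaml \
    --output-dir output/eval_opdformer_p_rgb \
    --data-path ../dataset/OPDMulti/MotionDataset_h5 \
    --input-format RGB \
    --model_attr_path ../dataset/OPDMulti/obj_info.json \
    --opts MODEL.WEIGHTS /path/to/opdmulti_checkpoint.pth DATASETS.TEST "('MotionNet_test',)"
```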
## Pretrained Models
You can download our pretrained model weights (on both OPDReal and OPDMulti) for the different input formats (RGB, depth, RGBD) from the table below.
For model evaluation, download the pretrained weights from the "OPDMulti Model" column. To finetune on your own data, use the weights from the "OPDReal Model" column; these are also the weights we finetuned to obtain the OPDMulti results.
The "Model Name" column contains a link to the config file. "PSeg" is the part segmentation score, "+M" adds motion type prediction, "+MA" includes axis prediction, and "+MAO" further incorporates origin prediction.
To train or evaluate a different model variant, change `--config-file /path/to/config/name.yaml` in the training/evaluation command accordingly.
Model Name | Input | PSeg | +M | +MA | +MAO | OPDMulti Model | OPDReal Model |
---|---|---|---|---|---|---|---|
OPDFormer-C | RGB | 29.1 | 28.0 | 13.5 | 12.3 | model(169M) | model(169M) |
OPDFormer-O | RGB | 27.8 | 26.3 | 5.0 | 1.5 | model(175M) | model(175M) |
OPDFormer-P | RGB | 31.4 | 30.4 | 18.9 | 15.1 | model(169M) | model(169M) |
OPDFormer-C | depth | 20.9 | 18.9 | 11.4 | 10.1 | model(169M) | model(169M) |
OPDFormer-O | depth | 23.4 | 21.5 | 5.9 | 1.9 | model(175M) | model(175M) |
OPDFormer-P | depth | 21.7 | 19.8 | 15.4 | 13.5 | model(169M) | model(169M) |
OPDFormer-C | RGBD | 24.2 | 22.7 | 14.1 | 13.4 | model(169M) | model(169M) |
OPDFormer-O | RGBD | 23.1 | 21.2 | 6.7 | 2.6 | model(175M) | model(175M) |
OPDFormer-P | RGBD | 27.4 | 25.5 | 18.1 | 16.7 | model(169M) | model(169M) |
## Visualization
The visualization code is based on the OPD repository. We only support visualization with the raw dataset format (download link (5.0G)). Visualization uses the inference file produced by the evaluation step.
Render the ground-truth annotations:
```sh
cd opdformer
python render_gt.py \
--output-dir vis_output \
--data-path <PATH_TO_DATASET> \
--valid-image <IMAGE_LIST_FILE> \
--is-real
```
Render the model predictions:
```sh
cd opdformer
python render_pred.py \
--output-dir vis_output \
--data-path <PATH_TO_DATASET> \
--model_attr_path <PATH_TO_ATTR> \
--valid-image <IMAGE_LIST_FILE> \
--inference-file <PATH_TO_INFERENCE_FILE> \
--score-threshold 0.8 \
--update-all \
--is-real
```
* `<PATH_TO_DATASET>`: the raw dataset, `dataset/MotionDataset`.
* `<IMAGE_LIST_FILE>`: `dataset/MotionDataset/valid_1000.json`.
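Putting the pieces together, a prediction-rendering run on the raw dataset might look like the sketch below; the inference file path and the attribute file location are placeholders for whatever your evaluation run produced and wherever your attribute file lives:

```sh
# Illustrative prediction rendering; paths are relative to opdformer/ and are assumptions.
cd opdformer
python render_pred.py \
    --output-dir vis_output \
    --data-path ../dataset/MotionDataset \
    --model_attr_path ../dataset/OPDMulti/obj_info.json \
    --valid-image ../dataset/MotionDataset/valid_1000.json \
    --inference-file /path/to/inference_file.pth \
    --score-threshold 0.8 \
    --update-all \
    --is-real
```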
## Citation
If you find this code useful, please consider citing:
```bibtex
@article{sun2023opdmulti,
  title={OPDMulti: Openable Part Detection for Multiple Objects},
  author={Sun, Xiaohao and Jiang, Hanxiao and Savva, Manolis and Chang, Angel Xuan},
  journal={arXiv preprint arXiv:2303.14087},
  year={2023}
}

@article{mao2022multiscan,
  title={MultiScan: Scalable RGBD scanning for 3D environments with articulated objects},
  author={Mao, Yongsen and Zhang, Yiming and Jiang, Hanxiao and Chang, Angel and Savva, Manolis},
  journal={Advances in Neural Information Processing Systems},
  volume={35},
  pages={9058--9071},
  year={2022}
}

@inproceedings{jiang2022opd,
  title={OPD: Single-view 3D openable part detection},
  author={Jiang, Hanxiao and Mao, Yongsen and Savva, Manolis and Chang, Angel X},
  booktitle={Computer Vision--ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23--27, 2022, Proceedings, Part XXXIX},
  pages={410--426},
  year={2022},
  organization={Springer}
}

@inproceedings{cheng2022masked,
  title={Masked-attention mask transformer for universal image segmentation},
  author={Cheng, Bowen and Misra, Ishan and Schwing, Alexander G and Kirillov, Alexander and Girdhar, Rohit},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={1290--1299},
  year={2022}
}
```
## Acknowledgements
This work was funded in part by a Canada CIFAR AI Chair, a Canada Research Chair, and an NSERC Discovery Grant, and was enabled in part by support from WestGrid and Compute Canada. We thank Yongsen Mao for helping us with the data processing procedure. We also thank Jiayi Liu, Sonia Raychaudhuri, Ning Wang, and Yiming Zhang for feedback on paper drafts.