
# OPDMulti: Openable Part Detection for Multiple Objects

Xiaohao Sun*, Hanxiao Jiang*, Manolis Savva, Angel Xuan Chang

## Overview

This repository contains the implementation of the OPDFormer-based methods for the newly proposed OPDMulti task and its corresponding dataset. The code is based on Detectron2 and OPD, and the OPDFormer models are built on Mask2Former.

arXiv  Website  Demo


## Setup

The implementation has been tested on Ubuntu 20.04 with Python 3.7, PyTorch 1.10.1, CUDA 11.1.1, and cuDNN 8.2.0.

Install the Python dependencies:

```sh
pip install -r requirements.txt
```

Then build the custom CUDA ops used by the pixel decoder:

```sh
cd opdformer/mask2former/modeling/pixel_decoder/ops
python setup.py build install
```
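If the build succeeds, the compiled extension should be importable. A quick sanity check, assuming the ops package keeps the default name from Mask2Former's `setup.py`:

```sh
# Sanity-check that the deformable-attention CUDA extension built correctly
# (module name assumed from Mask2Former's ops setup.py).
python -c "import MultiScaleDeformableAttention; print('MSDeformAttn ops OK')"
```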


## Dataset
Download our [OPDMulti](https://docs.google.com/forms/d/e/1FAIpQLSeG1Jafcy9P_OFBJ8WffYt6WJsJszXPqKIgQz0tGTYYuhm4SA/viewform?vc=0&c=0&w=1&flr=0) dataset (7.2G) and extract it into the `./dataset/` folder. Make sure the data follows [this](https://github.com/3dlg-hcvc/OPDMulti/blob/master/data/README.md#downloaded-data-organization) format. You can follow [these](https://github.com/3dlg-hcvc/OPDMulti/blob/master/data/README.md#data-processing-procedure) steps to convert your own data to the OPDMulti format. To try our model on the OPDSynth and OPDReal datasets, download the data from the [OPD](https://github.com/3dlg-hcvc/OPD#dataset) repository.
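A minimal sketch of the extraction step, assuming the download arrives as a single `OPDMulti.zip` archive (the actual file name may differ):

```sh
# Extract the downloaded archive into ./dataset/
# (archive name is an assumption; adjust to the actual download).
mkdir -p dataset
unzip OPDMulti.zip -d dataset/
```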

## Training
To train from scratch, you can use the command below. The output will include evaluation results on the validation set.

```sh
cd opdformer
python train.py \
--config-file <MODEL_CONFIG> \
--output-dir <OUTPUT_DIR> \
--data-path <PATH_TO_DATASET> \
--input-format <RGB/depth/RGBD> \
--model_attr_path <PATH_TO_ATTR>
```
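As a concrete illustration, a run of the OPDFormer-P variant on RGB input might look like the following; the config, dataset, and attribute file paths are placeholders, so substitute the actual files from this repository and your dataset copy.

```sh
# Illustrative invocation, run from the opdformer/ directory.
# All paths below are assumed placeholders, not the repo's exact file names.
python train.py \
--config-file configs/opdformer_p.yaml \
--output-dir output/opdformer-p-rgb \
--data-path ../dataset/OPDMulti \
--input-format RGB \
--model_attr_path ../dataset/OPDMulti/obj_info.json
```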

## Evaluation

To evaluate, use the following command:

```sh
python evaluate_on_log.py \
--config-file <MODEL_CONFIG> \
--output-dir <OUTPUT_DIR> \
--data-path <PATH_TO_DATASET> \
--input-format <RGB/depth/RGBD> \
--model_attr_path <PATH_TO_ATTR> \
--opts MODEL.WEIGHTS <PRETRAINED_MODEL>
```
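For example, evaluating a downloaded OPDMulti checkpoint of the same variant might look like this (again, the checkpoint and config paths are illustrative):

```sh
# Illustrative paths; use the config that matches the downloaded checkpoint.
python evaluate_on_log.py \
--config-file configs/opdformer_p.yaml \
--output-dir output/eval-opdformer-p-rgb \
--data-path ../dataset/OPDMulti \
--input-format RGB \
--model_attr_path ../dataset/OPDMulti/obj_info.json \
--opts MODEL.WEIGHTS checkpoints/opdformer_p_rgb.pth
```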

## Pretrained Models

You can download our pretrained model weights (trained on both OPDReal and OPDMulti) for the different input formats (RGB, depth, RGBD) from the table below.

For model evaluation, download the pretrained weights from the "OPDMulti Model" column. To finetune on custom data, start from the weights in the "OPDReal Model" column, which were also used to produce the OPDMulti results.

### How to read the table

The "Model Name" column contains a link to the config file. "PSeg" is the part segmentation score, "+M" adds motion type prediction, "+MA" includes axis prediction, and "+MAO" further incorporates origin prediction.

To train or evaluate a different model variant, change `--config-file /path/to/config/name.yaml` in the training/evaluation command accordingly.

### OPDMulti

| Model Name | Input | PSeg | +M | +MA | +MAO | OPDMulti Model | OPDReal Model |
|:---|:---|---:|---:|---:|---:|:---|:---|
| OPDFormer-C | RGB | 29.1 | 28.0 | 13.5 | 12.3 | model (169M) | model (169M) |
| OPDFormer-O | RGB | 27.8 | 26.3 | 5.0 | 1.5 | model (175M) | model (175M) |
| OPDFormer-P | RGB | 31.4 | 30.4 | 18.9 | 15.1 | model (169M) | model (169M) |
| OPDFormer-C | depth | 20.9 | 18.9 | 11.4 | 10.1 | model (169M) | model (169M) |
| OPDFormer-O | depth | 23.4 | 21.5 | 5.9 | 1.9 | model (175M) | model (175M) |
| OPDFormer-P | depth | 21.7 | 19.8 | 15.4 | 13.5 | model (169M) | model (169M) |
| OPDFormer-C | RGBD | 24.2 | 22.7 | 14.1 | 13.4 | model (169M) | model (169M) |
| OPDFormer-O | RGBD | 23.1 | 21.2 | 6.7 | 2.6 | model (175M) | model (175M) |
| OPDFormer-P | RGBD | 27.4 | 25.5 | 18.1 | 16.7 | model (169M) | model (169M) |

## Visualization

The visualization code is based on the OPD repository. We only support visualization based on the raw dataset format (download link (5.0G)).

The visualization uses the inference file, which is produced by the evaluation step.

## Citation

If you find this code useful, please consider citing:

```bibtex
@article{sun2023opdmulti,
  title={OPDMulti: Openable Part Detection for Multiple Objects},
  author={Sun, Xiaohao and Jiang, Hanxiao and Savva, Manolis and Chang, Angel Xuan},
  journal={arXiv preprint arXiv:2303.14087},
  year={2023}
}

@article{mao2022multiscan,
  title={MultiScan: Scalable RGBD scanning for 3D environments with articulated objects},
  author={Mao, Yongsen and Zhang, Yiming and Jiang, Hanxiao and Chang, Angel and Savva, Manolis},
  journal={Advances in Neural Information Processing Systems},
  volume={35},
  pages={9058--9071},
  year={2022}
}

@inproceedings{jiang2022opd,
  title={OPD: Single-view 3D openable part detection},
  author={Jiang, Hanxiao and Mao, Yongsen and Savva, Manolis and Chang, Angel X},
  booktitle={Computer Vision--ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23--27, 2022, Proceedings, Part XXXIX},
  pages={410--426},
  year={2022},
  organization={Springer}
}

@inproceedings{cheng2022masked,
  title={Masked-attention mask transformer for universal image segmentation},
  author={Cheng, Bowen and Misra, Ishan and Schwing, Alexander G and Kirillov, Alexander and Girdhar, Rohit},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={1290--1299},
  year={2022}
}
```

## Acknowledgement

This work was funded in part by a Canada CIFAR AI Chair, a Canada Research Chair, and an NSERC Discovery Grant, and enabled in part by support from WestGrid and Compute Canada. We thank Yongsen Mao for helping us with the data processing procedure. We also thank Jiayi Liu, Sonia Raychaudhuri, Ning Wang, and Yiming Zhang for feedback on paper drafts.