# Refer-it-in-RGBD
![](https://github.com/HaolinLiu97/Refer-it-in-RGBD/raw/main/docs/teaser.png)
![](https://github.com/HaolinLiu97/Refer-it-in-RGBD/raw/main/docs/projectpage.gif)
This is the repository for our CVPR 2021 paper 'Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images'.
Paper: arXiv (pdf / abs)
Project page: https://haolinliu97.github.io/Refer-it-in-RGBD
### Introduction
We present a novel task of 3D visual grounding in
single-view RGB-D images where the referred objects are often only
partially scanned.
In contrast to previous works that directly generate object proposals for grounding in the 3D scenes, we propose a bottom-up approach to gradually aggregate information, effectively addressing the challenge posed by the partial scans.
Our approach first fuses the language and the visual features at the bottom level to generate a heatmap that coarsely localizes the relevant regions in the RGB-D image. Then our approach adopts an adaptive search based on the heatmap and performs the object-level matching with another visio-linguistic fusion to finally ground the referred object.
We evaluate the proposed method by comparing it with state-of-the-art methods on both the RGB-D images extracted from the ScanRefer dataset and our newly collected SUN-Refer dataset. Experiments show that our method outperforms previous methods by a large margin (by 11.1% and 11.2% in Acc@0.5, respectively) on both datasets.
### Dataset
Download the SUNREFER_v2 dataset.
The SUNREFER dataset contains 38,495 referring expressions corresponding to 7,699 objects from the SUNRGBD dataset. Here is one example from the SUNREFER dataset:
![](https://github.com/HaolinLiu97/Refer-it-in-RGBD/raw/main/docs/dataset_example.png)
# Install packages
CUDA 10.2 is used for this project.
Install the required Python packages with:
```shell
pip install -r requirement.txt
```
Install weighted FPS with:
```shell
cd weighted_FPS
python setup.py install
```
Install pointnet2 with:
```shell
cd third_party/pointnet2
python setup.py install
```
Install MinkowskiEngine; installation details are given in this link.
# Prepare data
First, create a new folder named `data` under the root directory. Download the GloVe word-embedding file `glove.p` from the glove.p link.
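As an illustration of how a pickled embedding file like this is typically consumed, here is a minimal sketch. The word-to-vector dict format, the `data/glove.p` path, the 300-d vectors, and the zero-vector fallback for out-of-vocabulary words are all assumptions for illustration, not the repo's actual code:

```python
import pickle
import numpy as np

def load_glove(path="data/glove.p"):
    """Load a pickled word -> vector dictionary (assumed format of glove.p)."""
    with open(path, "rb") as f:
        return pickle.load(f)

def embed_tokens(tokens, glove, dim=300):
    """Stack per-token vectors; unknown words fall back to zeros (an assumption)."""
    return np.stack([np.asarray(glove.get(t, np.zeros(dim))) for t in tokens])

# Tiny in-memory stand-in so the sketch runs without the real file:
toy_glove = {"chair": np.ones(300), "table": np.full(300, 0.5)}
feats = embed_tokens(["chair", "table", "sofa"], toy_glove)
print(feats.shape)  # (3, 300)
```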
### ScanRefer dataset
The processed ScanRefer and ScanNet data is available at scanrefer data or ScanRefer BaiduCloud (extraction code: x2zw).
Unzip it and put the `scannet_singleRGBD` folder under `data`. Inside `scannet_singleRGBD` there should be several folders:
- `pcd`: point clouds of the single-view RGB-D images;
- `pose`: camera extrinsics and intrinsics for each image;
- `bbox`: all ground-truth bounding boxes;

plus the train/val referring-expression splits in two `.json` files.
The script for preparing this data will be released later.
### SUNRefer dataset
We recommend downloading the preprocessed SUNRefer data, on which you can train directly.
The preprocessed SUNRefer data is released at sunrefer data or SunRefer BaiduCloud (extraction code: rljw).
Unzip it and create a new folder named `sunrefer_singleRGBD` under `data`. Put `SUNREFER_train.pkl`, `SUNREFER_val.pkl`, and `sunrgbd_pc_bbox_votes_30k_v2` under `data/sunrefer_singleRGBD/`.
#### Processing the SUNRGBD and SUNRefer data
Please refer to the `sunrgbd` folder for the processing of the SUNRGBD data, which is modified from votenet.
During training, we merge the SUNSPOT and SUNRefer datasets for better diversity.
The two datasets can be merged by running (the processed SUNSPOT dataset is already under `/docs`):
```shell
python utils/merge_sunrefer_sunspot_dataset.py
```
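The actual merge logic lives in `utils/merge_sunrefer_sunspot_dataset.py`; conceptually it amounts to concatenating the two annotation sets. A hedged sketch of that idea (the file layout and record format here are assumptions, not the repo's real schema):

```python
import pickle

def merge_datasets(path_a, path_b, out_path):
    """Concatenate two pickled lists of referring-expression records (assumed format)."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        merged = pickle.load(fa) + pickle.load(fb)
    with open(out_path, "wb") as fo:
        pickle.dump(merged, fo)
    return len(merged)

# In-memory illustration of the same idea, with made-up records:
sunrefer = [{"sentence": "the chair by the window"}]
sunspot = [{"sentence": "the lamp on the desk"}]
merged = sunrefer + sunspot
print(len(merged))  # 2
```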
# Training
The training procedure is split into two stages.
First, train the voxel-level matching model independently for 10 epochs by running:
```shell
python main.py --config ./config/pretrain_config.yaml
```
To train on the SUNRefer dataset, run:
```shell
python main.py --config ./config/pretrain_sunrefer_config.yaml
```
You can adjust the configuration; all models were trained on a single RTX 2080 Ti with batch size 14.
Then, train the whole referring model by running (make sure `hm_model_weight` in the configuration files is set properly):
```shell
python main.py --config ./config/train_scanrefer_config.yaml
```
To train the whole model on the SUNRefer dataset, run:
```shell
python main.py --config ./config/train_sunrefer_config.yaml
```
Please make sure the voxel-level matching weights are loaded; they are specified by the `hm_model_resume` entry in the configuration file.
Note: training sometimes stops due to bugs in CUDA 10.x (CUDA 11 works fine, but it requires PyTorch 1.7.1). In that case, resume training manually by setting `resume: True` in the configuration file and changing the weight entry to the path of the latest checkpoint.
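A resumed run's configuration could then look roughly like this fragment (the key names follow the description above, but the exact schema and the checkpoint path are illustrative assumptions):

```yaml
# Fragment of a training config (illustrative, not the repo's exact schema)
resume: True                                   # restart from a saved checkpoint
weight: ./checkpoints/save_dir/model_last.pth  # hypothetical checkpoint path
```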
# Testing
Modify the weight path in `/config/test_scanrefer_config.yaml` (and likewise for testing on the SUNRefer dataset). Then run the following command to test the model:
```shell
python main.py --mode test --config ./config/test_scanrefer_config.yaml
```
After testing, files storing the results for each image and description will be saved.
# Evaluate
After testing has saved the results, you can evaluate them by running:
```shell
python evaluate.py --result_dir ./checkpoints/save_dir
```
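`evaluate.py` is the authoritative implementation; purely as a reference for what the reported Acc@0.5 metric means, here is a generic sketch of IoU between axis-aligned 3D boxes and accuracy at a threshold. The box format and function names are my own for illustration, not the repo's:

```python
import numpy as np

def iou_3d(box_a, box_b):
    """IoU of two axis-aligned 3D boxes given as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    box_a, box_b = np.asarray(box_a, float), np.asarray(box_b, float)
    lo = np.maximum(box_a[:3], box_b[:3])          # lower corner of the intersection
    hi = np.minimum(box_a[3:], box_b[3:])          # upper corner of the intersection
    inter = np.prod(np.clip(hi - lo, 0, None))     # 0 if the boxes are disjoint
    vol_a = np.prod(box_a[3:] - box_a[:3])
    vol_b = np.prod(box_b[3:] - box_b[:3])
    return inter / (vol_a + vol_b - inter)

def acc_at(preds, gts, thresh=0.5):
    """Fraction of predicted boxes whose IoU with the ground truth exceeds the threshold."""
    ious = [iou_3d(p, g) for p, g in zip(preds, gts)]
    return float(np.mean([i > thresh for i in ious]))

# Perfect overlap vs. a disjoint box:
gt = (0, 0, 0, 1, 1, 1)
print(acc_at([gt, (2, 2, 2, 3, 3, 3)], [gt, gt]))  # 0.5
```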
# Citation
If you find our work useful, please cite:
```bibtex
@inproceedings{liu2021refer,
title={Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images},
author={Liu, Haolin and Lin, Anran and Han, Xiaoguang and Yang, Lei and Yu, Yizhou and Cui, Shuguang},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={6032--6041},
year={2021}
}
```