# Refer-it-in-RGBD
![](https://github.com/HaolinLiu97/Refer-it-in-RGBD/raw/main/docs/teaser.png)
![](https://github.com/HaolinLiu97/Refer-it-in-RGBD/raw/main/docs/projectpage.gif)
This is the repository for our CVPR 2021 paper 'Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images'.
Paper: arXiv (pdf / abs)
Project page: https://haolinliu97.github.io/Refer-it-in-RGBD
### Introduction
We present a novel task of 3D visual grounding in
single-view RGB-D images where the referred objects are often only
partially scanned.
In contrast to previous works that directly generate object proposals for grounding in the 3D scenes, we propose a bottom-up approach to gradually aggregate information, effectively addressing the challenge posed by the partial scans.
Our approach first fuses the language and the visual features at the bottom level to generate a heatmap that coarsely localizes the relevant regions in the RGB-D image. Then our approach adopts an adaptive search based on the heatmap and performs the object-level matching with another visio-linguistic fusion to finally ground the referred object.
We evaluate the proposed method by comparing it with state-of-the-art methods on both the RGB-D images extracted from the ScanRefer dataset and our newly collected SUN-Refer dataset. Experiments show that our method outperforms previous methods by a large margin (by 11.1% and 11.2% in Acc@0.5, respectively) on both datasets.
### Dataset
Download the SUNREFER_v2 dataset.
The SUNREFER dataset contains 38,495 referring expressions corresponding to 7,699 objects from the SUNRGBD dataset. Here is one example from the SUNREFER dataset:
![](https://github.com/HaolinLiu97/Refer-it-in-RGBD/raw/main/docs/dataset_example.png)
# Install packages
CUDA 10.2 is used for this project.
Install the required Python packages with:
```shell
pip install -r requirement.txt
```
Install weighted FPS with:
```shell
cd weighted_FPS
python setup.py install
```
Install pointnet2 with:
```shell
cd third_party/pointnet2
python setup.py install
```
Install MinkowskiEngine; installation details are given in this link.
# Prepare data
First, create a new folder named `data` under the root directory. Download the GloVe word-embedding file `glove.p` from the glove.p link.
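As an illustration of how a pickled embedding file like this is typically consumed, here is a minimal sketch. The word-to-vector dict format, the `data/glove.p` path, the 300-d vectors, and the zero-vector fallback for out-of-vocabulary words are all assumptions for illustration, not the repo's actual code:

```python
import pickle
import numpy as np

def load_glove(path="data/glove.p"):
    """Load a pickled word -> vector dictionary (assumed format of glove.p)."""
    with open(path, "rb") as f:
        return pickle.load(f)

def embed_tokens(tokens, glove, dim=300):
    """Stack per-token vectors; unknown words fall back to zeros (an assumption)."""
    return np.stack([np.asarray(glove.get(t, np.zeros(dim))) for t in tokens])

# Tiny in-memory stand-in so the sketch runs without the real file:
toy_glove = {"chair": np.ones(300), "table": np.full(300, 0.5)}
feats = embed_tokens(["chair", "table", "sofa"], toy_glove)
print(feats.shape)  # (3, 300)
```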
### ScanRefer dataset
The processed ScanRefer and ScanNet data is available at scanrefer data or ScanRefer BaiduCloud (extraction code: x2zw).
Unzip it and put the `scannet_singleRGBD` folder under `data`. Inside `scannet_singleRGBD` there should be several folders:
- `pcd`: point clouds of the single-view RGB-D images;
- `pose`: camera extrinsics and intrinsics for each image;
- `bbox`: all ground-truth bounding boxes;

plus the train/val referring-expression splits in two `.json` files.
The script for preparing this data will be released later.
### SUNRefer dataset
We recommend downloading the preprocessed SUNRefer data, on which you can train directly.
The preprocessed SUNRefer data is released at sunrefer data or SunRefer BaiduCloud (extraction code: rljw).
Unzip it and create a new folder named `sunrefer_singleRGBD` under `data`. Put `SUNREFER_train.pkl`, `SUNREFER_val.pkl`, and `sunrgbd_pc_bbox_votes_30k_v2` under `data/sunrefer_singleRGBD/`.
#### Processing the SUNRGBD and SUNRefer data
Please refer to the `sunrgbd` folder for the processing of the SUNRGBD data, which is modified from votenet.
During training, we merge the SUNSPOT and SUNRefer datasets for better diversity.
The two datasets can be merged by running (the processed SUNSPOT dataset is already under `/docs`):
```shell
python utils/merge_sunrefer_sunspot_dataset.py
```
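The actual merge logic lives in `utils/merge_sunrefer_sunspot_dataset.py`; conceptually it amounts to concatenating the two annotation sets. A hedged sketch of that idea (the file layout and record format here are assumptions, not the repo's real schema):

```python
import pickle

def merge_datasets(path_a, path_b, out_path):
    """Concatenate two pickled lists of referring-expression records (assumed format)."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        merged = pickle.load(fa) + pickle.load(fb)
    with open(out_path, "wb") as fo:
        pickle.dump(merged, fo)
    return len(merged)

# In-memory illustration of the same idea, with made-up records:
sunrefer = [{"sentence": "the chair by the window"}]
sunspot = [{"sentence": "the lamp on the desk"}]
merged = sunrefer + sunspot
print(len(merged))  # 2
```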
# Training
The training procedure is split into two stages.
First, train the voxel-level matching model independently for 10 epochs by running:
```shell
python main.py --config ./config/pretrain_config.yaml
```
To train on the SUNRefer dataset, run:
```shell
python main.py --config ./config/pretrain_sunrefer_config.yaml
```
You can adjust the configuration; all models were trained on a single RTX 2080 Ti with batch size 14.
Then, train the whole referring model by running (make sure `hm_model_weight` in the configuration files is set properly):
```shell
python main.py --config ./config/train_scanrefer_config.yaml
```
To train the whole model on the SUNRefer dataset, run:
```shell
python main.py --config ./config/train_sunrefer_config.yaml
```
Please make sure the voxel-level matching weights are loaded; they are specified by the `hm_model_resume` entry in the configuration file.
Note: training sometimes stops due to bugs in CUDA 10.x (CUDA 11 works fine, but it requires PyTorch 1.7.1). In that case, resume training manually by setting `resume: True` in the configuration file and changing the weight entry to the path of the latest checkpoint.
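A resumed run's configuration could then look roughly like this fragment (the key names follow the description above, but the exact schema and the checkpoint path are illustrative assumptions):

```yaml
# Fragment of a training config (illustrative, not the repo's exact schema)
resume: True                                   # restart from a saved checkpoint
weight: ./checkpoints/save_dir/model_last.pth  # hypothetical checkpoint path
```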
# Testing
Modify the weight path in `/config/test_scanrefer_config.yaml` (and likewise for testing on the SUNRefer dataset). Then run the following command to test the model:
```shell
python main.py --mode test --config ./config/test_scanrefer_config.yaml
```
After testing, files storing the results for each image and description will be saved.
# Evaluate
After testing has saved the results, you can evaluate them by running:
```shell
python evaluate.py --result_dir ./checkpoints/save_dir
```
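`evaluate.py` is the authoritative implementation; purely as a reference for what the reported Acc@0.5 metric means, here is a generic sketch of IoU between axis-aligned 3D boxes and accuracy at a threshold. The box format and function names are my own for illustration, not the repo's:

```python
import numpy as np

def iou_3d(box_a, box_b):
    """IoU of two axis-aligned 3D boxes given as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    box_a, box_b = np.asarray(box_a, float), np.asarray(box_b, float)
    lo = np.maximum(box_a[:3], box_b[:3])          # lower corner of the intersection
    hi = np.minimum(box_a[3:], box_b[3:])          # upper corner of the intersection
    inter = np.prod(np.clip(hi - lo, 0, None))     # 0 if the boxes are disjoint
    vol_a = np.prod(box_a[3:] - box_a[:3])
    vol_b = np.prod(box_b[3:] - box_b[:3])
    return inter / (vol_a + vol_b - inter)

def acc_at(preds, gts, thresh=0.5):
    """Fraction of predicted boxes whose IoU with the ground truth exceeds the threshold."""
    ious = [iou_3d(p, g) for p, g in zip(preds, gts)]
    return float(np.mean([i > thresh for i in ious]))

# Perfect overlap vs. a disjoint box:
gt = (0, 0, 0, 1, 1, 1)
print(acc_at([gt, (2, 2, 2, 3, 3, 3)], [gt, gt]))  # 0.5
```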
# Citation
If you find our work useful, please cite:
```bibtex
@inproceedings{liu2021refer,
title={Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images},
author={Liu, Haolin and Lin, Anran and Han, Xiaoguang and Yang, Lei and Yu, Yizhou and Cui, Shuguang},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={6032--6041},
year={2021}
}
```