

# 3D-SPS

Code for our CVPR 2022 Oral paper "3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection".

[[arXiv](https://arxiv.org/abs/2204.06272)] [[BibTeX](#citation)]


## Dataset

If you would like to access the ScanRefer dataset, please fill out [this form](https://forms.gle/aLtzXN12DsYDMSXX6). Once your request is accepted, you will receive an email with the download link.

> Note: In addition to the language annotations in the ScanRefer dataset, you also need access to the original ScanNet dataset. Please refer to the [ScanNet Instructions](data/scannet/README.md) for more details.

Download the dataset by simply executing the wget command:

```shell
wget 
```

### Data format

```
"scene_id": [ScanNet scene id, e.g. "scene0000_00"],
"object_id": [ScanNet object id (corresponds to "objectId" in ScanNet aggregation file), e.g. "34"],
"object_name": [ScanNet object name (corresponds to "label" in ScanNet aggregation file), e.g. "coffee_table"],
"ann_id": [description id, e.g. "1"],
"description": [...],
"token": [a list of tokens from the tokenized description]
```

## Setup

The code is now compatible with PyTorch 1.6! Please execute the following command to install PyTorch:

```shell
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch
```

Install the necessary packages listed in `requirements.txt`:

```shell
pip install -r requirements.txt
```

After all packages are properly installed, run the following commands to compile the CUDA modules for the PointNet++ backbone:

```shell
cd lib/pointnet2
python setup.py install
```

__Before moving on to the next step, please don't forget to set the project root path as `CONF.PATH.BASE` in `config/default.yaml`.__

### Data preparation

1. Download the ScanRefer dataset and unzip it under `data/` (see the loading sketch after this list).
2. Download the text embeddings:
    * the preprocessed [GLoVE embeddings (~990MB)](http://kaldir.vc.in.tum.de/glove.p) and put them under `data/`.
    * the preprocessed [CLIP embeddings (~1.29MB)](https://github.com/openai/CLIP/blob/main/clip/bpe_simple_vocab_16e6.txt.gz) and put them under `data/`.
3. Download the ScanNetV2 dataset and put (or link) `scans/` under (or to) `data/scannet/scans/` (please follow the [ScanNet Instructions](data/scannet/README.md) for downloading the ScanNet dataset).
    > After this step, there should be folders containing the ScanNet scene data under `data/scannet/scans/` with names like `scene0000_00`.
4. Pre-process the ScanNet data. A folder named `scannet_data/` will be generated under `data/scannet/` after running the following commands. Roughly 3.8GB of free space is needed for this step:
    ```shell
    cd data/scannet/
    python batch_load_scannet_data.py
    ```
    > After this step, you can check whether the processed scene data is valid by running:
    > ```shell
    > python visualize.py --scene_id scene0000_00
    > ```
5. Download the pre-trained PointNet++ backbone ([`Google Drive`](https://drive.google.com/file/d/1oC-3mx104EReRqI4nTwjwkm9TZOy0lQM/view?usp=sharing) or [`Baidu Drive`](https://pan.baidu.com/s/1dlWUi86-fjZH3VQSH_MsSQ?pwd=likl) (`passcode: likl`)).
6. (Optional) Pre-process the multiview features from ENet.
    a. Download [the ENet pretrained weights (1.4MB)](http://kaldir.vc.in.tum.de/ScanRefer/scannetv2_enet.pth) and put it under `data/`.
    b. Download and decompress [the extracted ScanNet frames (~13GB)](http://kaldir.vc.in.tum.de/3dsis/scannet_train_images.zip).
    c. Change the data paths in `config.py` marked with __TODO__ accordingly.
    d. Extract the ENet features:
    ```shell
    python script/compute_multiview_features.py
    ```
    e. Project the ENet features from the ScanNet frames to the point clouds; you need ~36GB to store the generated HDF5 database (see the inspection sketch after this list):
    ```shell
    python script/project_multiview_features.py --maxpool
    ```
    > You can check whether the projections make sense by projecting the semantic labels from the images to the target point cloud:
    > ```shell
    > python script/project_multiview_labels.py --scene_id scene0000_00 --maxpool
    > ```
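As a quick sanity check after step 1, the sketch below loads one of the unzipped ScanRefer annotation files and prints the fields listed in the data format above. The file name `ScanRefer_filtered_train.json` is an assumption about the release contents; adjust the path to match your download.

```python
# Minimal sketch: inspect a ScanRefer annotation file against the documented data format.
# The path below is an assumption about the unzipped release; adjust it to your download.
import json

with open("data/ScanRefer_filtered_train.json", "r") as f:
    annotations = json.load(f)

print(f"{len(annotations)} annotations loaded")
sample = annotations[0]
for key in ("scene_id", "object_id", "object_name", "ann_id", "description", "token"):
    print(f"{key}: {sample.get(key)}")
```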
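If you run the optional multiview pre-processing (step 6), the sketch below shows one way to peek into the generated HDF5 database with `h5py`. The file name `enet_feats_maxpool.hdf5` and the scene-id-keyed layout are assumptions about the output of `project_multiview_features.py`; check that script for the actual path and structure it writes.

```python
# Minimal sketch: inspect the projected multiview feature database.
# Both the path and the scene-id-keyed layout are assumptions -- check
# script/project_multiview_features.py for the real output path and structure.
import h5py

with h5py.File("data/enet_feats_maxpool.hdf5", "r") as db:
    scene_ids = list(db.keys())
    print(f"{len(scene_ids)} scenes stored")
    if scene_ids:
        feats = db[scene_ids[0]][()]  # per-point multiview features for one scene
        print(scene_ids[0], feats.shape, feats.dtype)
```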
## Usage

### Training

```shell
python scripts/train.py --config ./config/default.yaml
```

For more training options (like using preprocessed multiview features), please see the details in `default.yaml`.

### Evaluation

To evaluate the trained models, please download the trained model ([`Google Drive`](https://drive.google.com/drive/folders/1hJOdcd75LPqrib94ciFn0ENFbFaF9Z49?usp=sharing) or [`Baidu Drive`](https://pan.baidu.com/s/1OhJrSLl72z5cweGRLr-fIQ?pwd=x3vl) (`passcode: x3vl`)), put it in the `` under `outputs/`, and run:

```shell
python scripts/eval.py --config ./config/default.yaml --folder --reference --no_nms --force
```

## Acknowledgement

We would like to thank the authors of [ScanRefer](https://github.com/daveredrum/ScanRefer) and [Group-Free](https://github.com/zeliu98/Group-Free-3D) for their open-source release.

## License

`3D-SPS` is released under the [MIT license](LICENSE).

## Citation

Please consider citing 3D-SPS in your publications if it helps your research:

```
@article{luo20223d,
  title={3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection},
  author={Luo, Junyu and Fu, Jiahui and Kong, Xianghao and Gao, Chen and Ren, Haibing and Shen, Hao and Xia, Huaxia and Liu, Si},
  journal={arXiv preprint arXiv:2204.06272},
  year={2022}
}
```