This repo contains the source code and predicted saliency maps of our RD3D and RD3D+. The latter is an extension of the former, and is lighter, more computationally efficient, and more accurate.
RD3D: RGB-D Salient Object Detection via 3D Convolutional Neural Networks (PDF)
RD3D+: 3-D Convolutional Neural Networks for RGB-D Salient Object Detection and Beyond (PDF)
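To illustrate the core idea behind RD3D, namely stacking RGB and depth along an extra dimension so that 3D convolutions fuse the two modalities directly inside the encoder, here is a minimal PyTorch sketch. It is a simplified illustration with our own naming, not the actual code of this repo:

```python
import torch
import torch.nn as nn

# Minimal sketch (not this repo's actual code): stack RGB and depth along an
# extra "modality" axis, then fuse them with a single 3D convolution.
rgb = torch.randn(1, 3, 224, 224)                       # B x C x H x W
depth = torch.randn(1, 1, 224, 224).repeat(1, 3, 1, 1)  # replicate depth to 3 channels
x = torch.stack([rgb, depth], dim=2)                    # B x C x T x H x W, T = 2 modalities

# A kernel spanning the whole modality axis (T = 2) collapses it, producing
# features in which RGB and depth are already fused.
fuse = nn.Conv3d(3, 64, kernel_size=(2, 3, 3), padding=(0, 1, 1))
out = fuse(x)
print(out.shape)  # torch.Size([1, 64, 1, 224, 224])
```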
:fire: Update 2022/09/15 :fire: Our RD3D+ work has now been officially accepted and published in the IEEE Transactions on Neural Networks and Learning Systems!
:fire: Update 2021/09/10 :fire: The PyTorch implementation of RD3D+ is now available! The PDF is coming soon.
:fire: Update 2020/12/29 :fire: The PyTorch implementation of RD3D is now available!
Datasets in use: NJU2K, NLPR, STERE, RGBD135, DUTLF-D, SIP, and ReDWeb-S.
More information and download links for the former six datasets can be found on this page, and ReDWeb-S can be downloaded from this project page.
💡Important tips💡
```shell
git clone https://github.com/PPOLYpubki/RD3D.git
cd RD3D
```
Required packages are listed below:
- Ubuntu 16.04
- python=3.6
- pytorch>=1.6
- torchvision (with pillow<7)
- cuda>=10.1

Install the remaining dependencies via:
```shell
pip install termcolor opencv-python tensorboard
```
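Before running anything, a quick snippet like the one below (assuming a standard PyTorch install) can sanity-check the versions listed above:

```python
import torch, torchvision, PIL

print("pytorch:", torch.__version__)        # expect >= 1.6
print("torchvision:", torchvision.__version__)
print("pillow:", PIL.__version__)           # expect < 7
print("cuda:", torch.version.cuda, "available:", torch.cuda.is_available())
```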
Download the pre-trained weights and save them as `./model_path/RD3D.pth` and `./model_path/RD3D_plus.pth`.
Make sure your testing dataset is in `./data_path/`, and run the following commands for inference:
```shell
# RD3D
python test.py --model RD3D --model_path ./model_path/RD3D.pth --data_path ./data_path/ --save_path ./save/all_results/
# RD3D+
python test.py --model RD3D_plus --model_path ./model_path/RD3D_plus.pth --data_path ./data_path/ --save_path ./save/all_results/
```
For testing with the weights trained on DUTLF-D:
```shell
# RD3D
python test.py --model RD3D --model_path ./pth/RD3D_DUTLF-D.pth --data_path ./data_path/ --save_path ./save/all_results/
# RD3D+
python test.py --model RD3D_plus --model_path ./pth/RD3D_plus_DUTLF-D.pth --data_path ./data_path/ --save_path ./save/all_results/
```
All of our training was actually conducted on multiple GPUs. However, we have modified the keys of some pre-trained weights, so please follow the commands above for inference; otherwise an error will occur.
By default, make sure the training datasets are in the folder `./data_path/`.
Run the following command for training (note that `model_name` below can be either `RD3D` or `RD3D_plus`):
```shell
python train.py --model [model_name] --data_dir ./data_path/
```
Note that for researchers training with multiple GPUs, remember to add `--multi_load` to the inference command during testing.
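For context on `--multi_load` and the key modification mentioned above: checkpoints saved from `nn.DataParallel` models prefix every parameter name with `module.`, which makes loading into a single-GPU model fail. The sketch below shows the usual remedy; it is only an illustration (assuming the checkpoint is a plain state dict), not this repo's exact loading code:

```python
import torch

# Illustration only: strip the "module." prefix that nn.DataParallel adds to
# parameter names, so a multi-GPU checkpoint loads into a single-GPU model.
state = torch.load("./model_path/RD3D.pth", map_location="cpu")  # assuming a plain state dict
state = {k.replace("module.", "", 1): v for k, v in state.items()}
# model.load_state_dict(state)  # `model` is your instantiated network
```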
We follow the authors of SINet in conducting evaluations on our testing results.
We provide a complete and fair one-key evaluation toolbox for benchmarking under a uniform standard. Please refer to the following links for more information:
- Matlab version: https://github.com/DengPingFan/CODToolbox
- Python version: https://github.com/lartpang/PySODMetrics
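For the Python toolbox, a minimal usage sketch following PySODMetrics' step/get_results interface looks like the one below (the file paths are placeholders; predictions and ground truths are loaded as grayscale uint8 maps):

```python
import cv2
from py_sod_metrics import MAE, Smeasure

sm, mae = Smeasure(), MAE()
pred = cv2.imread("pred.png", cv2.IMREAD_GRAYSCALE)  # placeholder paths
gt = cv2.imread("gt.png", cv2.IMREAD_GRAYSCALE)
sm.step(pred=pred, gt=gt)   # accumulate one image pair
mae.step(pred=pred, gt=gt)
print("S-measure:", sm.get_results()["sm"])
print("MAE:", mae.get_results()["mae"])
```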
Results
Qualitative performance
Quantitative performance
Quantitative RGB-D SOD results in terms of S-measure (Sα), maximum F-measure (Fβmax), maximum E-measure (EΦmax), and mean absolute error (MAE). Seven datasets are employed. For brevity, values in the table below are in the form of RD3D / RD3D+.
| Dataset | Sα | Fβmax | EΦmax | MAE |
|---|---|---|---|---|
| NJU2K | 0.916 / 0.928 | 0.914 / 0.928 | 0.947 / 0.955 | 0.036 / 0.033 |
| NLPR | 0.930 / 0.933 | 0.919 / 0.921 | 0.965 / 0.964 | 0.022 / 0.022 |
| STERE | 0.911 / 0.914 | 0.906 / 0.905 | 0.947 / 0.946 | 0.037 / 0.037 |
| RGBD135 | 0.935 / 0.950 | 0.929 / 0.946 | 0.972 / 0.982 | 0.019 / 0.017 |
| DUTLF-D | 0.932 / 0.936 | 0.939 / 0.945 | 0.960 / 0.964 | 0.031 / 0.030 |
| SIP | 0.885 / 0.892 | 0.889 / 0.900 | 0.924 / 0.928 | 0.048 / 0.046 |
| ReDWeb-S | 0.700 / 0.718 | 0.687 / 0.697 | 0.780 / 0.786 | 0.136 / 0.130 |
Download links for our result saliency maps:
The complete RGB-D SOD benchmark can be found on this page.
Please cite our work if you find it useful:
```bibtex
@inproceedings{chen2021rgb,
  title={RGB-D Salient Object Detection via 3D Convolutional Neural Networks},
  author={Chen, Qian and Liu, Ze and Zhang, Yi and Fu, Keren and Zhao, Qijun and Du, Hongwei},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={35},
  number={2},
  pages={1063--1071},
  year={2021}
}
```