[CVPR2021 Oral] FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation.
MIT License

FFB6D

This is the official source code for the CVPR2021 Oral work, FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation. (Arxiv, Video_Bilibili, Video_YouTube)

Table of Contents

Introduction & Citation

FFB6D is a general framework for representation learning from a single RGBD image. We apply it to the 6D pose estimation task by cascading the downstream prediction heads for instance semantic segmentation and 3D keypoint voting from PVN3D (Arxiv, Code, Video). At the representation learning stage of FFB6D, we build bidirectional fusion modules into the full flow of the two networks, so that fusion is applied at each encoding and decoding layer. In this way, each network can leverage local and global complementary information from the other to obtain better representations. Moreover, at the output representation stage, we design a simple but effective 3D keypoint selection algorithm that considers both the texture and the geometry of objects, which simplifies keypoint localization for precise pose estimation.
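To make the geometry side of keypoint selection concrete, here is a minimal pure-Python sketch of furthest point sampling (FPS), which the code structure below lists among the dataset tools. It is an illustration, not the repository's implementation: the function name `farthest_point_sampling`, its arguments, and the assumption that the candidates are already textured 3D points (for example, SIFT or ORB detections lifted onto the mesh) are all hypothetical.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Return indices of k points, each chosen to be as far as possible
    from the points already selected (greedy furthest point sampling).

    points: sequence of (x, y, z) candidates (assumed to be textured 3D points)
    k:      number of keypoints to keep
    start:  index of the first selected point (an arbitrary choice here)
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    chosen = [start]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start]) for p in points]

    while len(chosen) < k:
        # Pick the candidate that is currently farthest from the selected set.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        # Update nearest-selected distances with the new point.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return chosen
```

Calling `farthest_point_sampling(candidates, 8)` on a list of candidate points would return eight indices that are spread across the object surface, which matches the eight keypoints per object stored in the `*_8_kps.txt` files.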

Please cite FFB6D & PVN3D if you use this repository in your publications:

@InProceedings{He_2021_CVPR,
author = {He, Yisheng and Huang, Haibin and Fan, Haoqiang and Chen, Qifeng and Sun, Jian},
title = {FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2021}
}

@InProceedings{He_2020_CVPR,
author = {He, Yisheng and Sun, Wei and Huang, Haibin and Liu, Jianran and Fan, Haoqiang and Sun, Jian},
title = {PVN3D: A Deep Point-Wise 3D Keypoints Voting Network for 6DoF Pose Estimation},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

Demo Video

See our demo video on YouTube or bilibili.

Installation

Code Structure

- **ffb6d**
  - **ffb6d/common.py**: Common configuration of datasets and models, e.g. dataset path, keypoints path, batch size and so on.
  - **ffb6d/datasets**
    - **ffb6d/datasets/linemod/**
      - **ffb6d/datasets/linemod/linemod_dataset.py**: Data loader for the LineMOD dataset.
      - **ffb6d/datasets/linemod/dataset_config/models_info.yml**: Object model info of the LineMOD dataset.
      - **ffb6d/datasets/linemod/kps_orb9_fps**
        - **ffb6d/datasets/linemod/kps_orb9_fps/{obj_name}_8_kps.txt**: ORB-FPS 3D keypoints of an object in the object coordinate system.
        - **ffb6d/datasets/linemod/kps_orb9_fps/{obj_name}_corners.txt**: 8 corners of the 3D bounding box of an object in the object coordinate system.
    - **ffb6d/datasets/ycb**
      - **ffb6d/datasets/ycb/ycb_dataset.py**: Data loader for the YCB_Video dataset.
      - **ffb6d/datasets/ycb/dataset_config/classes.txt**: Object list of the YCB_Video dataset.
      - **ffb6d/datasets/ycb/dataset_config/radius.txt**: Radius of each object in the YCB_Video dataset.
      - **ffb6d/datasets/ycb/dataset_config/train_data_list.txt**: Training set of the YCB_Video dataset.
      - **ffb6d/datasets/ycb/dataset_config/test_data_list.txt**: Testing set of the YCB_Video dataset.
      - **ffb6d/datasets/ycb/ycb_kps**
        - **ffb6d/datasets/ycb/ycb_kps/{obj_name}_8_kps.txt**: ORB-FPS 3D keypoints of an object in the object coordinate system.
        - **ffb6d/datasets/ycb/ycb_kps/{obj_name}_corners.txt**: 8 corners of the 3D bounding box of an object in the object coordinate system.
  - **ffb6d/models**
    - **ffb6d/models/ffb6d.py**: Network architecture of the proposed FFB6D.
    - **ffb6d/models/cnn**
      - **ffb6d/models/cnn/extractors.py**: ResNet backbones.
      - **ffb6d/models/cnn/pspnet.py**: PSPNet decoder.
      - **ffb6d/models/cnn/ResNet_pretrained_mdl**: ResNet pretrained model weights.
    - **ffb6d/models/loss.py**: Loss calculation for training the FFB6D model.
    - **ffb6d/models/pytorch_utils.py**: Basic PyTorch network modules.
    - **ffb6d/models/RandLA/**: PyTorch version of RandLA-Net from [RandLA-Net-pytorch](https://github.com/qiqihaer/RandLA-Net-pytorch).
  - **ffb6d/utils**
    - **ffb6d/utils/basic_utils.py**: Basic functions for data processing, visualization and so on.
    - **ffb6d/utils/meanshift_pytorch.py**: PyTorch version of the mean shift algorithm for 3D center point and keypoints voting.
    - **ffb6d/utils/pvn3d_eval_utils_kpls.py**: Object pose estimation from predicted center/keypoints offsets, and evaluation metrics.
    - **ffb6d/utils/ip_basic**: Image Processing for Basic Depth Completion from [ip_basic](https://github.com/kujason/ip_basic).
    - **ffb6d/utils/dataset_tools**
      - **ffb6d/utils/dataset_tools/DSTOOL_README.md**: README for the dataset tools.
      - **ffb6d/utils/dataset_tools/requirement.txt**: Python3 requirements for the dataset tools.
      - **ffb6d/utils/dataset_tools/gen_obj_info.py**: Generate object info, including SIFT-FPS 3D keypoints, radius etc.
      - **ffb6d/utils/dataset_tools/rgbd_rnder_sift_kp3ds.py**: Render RGBD images from a mesh and extract textured 3D keypoints (SIFT/ORB).
      - **ffb6d/utils/dataset_tools/utils.py**: Basic utils for mesh, pose, image and system processing.
      - **ffb6d/utils/dataset_tools/fps**: Furthest point sampling algorithm.
      - **ffb6d/utils/dataset_tools/example_mesh**: Example mesh models.
  - **ffb6d/train_ycb.py**: Training and evaluation code of FFB6D models for the YCB_Video dataset.
  - **ffb6d/demo.py**: Demo code for visualization.
  - **ffb6d/train_ycb.sh**: Bash script to start training on the YCB_Video dataset.
  - **ffb6d/test_ycb.sh**: Bash script to start testing on the YCB_Video dataset.
  - **ffb6d/demo_ycb.sh**: Bash script to start the demo on the YCB_Video dataset.
  - **ffb6d/train_lm.py**: Training and evaluation code of FFB6D models for the LineMOD dataset.
  - **ffb6d/train_lm.sh**: Bash script to start training on the LineMOD dataset.
  - **ffb6d/test_lm.sh**: Bash script to start testing on the LineMOD dataset.
  - **ffb6d/demo_lm.sh**: Bash script to start the demo on the LineMOD dataset.
  - **ffb6d/train_log**
    - **ffb6d/train_log/ycb**
      - **ffb6d/train_log/ycb/checkpoints/**: Trained checkpoints on the YCB_Video dataset.
      - **ffb6d/train_log/ycb/eval_results/**: Evaluation results on the YCB_Video dataset.
      - **ffb6d/train_log/ycb/train_info/**: Training logs on the YCB_Video dataset.
- **requirement.txt**: Python3 environment requirements for pip3 install.
- **figs/**: Images shown in the README.
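The list above mentions a mean shift step that turns per-point votes into a 3D center point or keypoint. The sketch below shows the general idea in pure Python so that it runs without PyTorch. It is not the code in `meanshift_pytorch.py`: the function name `mean_shift_mode`, the Gaussian kernel, the centroid initialization, and the `bandwidth` value are assumptions made for illustration.

```python
import math


def mean_shift_mode(votes, bandwidth, iters=50, tol=1e-6):
    """Estimate the densest location among 3D votes with Gaussian mean shift.

    votes:     sequence of (x, y, z) positions voted by individual points
    bandwidth: kernel width (assumed value; in practice tuned per object size)
    """
    n = len(votes)
    # Start from the centroid of all votes (an assumed initialization).
    center = [sum(v[i] for v in votes) / n for i in range(3)]

    for _ in range(iters):
        # Votes near the current estimate get large weights, outliers tiny ones.
        weights = [
            math.exp(-math.dist(v, center) ** 2 / (2 * bandwidth ** 2))
            for v in votes
        ]
        total = sum(weights)
        if total == 0.0:
            break  # all weights underflowed; keep the current estimate
        new_center = [
            sum(w * v[i] for w, v in zip(weights, votes)) / total
            for i in range(3)
        ]
        shift = math.dist(new_center, center)
        center = new_center
        if shift < tol:
            break
    return tuple(center)
```

With votes clustered around one location plus a few stray outliers, the returned point settles on the cluster rather than on the plain average, which is why a mode-seeking step of this kind is useful before fitting a pose to the voted keypoints.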

Datasets

Training and evaluating

Training on the LineMOD Dataset

Evaluating on the LineMOD Dataset

Demo/visualization on the LineMOD Dataset

Training on the YCB-Video Dataset

Evaluating on the YCB-Video Dataset

Demo/visualization on the YCB-Video Dataset

Results

Adaptation to New Dataset

License

Licensed under the MIT License.