Sparse-Dense Motion Modelling and Tracking for Manipulation without Prior Object Models

MultiMotionFusion implements an online tracking and modelling approach for multiple rigid objects, including the environment. It enables the reconstruction and pose estimation of previously unseen objects with respect to the world and the camera. The segmentation of the scene and the detection of new objects relies on motion and thus does not require prior information about objects or the environment.

camera tracking	object tracking

paper:
- arXiv: https://doi.org/10.48550/arXiv.2204.11923
- RA-L: https://doi.org/10.1109/LRA.2022.3200177
video: https://www.youtube.com/watch?v=b8pov4DKLsY
example data:

This project is based on Co-Fusion by Martin Rünz et al.

Citation

If you use this work, please cite our paper:

@ARTICLE{Rauch2022,
  author={Rauch, Christian and Long, Ran and Ivan, Vladimir and Vijayakumar, Sethu},
  journal={IEEE Robotics and Automation Letters},
  title={Sparse-Dense Motion Modelling and Tracking for Manipulation Without Prior Object Models},
  year={2022},
  volume={7},
  number={4},
  pages={11394-11401},
  doi={10.1109/LRA.2022.3200177}
}

Quick Start

MultiMotionFusion is built as a colcon workspace to simplify dependency management. MultiMotionFusion needs a Nvidia GPU that supports CUDA 11. The instructions are for a fresh installation of Ubuntu 20.04 or 22.04.

Install system dependencies (CUDA, ROS, vcstool, rosdep, colcon) using setup.sh. If any of these system dependencies are already installed, skip this step and install the remaining dependencies manually.
```
curl -s https://raw.githubusercontent.com/christian-rauch/MultiMotionFusion/master/doc/setup.sh | bash
sudo apt install cuda-drivers
```

Create a workspace at ~/mmf_ws/, download sources and build using install.sh.

curl -s https://raw.githubusercontent.com/christian-rauch/MultiMotionFusion/master/doc/install.sh | bash

Run example:

# source workspace
source ~/mmf_ws/install/setup.bash
# run keypoint tracking example
wget https://conferences.inf.ed.ac.uk/MultiMotionFusion/estimation/nx_estim2_rotation.bag
mmf_bag_tracking.sh nx_estim2_rotation.bag
# run segmentation example
wget https://conferences.inf.ed.ac.uk/MultiMotionFusion/segmentation/nx_segm4_jaffa_down.bag
mmf_bag_tracking_segmentation.sh nx_segm4_jaffa_down.bag

Installation

Requirements

The following packages have to be installed manually:

CUDA and cuDNN for dense ICP and sparse keypoint prediction, installation instructions
ROS 1 (optionally) to read example data and live RGB-D feed from camera, installation instructions
ROS (optionally) to read example data and live RGB-D feed from camera, ROS 1 (Ubuntu 20.04) or ROS 2 (Ubuntu 22.04)
vcstool to download source repositories
rosdep to automatically resolve binary dependencies
colcon to build the workspace, installation instructions

You can install these system dependencies using the setup.sh script:

curl -s https://raw.githubusercontent.com/christian-rauch/MultiMotionFusion/master/doc/setup.sh | bash

You may also need to update your nvidia driver via sudo apt install cuda-drivers to support the required CUDA version.

MultiMotionFusion can be used without ROS (see CMake options ROSBAG and ROSNODE). But it's highly recommended to use the example data and the live RGB-D feed. Otherwise, MultiMotionFusion supports the same inputs as Co-Fusion (klg logs, image files).

Build colcon workspace

The install.sh script will create the workspace in ~/mmf_ws/ and download the source and binary dependencies and finally build the workspace in Release mode:

curl -s https://raw.githubusercontent.com/christian-rauch/MultiMotionFusion/master/doc/install.sh | bash

To manually rebuild the workspace, e.g. after changing the source code, run:

cd ~/mmf_ws/
colcon build --cmake-args "-DCMAKE_BUILD_TYPE=Release"

All packages will be installed to install. To use the workspace you have to source it via source ~/mmf_ws/install/setup.bash.

Usage

After sourcing the workspace (source ~/mmf_ws/install/setup.bash), you can run the MultiMotionFusion executable with different sets of parameters. By default, without any parameters, this will run the baseline Co-Fusion approach.

Input

In the following, we will use the parameters -run to start tracking and modelling right ahead (otherwise it will start paused) and -dim 640x480 to crop and scale the input image to the target resolution.

Additionally to the input formats supported by Co-Fusion, we support reading from ROS topics and bag files. For bag files you have the choice to process them in real-time by playing them back via rosbag play or reading them deterministically frame-by-frame directly from the bag file.

Example bag files

The bag files used in the paper are available at https://conferences.inf.ed.ac.uk/MultiMotionFusion. They contain the topics:

/rgb/image_raw/compressed: jpeg-compressed colour image
/rgb/camera_info: camera intrinsics
/depth_to_rgb/image_raw/compressed: original png-compressed depth image
/depth_to_rgb/image_raw/filtered/compressed: depth image without visible robot links
/tf and /tf_static: transformations for the Nextage and Vicon kinematic tree

The paper uses the filtered depth images. These images have depth observations from the robot links removed using the realtime_urdf_filter package.

On ROS 1, enable simulation time via rosparam set use_sim_time true and play the bags via rosbag play --clock $FILE.bag to communicate the log time via the /clock topic.

On ROS 2, first convert the bag files via the rosbags tool (pip install rosbags and rosbags-convert $FILE.bag) and then play them back via ros2 bag play --clock 100 $FILE.

Run as ROS node (ROS 1 and 2)

Executing with parameter -ros will register the process as ROS node and subscribe to colour and depth image topics from an RGB-D camera or ROS bag file. This supports the usual ROS 1 and 2 remapping arguments. For a Azure Kinect DK, you have to provide the following remapping arguments:

# ROS 1
MultiMotionFusion -run -dim 640x480 -ros \
  colour:=/rgb/image_raw \
  depth:=/depth_to_rgb/image_raw/filtered \
  camera_info:=/rgb/camera_info \
  _image_transport:=compressed
# ROS 2
MultiMotionFusion -run -dim 640x480 -ros \
  --ros-args \
  -r colour:=/rgb/image_raw \
  -r depth:=/depth_to_rgb/image_raw/filtered \
  -r camera_info:=/rgb/camera_info \
  -p image_transport:=compressed

This will read images in real-time as they are published by the RGB-D driver or ROS bag. The node will wait for the first sensor_msgs/CameraInfo message on the camera_info topic to initialise the image dimensions and show the GUI.

For convenience, create a script that sets the subset of input parameters and accepts additional parameters:

# ROS 1
cat <<EOF > mmf_ros.sh
#!/usr/bin/env bash
MultiMotionFusion -run -dim 640x480 -ros \
  colour:=/rgb/image_raw \
  depth:=/depth_to_rgb/image_raw/filtered \
  camera_info:=/rgb/camera_info \
  _image_transport:=compressed \
  \$@
EOF
# ROS 2
cat <<EOF > mmf_ros.sh
#!/usr/bin/env bash
MultiMotionFusion -run -dim 640x480 -ros \
  \$@ \
  --ros-args \
  -r colour:=/rgb/image_raw \
  -r depth:=/depth_to_rgb/image_raw/filtered \
  -r camera_info:=/rgb/camera_info \
  -p image_transport:=compressed
EOF
chmod +x mmf_ros.sh

This script will then always run MultiMotionFusion as ROS node and accept additional parameters: ./mmf_ros.sh <param_1> ... <param_N>.

Read from ROS bag (ROS 1)

For a deterministic behaviour, you can also read directly frame-by-frame from a ROS bag file by providing its path to the -l parameter and setting the topic names:

MultiMotionFusion -run -dim 640x480 \
  -topic_colour /rgb/image_raw/compressed \
  -topic_depth /depth_to_rgb/image_raw/filtered/compressed \
  -topic_info /rgb/camera_info \
  -l estimation/nx_estim1_manipulation.bag

Sparse-Dense Estimation, Segmentation and Redetection

The contributions of the paper can be enabled individually:

-model $MODEL_PATH: SuperPoint keypoint extraction
-init kp: sparse keypoint initialisation
-icp_refine: dense refinement
-segm_mode flow_crf: sparse-dense CRF motion segmentation
-redetection: model redetection

The final pick&place experiment used the full set of features:

./mmf_ros.sh \
  -model ~/mmf_ws/install/super_point_inference/share/weights/SuperPointNet.pt \
  -init kp \
  -icp_refine \
  -segm_mode flow_crf \
  -redetection

This will enable the keypoint extraction (-model SuperPointNet.pt), use the keypoints to initialise tracking (-init kp), refine this further using dense ICP (-icp_refine), segment the scene using the sparse keypoint reprojection error as unary potential and the dense optical flow as pairwise potential in a dense CRF (-segm_mode flow_crf) and redetect lost models (-redetection).

Instead of using the keypoints for initialisation (-init kp), you can also use the pose of a coordinate frame via -init tf. By default, this will use the colour camera frame provided in the frame_id of the colour image header. The coordinate frame can be changed by setting -init_frame $FRAME.

Reproduce results

Ground Truth

The provided bag files contain the ground truth camera frame, as reported by Vicon, as frame camera_true. To create a ground truth reconstruction, chose this frame for initialisation without the additional dense refinement:

./mmf_ros.sh -init tf -init_frame camera_true

Transformation Estimation

Run the sparse initialisation and dense refinement:

./mmf_ros.sh \
  -model ~/mmf_ws/install/super_point_inference/share/weights/SuperPointNet.pt \
  -init kp -icp_refine

and play back the estimation bag files:

rosbag play --clock nx_estim2_rotation.bag

Motion Segmentation

Run additionally the sparse-dence CRF:

./mmf_ros.sh \
  -model ~/mmf_ws/install/super_point_inference/share/weights/SuperPointNet.pt \
  -init kp -icp_refine \
  -segm_mode flow_crf

and play back the segmentation bag files:

rosbag play --clock nx_segm1_jaffa_up.bag

christian-rauch / MultiMotionFusion

readme