A complementary video for our paper can be found here. A video showcasing our grasping experiments can be found here:
Recently developed deep neural networks achieve state-of-the-art results in 6D object pose estimation for robot manipulation. However, these supervised deep learning methods require expensive annotated training data. Current approaches for reducing those costs frequently use synthetic data from simulations, but rely on expert knowledge and suffer from the domain gap when shifting to the real world. Here, we present a proof of concept for a novel approach of autonomously generating annotated training data for 6D object pose estimation. The approach is designed for learning new objects in operational environments while requiring little interaction and no expertise on the part of the user. We evaluate our autonomous data generation approach in two grasping experiments, where we achieve a grasping success rate similar to that of related work on a non-autonomously generated data set.
We created a Terminal User Interface to quickly test our implementations. After setting up your hardware and completing the installation, you can run it with:
$ python main.py
The Terminal User Interface gives you the following options:
Selections 1, 5, 6, and 7 of the Terminal User Interface are hardware dependent (please see the Hardware section for setup instructions). The remaining selections can be used either with data acquired by your own setup or with our data (please see the Data section for download instructions).
Dependencies:
Python 3.6 packages (the package versions we used are given in brackets):
Run Terminal User Interface:
In order to conduct your own grasping experiments or acquire new data, you need an RGB-D camera, an industrial robot arm, and a gripper. We use a "Realsense-435" depth camera, the "UR-5 CB3" robot arm, and the "Robotiq 2F-85" gripper. Furthermore, you need to find the hand-eye calibration for your setup and adapt the robot view points used for data acquisition and grasping.
Robot and Gripper: We do not provide any drivers for the robot and gripper. If you want to use your own setup, you will need to write your own drivers and communication. We provide a "robot and gripper" controller in "robotcontroller/TestController.py". It uses a "robot and gripper" client to interact with the hardware. You can take this as a starting point to connect your hardware. Please make sure that all functions in the "RobotController" are callable.
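As a rough, hypothetical sketch of what such a driver could look like (the method names, the URScript command, and the connection details below are assumptions, not the actual "RobotController" interface; align them with the functions defined in "robotcontroller/TestController.py"):

```python
# Hypothetical sketch of a custom robot/gripper controller.
# Method names and signatures are assumptions; match them to the
# functions actually required by robotcontroller/TestController.py.

import socket


class MyRobotController:
    """Minimal controller that forwards commands to a robot/gripper client."""

    def __init__(self, host="192.168.0.2", port=30002):
        # Connection details depend on your robot; a UR CB3 accepts URScript
        # over TCP, other robots need their own protocol and driver.
        self.sock = socket.create_connection((host, port), timeout=5.0)

    def move_to_pose(self, pose):
        """Move the TCP to a 6D pose [x, y, z, rx, ry, rz] (meters, axis-angle)."""
        cmd = "movel(p[{}, {}, {}, {}, {}, {}], a=0.5, v=0.1)\n".format(*pose)
        self.sock.sendall(cmd.encode("utf-8"))

    def get_tcp_pose(self):
        """Return the current TCP pose; query your robot's state interface here."""
        raise NotImplementedError

    def open_gripper(self):
        raise NotImplementedError  # send an open command to your gripper driver

    def close_gripper(self):
        raise NotImplementedError  # send a close command to your gripper driver
```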
Camera: If you want to use your own RGB-D camera, you can replace our "DepthCam" in "depth_camera/DepthCam.py". Please make sure that the functions of your "DepthCam" work similarly to our implementation. If you have a Realsense-435, you can use our DepthCam implementation. Please make sure you have installed the RealSense SDK and pyrealsense2 (see dependencies).
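For reference, a minimal RGB-D capture with pyrealsense2 looks roughly like the following; the class and method names are placeholders and should mirror the interface of "depth_camera/DepthCam.py":

```python
# Minimal pyrealsense2 capture sketch; the class and method names are
# placeholders, not the interface of depth_camera/DepthCam.py.

import numpy as np
import pyrealsense2 as rs


class SimpleDepthCam:
    def __init__(self, width=640, height=480, fps=30):
        self.pipeline = rs.pipeline()
        config = rs.config()
        config.enable_stream(rs.stream.depth, width, height, rs.format.z16, fps)
        config.enable_stream(rs.stream.color, width, height, rs.format.bgr8, fps)
        self.pipeline.start(config)
        # Align depth to the color frame so pixels correspond 1:1.
        self.align = rs.align(rs.stream.color)

    def get_frames(self):
        """Return an aligned (color, depth) pair as numpy arrays."""
        frames = self.align.process(self.pipeline.wait_for_frames())
        color = np.asanyarray(frames.get_color_frame().get_data())
        depth = np.asanyarray(frames.get_depth_frame().get_data())
        return color, depth

    def stop(self):
        self.pipeline.stop()
```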
Hand-Eye-Calibration: We use an ArUco board for hand-eye calibration. You can use our hand-eye calibration implementations to get the camera poses. However, in order to get the robot poses, you need to implement your own robot controller first (see the Hardware section). We do not provide the implementation of our hand-eye calibration method, since we reused it from another project and it is written in C++ with further requirements (our implementation is based on "CamOdoCal").
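If you prefer a Python alternative to our CamOdoCal-based C++ pipeline, the hand-eye problem can also be solved with OpenCV's `cv2.calibrateHandEye` (available since OpenCV 4.1). This is only a sketch of that alternative, not our implementation; the variable and function names are illustrative:

```python
# Illustrative eye-in-hand calibration with OpenCV (an alternative to our
# CamOdoCal-based C++ implementation). Inputs are paired robot and camera
# poses recorded at the same view points.

import cv2
import numpy as np


def hand_eye_from_pose_pairs(R_gripper2base, t_gripper2base,
                             R_target2cam, t_target2cam):
    """Return the 4x4 camera-to-gripper transform from paired poses.

    R_gripper2base / t_gripper2base: robot (gripper -> base) rotations and
    translations from your RobotController, one per view point.
    R_target2cam / t_target2cam: ArUco board (target -> camera) poses
    estimated from the camera images at the same view points.
    """
    R, t = cv2.calibrateHandEye(
        R_gripper2base, t_gripper2base,
        R_target2cam, t_target2cam,
        method=cv2.CALIB_HAND_EYE_TSAI,
    )
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t.ravel()
    return T
```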
View-Points: Data acquisition requires a set of view points, which are unique to your setup, so remember to create your own set of view points. Grasping also requires a set of view points, which needs to be updated according to your setup. Please find the view points here. You can use our path creation or implement your own method.
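The exact view-point format depends on your setup; as a purely hypothetical sketch (the poses, values, and function names below are placeholders, not the repository's format), a set of view points could be stored as TCP poses and visited one by one:

```python
# Hypothetical view-point definition: each entry is a TCP pose
# [x, y, z, rx, ry, rz] in the robot base frame (meters, axis-angle).
# The values are placeholders; adapt them to your own workspace.
view_points = [
    [0.40, -0.20, 0.45, 2.2, -2.2, 0.0],
    [0.40,  0.00, 0.50, 2.2, -2.2, 0.0],
    [0.40,  0.20, 0.45, 2.2, -2.2, 0.0],
]


def capture_at_view_points(robot, camera, view_points):
    """Drive the robot through all view points and collect RGB-D samples."""
    samples = []
    for pose in view_points:
        robot.move_to_pose(pose)            # assumed controller method, see above
        color, depth = camera.get_frames()  # assumed DepthCam method, see above
        samples.append((pose, color, depth))
    return samples
```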
Reference Point: In our setup we have defined a reference point
All data used in the components of this project can be downloaded. A download link and instructions can be found in the README of each component. The components with data are "background_subtraction", "data_generation", "DenseFusion", "label_generator", "pc_reconstruction", "experiments", and "segmentation".
Acquired Data: If you do not have a hardware setup, you can download the RGB-D data acquired for this project by following the instructions here.
Generate Labels: You can generate your own segmentation labels, point clouds, and target poses with selections 3 and 4 of the Terminal User Interface. Otherwise, you can download our generated labels by following the instructions here.
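The actual label generation lives in the "label_generator" and "background_subtraction" components; as a loose illustration of the background-subtraction idea only (not our implementation, and not the trained model mentioned below), a naive segmentation mask could be computed like this:

```python
# Naive background-subtraction sketch for illustration only; the repository's
# label_generator / background_subtraction components work differently.

import cv2
import numpy as np


def naive_object_mask(image_with_object, background_image, threshold=30):
    """Return a binary mask of pixels that differ from the empty background."""
    diff = cv2.absdiff(image_with_object, background_image)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    # Remove small speckles so the mask roughly covers only the object.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    return mask
```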
Extra Data: During data acquisition, we took an extra data sample every 50 mm traveled while the robot was moving. That data has no background image, and a segmentation label cannot be generated via background subtraction. However, we can use a segmentation model, once trained on the other data, to annotate the extra data. The extra data, in turn, can be used together with the other data for pose estimation training. However, the experiments we conducted did not show any improvement when training with extra data. This might be due to poor labels caused by motion blur and a time offset between the captured image and the robot pose. More details can be found in the thesis report connected to this paper (see the Thesis Report section below).
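As an illustration of the 50 mm distance trigger described above (the helper names here are assumptions, not the repository's code), extra samples could be collected roughly like this while the robot moves:

```python
# Illustrative distance trigger: grab an extra RGB-D sample every 50 mm of
# TCP travel while the robot is moving. Helper names are assumptions.

import numpy as np


def collect_extra_samples(robot, camera, step_mm=50.0, is_moving=lambda: True):
    samples = []
    last_xyz = np.asarray(robot.get_tcp_pose()[:3])
    travelled_mm = 0.0
    while is_moving():
        xyz = np.asarray(robot.get_tcp_pose()[:3])
        travelled_mm += np.linalg.norm(xyz - last_xyz) * 1000.0  # meters -> mm
        last_xyz = xyz
        if travelled_mm >= step_mm:
            color, depth = camera.get_frames()
            samples.append((robot.get_tcp_pose(), color, depth))
            travelled_mm = 0.0
    return samples
```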
You can reproduce our experiments with the Terminal User Interface.
If you want to investigate the training of our background subtraction model, you can do that here. You can also download the data used for the training here.
This work is based on the master's thesis by Paul Koch. If you are interested in more details, you can download the full thesis report here.