NVlabs / Deep_Object_Pose

Deep Object Pose Estimation (DOPE) – ROS inference (CoRL 2018)

Made it work with Intel RealSense D435 #10

Open · pauloabelha opened 5 years ago

pauloabelha commented 5 years ago

Hi all,

Thought folks here might be interested to know that I've made a first version that works with an Intel RealSense D435 camera; the host machine was running Ubuntu 18. You can find it in my forked repo:

https://github.com/pauloabelha/Deep_Object_Pose

It is not the best final solution, as the way I did it requires one to start the RealSense ROS publishing by hand (a nicer solution would be to write a Python wrapper for that). However, it might be useful for people here who just want to get it up and running with the Intel RealSense D435. Please also let me know if there is any license infringement (I have my Docker image publicly available on my Docker Hub for pulling, and I've kept the original license and most of the original README, with an added disclaimer).
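For reference, a minimal sketch of what such a Python wrapper could look like, using the standard roslaunch API. It assumes a roscore is already running and that the realsense2_camera package provides rs_camera.launch; adjust names for your setup:

```python
#!/usr/bin/env python
# Sketch of a wrapper that starts the RealSense ROS driver from Python
# instead of by hand. Assumes a roscore is running and that the
# realsense2_camera package provides rs_camera.launch (adjust for your setup).
import roslaunch
import rospy

rospy.init_node('dope_realsense_wrapper', anonymous=True)

uuid = roslaunch.rlutil.get_or_generate_uuid(None, False)
roslaunch.configure_logging(uuid)

# Resolve rs_camera.launch from the installed realsense2_camera package.
launch_file = roslaunch.rlutil.resolve_launch_arguments(
    ['realsense2_camera', 'rs_camera.launch'])[0]

launch = roslaunch.parent.ROSLaunchParent(uuid, [launch_file])
launch.start()
rospy.loginfo('RealSense driver started; DOPE can now subscribe to its topics.')

try:
    rospy.spin()  # keep the node alive while the driver publishes
finally:
    launch.shutdown()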

TontonTremblay commented 5 years ago

Thank you so much for porting the code to work with RealSense; I have not had time to look at your changes yet. One question: are you using the depth to fix the z prediction from DOPE?

pauloabelha commented 5 years ago

Hi @TontonTremblay, no, I have not. Could you point me to, or explain, the correction that needs to be done? I've only done a first version to get it up and running on the RealSense D435, but I would be happy to help improve it. Best,

TontonTremblay commented 5 years ago

I mean that instead of trusting the depth coming from DOPE in the pose estimation, you could fix the depth prediction using what the depth camera gives you. But now that I think about it, you would have to predict the depth of the object centroid, which involves some math about where the depth ray hits the object surface and how far that hit point is from the centroid, using the estimated rotation. Something easier might be to use ICP or something similar to refine the final pose against the point cloud. I might look into it, as we have some RealSense cameras lying around. Thank you for forking and sharing.
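For the ICP route, a minimal sketch with Open3D (not something DOPE ships with) might look like the following, assuming a sampled object model on disk, the scene points from the RealSense as an Nx3 array `scene_points`, and the 4x4 DOPE pose `T_dope` as the initial guess. On older Open3D versions the same calls live under `o3d.registration` instead of `o3d.pipelines.registration`:

```python
# Sketch of ICP-based pose refinement with Open3D (not part of DOPE).
# Assumed inputs: model.ply (points sampled from the object model),
# scene_points (Nx3 array from the RealSense point cloud), and
# T_dope (4x4 pose predicted by DOPE, used as the initial guess).
import numpy as np
import open3d as o3d

model = o3d.io.read_point_cloud('model.ply')
scene = o3d.geometry.PointCloud()
scene.points = o3d.utility.Vector3dVector(scene_points)

# Find the small correction that best aligns the model, placed at the
# DOPE pose, with the observed point cloud.
result = o3d.pipelines.registration.registration_icp(
    model, scene,
    max_correspondence_distance=0.02,  # 2 cm search radius; tune per object
    init=T_dope,
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())

T_refined = result.transformation  # refined 4x4 object pose
```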

ramilmsh commented 4 years ago

@TontonTremblay how inaccurate is DOPE's depth prediction, though? Is it very wrong, or just not precise?

If it's the latter, maybe you could build a 3D model of such objects using your RealSense, project it onto the depth image using the DOPE estimate, and run correlation on the estimated neighbourhood (simple peak finding? gradient descent?).

Or am I patently wrong?

mintar commented 4 years ago

You are correct that depth is by far the largest source of error in DOPE, which is expected given that it only uses 2D image data; it could definitely benefit from using the depth image as well.

I have already implemented exactly the idea that you described in the hybrit branch of my fork here:

https://github.com/mintar/Deep_Object_Pose/tree/hybrit

It's not cleaned up yet, and it's also not working as well as expected, so I haven't submitted a pull request. But feel free to play around with it and use it as a base for your work.
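To make the idea concrete, here is a rough sketch of the depth-based z correction (this is not the actual hybrit code): slide the DOPE pose along the camera ray and keep the offset whose projected model points best agree with the measured depth image. `model_pts`, `depth`, `T_dope`, and the intrinsics `fx, fy, cx, cy` are assumed inputs:

```python
# Rough sketch only (not the hybrit branch code). Assumed inputs:
#   model_pts : Nx3 array of points sampled from the object model
#   T_dope    : 4x4 object pose predicted by DOPE (camera frame)
#   depth     : HxW depth image in meters, aligned to the RGB camera
#   fx, fy, cx, cy : pinhole intrinsics of that camera
import numpy as np

def depth_error(T, model_pts, depth, fx, fy, cx, cy):
    # Transform the model into the camera frame and project it.
    pts = T[:3, :3].dot(model_pts.T).T + T[:3, 3]
    u = np.round(pts[:, 0] / pts[:, 2] * fx + cx).astype(int)
    v = np.round(pts[:, 1] / pts[:, 2] * fy + cy).astype(int)
    ok = (u >= 0) & (u < depth.shape[1]) & (v >= 0) & (v < depth.shape[0])
    measured = depth[v[ok], u[ok]]
    valid = measured > 0  # skip holes in the depth image
    # In practice you would only keep visible (front-facing) model points.
    return np.mean(np.abs(measured[valid] - pts[ok, 2][valid]))

ray = T_dope[:3, 3] / np.linalg.norm(T_dope[:3, 3])  # camera->object direction

def shifted(T, dz):
    T2 = T.copy()
    T2[:3, 3] += dz * ray
    return T2

# Simple 1-D peak finding over a +/- 10 cm window along the ray.
offsets = np.linspace(-0.10, 0.10, 41)
best_dz = min(offsets, key=lambda dz: depth_error(
    shifted(T_dope, dz), model_pts, depth, fx, fy, cx, cy))
T_refined = shifted(T_dope, best_dz)
```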

I won't be able to respond much for the next two weeks, though, since I'm on Christmas vacation.

huckl3b3rry87 commented 4 years ago

@mintar I am also interested in using depth data to improve the pose estimate in DOPE. I have a few questions, if you don't mind:

1. Just to clarify: does this commit implement what @ramilmsh described, i.e.

   > build a 3D model of such objects using your RealSense, project it onto the depth image using the DOPE estimate, and run correlation on the estimated neighbourhood (simple peak finding? gradient descent?)

2. Assuming the answer to 1 is yes, do you think this approach will be better than ICP, which is how PoseCNN refines its pose estimates?

3. You mentioned it is "also not working as well as expected." Does it still improve the pose estimate?

4. I am also interested in adding tactile data to improve pose estimates for objects held in the robot's hand. Do you have any clever ideas for this?

   I was thinking of simply augmenting the point cloud data from the camera with estimates of where the gripper sensors are in contact with the object. But such tactile data would seem to be more important and reliable than depth data from the camera, so it should somehow be weighted more (a rough sketch of this weighting is below, after this list).

5. Do you think that training the CNN with the RGB image plus depth (or depth and tactile) data could improve the pose estimate? Or do you think that post-processing the output of DOPE will work better?
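
As a very rough illustration of the weighting idea in point 4 (purely hypothetical, not existing DOPE code; `camera_matches` and `tactile_matches` are made-up correspondence sets): a weighted Kabsch/Procrustes step, as one would use inside an ICP loop, where tactile contacts simply get a larger weight than camera points.

```python
# Illustrative only: one weighted Kabsch / Procrustes step, as used
# inside an ICP loop. Camera correspondences get weight 1.0; the
# hypothetical tactile contacts get a larger weight.
#   src : Nx3 model points, dst : Nx3 matched observed points
import numpy as np

def weighted_rigid_align(src, dst, w):
    w = w / w.sum()
    mu_s = (w[:, None] * src).sum(axis=0)  # weighted centroids
    mu_d = (w[:, None] * dst).sum(axis=0)
    # Weighted cross-covariance between the centered point sets.
    H = ((src - mu_s) * w[:, None]).T.dot(dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T.dot(U.T)))  # guard against reflections
    R = Vt.T.dot(np.diag([1.0, 1.0, d])).dot(U.T)
    t = mu_d - R.dot(mu_s)
    return R, t  # rotation and translation mapping src onto dst

# Example: trust each tactile contact ~10x more than a depth point
# (camera_matches / tactile_matches are hypothetical correspondence sets).
weights = np.concatenate([np.ones(len(camera_matches)),
                          10.0 * np.ones(len(tactile_matches))])
```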

I noticed that there is a LIDAR point cloud plugin for the Unreal Engine.

I am trying to get a sense of how feasible such modifications to the current framework would be.

Thanks!