RobotLocomotion / pytorch-dense-correspondence

Code for "Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation"
https://arxiv.org/pdf/1806.08756.pdf

understanding matches and non-matches #201

Closed Amulya21 closed 4 years ago

Amulya21 commented 5 years ago

Hi, I am having trouble understanding how matches and non-matches were specified for training, since no human labelling was done. I am not able to follow batch_find_pixel_correspondences(). Also, how is the object identified and separated from the background (how did you get the mesh of the object separately in the dataset)?

Thank you.

peteflorence commented 5 years ago

Hi Amulya,

  1. Given knowledge of (a) the camera poses, (b) the 3D geometry of the scene, and (c) the camera calibration, we can use geometry to compute pixel-pixel correspondences. Some references for you could be the Hartley + Zisserman book, Rich Szeliski's book... the "Learning OpenCV" books too (chapters on camera models, calibration, 3D vision). The batch_find_pixel_correspondences() code is admittedly tricky to follow, in part because it could be written cleaner :), in part because it's written with vectorized PyTorch functions. (A small sketch of the underlying geometry follows after the reference below.)
  2. We can isolate the object from the rest of the scene by 3D segmentation. A pretty general approach to do this can be found in [1]. Since we are just using table-top scenes, we can do something even simpler, which is just to segment out the object(s) as the thing(s) above the table (also sketched below). Another easy option is to just do background subtraction in image space, but due to subtle lighting changes, etc., this doesn't work as well.

[1] R. Finman, T. Whelan, M. Kaess, and J. J. Leonard. Toward lifelong object segmentation from change detection in dense rgb-d maps. In Mobile Robots (ECMR), 2013 European Conference on, pages 178–185. IEEE, 2013.
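For item 1, here is a minimal NumPy sketch of the geometry (the function and variable names are hypothetical illustrations, not the repo's API; the actual vectorized PyTorch version is batch_find_pixel_correspondences()). Given a pixel in image A with a known depth, the camera intrinsics, and the relative pose between the two cameras, back-project to a 3D point, transform it into camera B's frame, and project it back onto image B:

```python
import numpy as np

def reproject_pixel(u, v, z_a, K, T_b_from_a):
    """Map a pixel from image A to image B using depth, intrinsics, and relative pose.

    u, v       : pixel coordinates in image A
    z_a        : depth at that pixel in meters (from the depth image or fused reconstruction)
    K          : 3x3 camera intrinsics (assumed the same for both views)
    T_b_from_a : 4x4 homogeneous transform taking points from camera-A frame to camera-B frame
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    # Back-project (u, v, z_a) into a 3D point in camera A's frame.
    p_a = np.array([(u - cx) * z_a / fx, (v - cy) * z_a / fy, z_a, 1.0])

    # Transform the point into camera B's frame.
    p_b = T_b_from_a @ p_a

    # Project back onto image B with the pinhole model.
    u_b = fx * p_b[0] / p_b[2] + cx
    v_b = fy * p_b[1] / p_b[2] + cy
    return u_b, v_b, p_b[2]  # p_b[2] is the depth the point should have in view B
```

For item 2, the "things above the table" idea can be sketched (again with hypothetical names) as a plane fit plus a height threshold on the reconstructed point cloud:

```python
import numpy as np

def segment_above_table(points, plane_normal, plane_point, min_height=0.01):
    """Keep points lying more than min_height meters above a fitted table plane.

    points       : (N, 3) array of 3D points from the reconstruction
    plane_normal : unit normal of the table plane (pointing up), e.g. from a RANSAC plane fit
    plane_point  : any 3D point on the plane
    """
    heights = (points - plane_point) @ plane_normal  # signed distance to the plane
    return points[heights > min_height]              # object points above the table
```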

Amulya21 commented 5 years ago

Thank you for the reply. Suppose you have 2 RGB images of the same scene (taken from different viewpoints) and you know how much the camera has rotated and translated (pose data); then by applying that rotation and translation we can find where a specific point (pixel) in one image has moved to in the other image. If this is how matches or correspondences are found, what is the need for 3D reconstruction?

peteflorence commented 5 years ago

Hi there, you also need to know the depth for each pixel in order to geometrically match between two images; the camera poses alone aren't enough. If you have two depth images, each with camera poses (and known calibration), that is all you need, but the many-view fused 3D reconstruction helps denoise the depth images and fill in missing data caused by practical limitations of depth sensors. All of the above of course assumes the scene is static. In the Schmidt et al. reference they also did descriptor training with dynamic scenes, made possible by non-rigid dynamic reconstruction. Good luck!
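A small sketch of the practical detail this implies (hypothetical names again): after reprojecting a pixel into view B as in the sketch above, the measured depth at the target pixel should agree with the predicted depth; otherwise the surface point is occluded (or the depth is missing) in view B and the pair should not be used as a match.

```python
def is_visible_match(u_b, v_b, z_b_predicted, depth_image_b, tol=0.01):
    """Reject correspondences whose target pixel is occluded or has no valid depth.

    u_b, v_b      : reprojected pixel location in image B
    z_b_predicted : depth the 3D point should have in view B
    depth_image_b : measured (or reconstruction-rendered) depth image for view B, in meters
    tol           : allowed depth disagreement in meters
    """
    h, w = depth_image_b.shape
    ui, vi = int(round(u_b)), int(round(v_b))
    if not (0 <= ui < w and 0 <= vi < h):
        return False  # reprojects outside image B
    z_measured = depth_image_b[vi, ui]
    if z_measured <= 0:
        return False  # missing depth (sensor dropout)
    return abs(z_measured - z_b_predicted) < tol
```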

Amulya21 commented 5 years ago

Thank you for the reply. I understand how matching is done for images within a scene: because we know the rotation and translation, we can find how a pixel moves from one image to another within the scene (that is, both images are taken from 2018-04-16-14-40-25). But how is it done across scenes, for example between an image from 2018-04-10-16-05-17 and one from 2018-04-16-14-40-25?

Also, while training the ResNet architecture, how are you sending two images at a time? If both are sent in parallel, as in the figure at this link https://qphs.fs.quoracdn.net/main-qimg-35262db76db5e734c74cdd8d1a97c88d, then how is the error distributed and backpropagated?

peteflorence commented 5 years ago

Good question. When we go across scenes, we do not know any matches, that's right. We only do cross-scene training when we know the objects are different, so we know every pixel is a non-match. No matches.

The pairs of images are trained with a standard Siamese architecture; I think you can find more on this elsewhere.
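As a rough sketch of what the Siamese setup means for backpropagation (variable names here are illustrative, not the repo's API, and the exact loss in the paper has a few more details such as hard-negative scaling): the same network is applied to both images, so the loss sees descriptors from both branches and gradients flow back through the shared weights twice. Matches are pulled together in descriptor space and non-matches are pushed apart up to a margin.

```python
import torch

def pixelwise_contrastive_loss(net, img_a, img_b, matches_a, matches_b,
                               nonmatches_a, nonmatches_b, margin=0.5):
    """Loss for one image pair: pull matches together, push non-matches apart.

    net          : dense descriptor network (e.g. ResNet-based FCN), used for BOTH images,
                   which is what makes this a Siamese setup
    img_a, img_b : (1, 3, H, W) image tensors
    matches_*    : (N,) flattened pixel indices of corresponding pixels
    nonmatches_* : (M,) flattened pixel indices of non-corresponding pixels
    """
    h, w = img_a.shape[2], img_a.shape[3]
    des_a = net(img_a).view(1, -1, h * w)  # (1, D, H*W) descriptor image for A
    des_b = net(img_b).view(1, -1, h * w)  # same weights, second branch

    d_a_m = des_a[0, :, matches_a].t()     # (N, D) descriptors at match pixels
    d_b_m = des_b[0, :, matches_b].t()
    d_a_n = des_a[0, :, nonmatches_a].t()  # (M, D) descriptors at non-match pixels
    d_b_n = des_b[0, :, nonmatches_b].t()

    # Matches: minimize squared distance. Non-matches: hinge loss with a margin.
    match_loss = (d_a_m - d_b_m).pow(2).sum(dim=1).mean()
    nonmatch_dist = (d_a_n - d_b_n).pow(2).sum(dim=1).sqrt()
    nonmatch_loss = torch.clamp(margin - nonmatch_dist, min=0).pow(2).mean()

    # Calling backward() on this sends gradients through net once per image.
    return match_loss + nonmatch_loss
```

For a purely cross-scene pair as described above there are no matches, so only the non-match term would be used.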

Amulya21 commented 5 years ago

Hi, I have a doubt regarding the grasping part. By clicking on a specific point of the image, we tell the robot at which point to grasp the object when it is in another position. By finding the point with the least Euclidean distance in descriptor space it will identify the point, but how will it know at what orientation it should hold the object?

[image: Untitled] https://user-images.githubusercontent.com/49370470/61591143-8db53d80-abe0-11e9-8056-dcf8aabe2d60.png

Suppose I give the point marked in blue in the first image; it will certainly identify the point marked in blue in the second image, but how will it know at which orientation it needs to grasp it? The gripper can't grasp it at 90 degrees to the table; the gripper needs to be parallel to the table in order to grasp it. How will it identify that orientation of the gripper?
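For the descriptor-space lookup described above, a minimal sketch (hypothetical names; the repo has its own utilities for this): compute the dense descriptor image of the new view and pick the pixel whose descriptor has the smallest Euclidean distance to the descriptor of the clicked pixel in the reference view. This only localizes the grasp point; the orientation question is addressed in the reply below.

```python
import numpy as np

def find_best_match(clicked_descriptor, descriptor_image_b):
    """Return the pixel in image B whose descriptor is closest to the clicked one.

    clicked_descriptor : (D,) descriptor at the user-clicked pixel in image A
    descriptor_image_b : (H, W, D) dense descriptor image for image B
    """
    diffs = descriptor_image_b - clicked_descriptor  # broadcasts over H and W
    dists = np.linalg.norm(diffs, axis=2)            # (H, W) Euclidean distances
    v_b, u_b = np.unravel_index(np.argmin(dists), dists.shape)
    return u_b, v_b, dists[v_b, u_b]
```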

peteflorence commented 5 years ago

Please see Section C of the Appendix in the paper

Amulya21 commented 5 years ago

Thank you for the reply.