dougsm / ggcnn

Generative Grasping CNN from "Closing the Loop for Robotic Grasping: A Real-time, Generative Grasp Synthesis Approach" (RSS 2018)
BSD 3-Clause "New" or "Revised" License

potential ambiguity of angle images #17

Closed: yongxf closed this issue 4 years ago

yongxf commented 4 years ago

Hi there, thank you for sharing the ggcnn code. The performance is quite impressive. I ran your code on my own dataset, and the predicted gripper angle is often close to 0. I later found a possible logic flaw in the design of the angular supervision. Take the simplest case: a cube. We label it with two grasps, one at (cube center, 0) and another at (cube center, 0.5π). Each grasp occupies a rectangle in the angle image, and these rectangles overlap. The overlapping region is ambiguous: depending on which grasp is drawn first, the overlap ends up marked as 0 or 0.5π. Averaging 0 and 0.5π is not a good idea either, since 0.25π makes the gripper grasp the cube's edges.

The root cause of the problem is the overlap. Do you have a workaround for this situation? After all, it is very likely that more than one grasp exists on a single object (a small sketch of the order dependence follows below). Thanks
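To make the overwrite behaviour concrete, here is a minimal numpy sketch; the image size and rectangle coordinates are made up for illustration and this is not the GGCNN dataset code:

```python
import numpy as np

H, W = 100, 100
angle_img = np.zeros((H, W), dtype=np.float32)

# Two grasps on a cube centred at (50, 50): one at 0 rad, one at pi/2.
# Each grasp rectangle is rasterised into the same angle image.
grasps = [
    ((slice(40, 60), slice(20, 80)), 0.0),        # horizontal grasp, angle 0
    ((slice(20, 80), slice(40, 60)), np.pi / 2),  # vertical grasp, angle pi/2
]

for region, angle in grasps:
    angle_img[region] = angle  # the later grasp overwrites the overlap

# The central 20x20 overlap now holds pi/2 purely because that grasp
# was drawn last; reversing the list order would leave 0 there instead.
print(angle_img[50, 50])  # -> 1.5707..., or 0.0 if the order is reversed
```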

dougsm commented 4 years ago

You raise a good point. This is something I have noticed before, but don't have a good solution to.
In practice, though, I have not noticed this causing any significant issues, i.e. I have not generally seen the robot take the "average" grasp on symmetrical objects; rather, it tends towards a single one of the grasps. I think this might be due in part to the size and variability of the training datasets, as well as the train-time augmentation.
If this is a significant issue for you, one possible way to overcome it is to learn several separate grasp maps, one per angle bin, as in https://berkeleyautomation.github.io/fcgqcnn/ (a rough sketch of that labelling scheme follows below). Hope that helps.
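As a rough illustration of the binned-labelling idea, here is a sketch of building one quality map per angle bin; the bin count, function name, and array shapes are assumptions for illustration, not FC-GQ-CNN or GGCNN code:

```python
import numpy as np

N_BINS = 16      # assumed bin count for the sketch
H, W = 300, 300  # assumed image size

def rasterise_binned(grasps):
    """Build one quality map per angle bin from a list of grasps.

    grasps: list of ((slice_rows, slice_cols), angle) rectangles.
    Overlapping grasps at different angles land in different bin maps,
    so neither label overwrites the other.
    """
    maps = np.zeros((N_BINS, H, W), dtype=np.float32)
    for region, angle in grasps:
        # Fold into [0, pi): antipodal grasps are pi-periodic.
        b = int((angle % np.pi) / np.pi * N_BINS) % N_BINS
        maps[b][region] = 1.0
    return maps

# The two cube grasps from the example above now live in separate maps:
maps = rasterise_binned([
    ((slice(140, 160), slice(100, 200)), 0.0),
    ((slice(100, 200), slice(140, 160)), np.pi / 2),
])
print(maps[0][150, 150], maps[N_BINS // 2][150, 150])  # -> 1.0 1.0
```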

yongxf commented 4 years ago

Thank you for the confirmation and the suggestions. Indeed, the labeling of positive grasps in the Cornell dataset is different from my customized one; the Cornell grasps seem to be sparser than my labels. The FC-GQ-CNN you recommended uses angle bins to encode the angular dimension. I guess the same methodology could be used for the GGCNN angular encoding: instead of using one image to regress the angle (writing the angle value into the relevant pixels), we could use 18 images (20 deg per bin) to classify the angle (writing "1" into the relevant pixels of the relevant images). This is just my current idea; my two concerns are 1) the loss design (mixed regression and classification) and 2) the headache of training with that many outputs, which may be the reason FC-GQ-CNN doesn't train from scratch. A rough sketch of the mixed loss is below. Thanks again for the reply.
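As a rough sketch of concern 1), the mixed loss could look something like the following; all names, shapes, and loss weights here are assumptions for illustration, not GGCNN or FC-GQ-CNN code:

```python
import torch
import torch.nn.functional as F

N_BINS = 18  # one class per 20-degree bin, as proposed above

def mixed_loss(pos_pred, angle_logits, width_pred,
               pos_gt, angle_bin_gt, width_gt):
    """Per-pixel regression for position/width, classification for angle.

    pos_pred, width_pred: (B, 1, H, W) regressed maps.
    angle_logits:         (B, N_BINS, H, W) per-pixel class scores.
    angle_bin_gt:         (B, H, W) long tensor of bin indices.
    """
    pos_loss = F.mse_loss(pos_pred, pos_gt)
    width_loss = F.mse_loss(width_pred, width_gt)
    # cross_entropy treats dim 1 as the class dimension at every pixel;
    # in practice one would probably mask this to pixels inside a
    # labelled grasp rectangle so background pixels don't dominate.
    angle_loss = F.cross_entropy(angle_logits, angle_bin_gt)
    # Equal weights are an arbitrary choice for the sketch.
    return pos_loss + angle_loss + width_loss
```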