Open christenbc opened 5 years ago
I ask the sensor_msgs/PointCloud2
for the point that lies at the (x, y) center of the object's bounding box, which returns a point with (x, y, z) coordinates that I can use to create a tf. This is done in this line.
This has the caveat that if your object has a hollow center, e.g. it's a donut, or maybe a person in a strange stance (like this), then the selected point doesn't reflect the position of the actual object.
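For illustration, the center-pixel lookup can be sketched like this. This is a minimal sketch, assuming the cloud has already been unpacked from a sensor_msgs/PointCloud2 into an organized (H, W, 3) NumPy array of XYZ values; the `center_point` helper and the array layout are assumptions, not the repository's actual code.

```python
import numpy as np

def center_point(cloud, bbox):
    """Return the 3D point under the bounding box center.

    cloud: organized point cloud as an (H, W, 3) array of XYZ values
           (hypothetical layout; a real sensor_msgs/PointCloud2 would
           be unpacked into this form first).
    bbox:  (x_min, y_min, x_max, y_max) in pixel coordinates.
    """
    x_min, y_min, x_max, y_max = bbox
    u = (x_min + x_max) // 2   # center column
    v = (y_min + y_max) // 2   # center row
    return cloud[v, u]

# Toy 4x4 "cloud": every point at depth 2.0 m
cloud = np.zeros((4, 4, 3))
cloud[..., 2] = 2.0
print(center_point(cloud, (0, 0, 3, 3)))  # -> [0. 0. 2.]
```

If the center pixel lands on a hollow region (the donut case above), this returns whatever is behind the hole, which is exactly the caveat being described.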
In the line I referenced, it's possible to ask for more points from the point cloud and take their mean in order to place the object in 3D space. The problem is that if there are other things inside the object's bounding box, many of the sampled points may not come from the detected object. You could have a wall 6 feet behind the object, and most points would be sampled from the wall, placing the object further back than it actually is.
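The wall-contamination problem is easy to reproduce with a toy example. A minimal sketch, with made-up depths and a hypothetical organized-cloud layout:

```python
import numpy as np

# Toy organized cloud: 10x10 image, object at depth 1.0 m in a small
# patch, wall at depth 3.0 m everywhere else (hypothetical numbers).
cloud = np.zeros((10, 10, 3))
cloud[..., 2] = 3.0            # wall fills the frame
cloud[4:6, 4:6, 2] = 1.0       # object covers only 4 pixels

bbox = (2, 2, 8, 8)            # bounding box spans object + wall
x_min, y_min, x_max, y_max = bbox
patch = cloud[y_min:y_max, x_min:x_max].reshape(-1, 3)

# 32 of the 36 points come from the wall, so the mean depth is
# pulled far behind the object: (4*1.0 + 32*3.0) / 36 ~= 2.78 m
mean_depth = patch[:, 2].mean()
print(round(mean_depth, 2))  # -> 2.78, not the object's 1.0 m
```

The averaged position ends up nearly two meters behind the object, which motivates the clustering idea below.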
Maybe if we apply some clustering method to the returned points and only take the mean of the points from one cluster, either the largest or closest one...
Yeah, indeed, taking the central pixel is a rough but valid approach! Your proposal looks quite interesting; I would suggest Mean Shift clustering for this case. Thanks for the elaborate answers, Douglas.
Thanks for the suggestion. I believe DBSCAN may show interesting results too. Both algorithms are available in scikit-learn, and sooner or later I may try them. I'll use this issue to remind myself in the future.
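To sketch the clustering idea: here is a tiny, self-contained DBSCAN-style pass that groups the sampled points and averages only the closest cluster. This is illustrative only; in practice you would use `sklearn.cluster.DBSCAN` or `MeanShift` as discussed above, and the `eps`/`min_pts` values and the synthetic object/wall points are assumptions.

```python
import numpy as np

def dbscan(points, eps=0.2, min_pts=3):
    """Minimal DBSCAN sketch: returns a cluster label per point, -1 for
    noise. Not optimized; scikit-learn's version is the practical choice."""
    n = len(points)
    labels = np.full(n, -1)
    dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue  # already clustered, or not a core point
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:  # grow the cluster from core point i
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:
                    queue.extend(neighbors[j])
        cluster += 1
    return labels

# Points sampled inside a bbox: a few from the object near 1 m,
# many more from a wall near 3 m (synthetic data).
rng = np.random.default_rng(0)
obj = rng.normal([0.0, 0.0, 1.0], 0.02, (10, 3))
wall = rng.normal([0.0, 0.0, 3.0], 0.02, (30, 3))
pts = np.vstack([obj, wall])

labels = dbscan(pts)
# Average only the *closest* cluster instead of every point in the bbox.
closest = min(set(labels) - {-1}, key=lambda c: pts[labels == c, 2].mean())
print(pts[labels == closest].mean(axis=0))  # approximately [0, 0, 1.0]
```

Picking the closest (or largest) cluster recovers the object's depth even though the wall contributes three times as many points, which is the failure mode of the plain mean.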
Are you taking the location of the point in the point cloud that corresponds to the center pixel of the bounding box? Or do you instead apply some mean over the center pixels? What method do you use?