the 3d coordinates - Githubissues

@crazyliyi Thanks for your interest. In fact, I am not retrieving 3D coordinates at all, since we have no 3D information whatsoever. This is at least partly due to the current state of RoboCup: humanoid robots fall a lot, and stereo-vision camera setups are a lot more expensive and also more prone to miscalibration when things "fall a lot".

We receive 2D information and predict a per-pixel (heatmap) likelihood of being an object of class X (here: ball). To get (2D) center coordinates, we simply calculate the center of each object, i.e. the center of a positive prediction cluster (a cluster of adjacent pixels whose activation is above a certain threshold).
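To make the clustering step a bit more concrete, here is a minimal sketch in plain Python. It is not the code from the repository: the function name `cluster_centers`, the default threshold of 0.5, and the use of 4-connectivity for "adjacent" are all assumptions made for illustration.

```python
from collections import deque


def cluster_centers(heatmap, threshold=0.5):
    """Return the (row, col) center of every cluster of adjacent pixels
    whose activation is above `threshold`.

    `heatmap` is a 2D list of floats. Adjacency is 4-connected
    (up, down, left, right), which is an assumption of this sketch.
    """
    rows = len(heatmap)
    cols = len(heatmap[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    centers = []

    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or heatmap[r][c] <= threshold:
                continue

            # Flood-fill one cluster, starting from this unvisited pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and heatmap[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))

            # The center is the unweighted mean of the pixel coordinates.
            center_row = sum(p[0] for p in pixels) / len(pixels)
            center_col = sum(p[1] for p in pixels) / len(pixels)
            centers.append((center_row, center_col))

    return centers


if __name__ == "__main__":
    example = [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.9, 0.8, 0.0],
        [0.0, 0.7, 0.9, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    print(cluster_centers(example))
```

The number of pixels in a cluster also gives you the rough "relative size" of the object, and weighting the mean by activation instead of using a plain mean would be an easy variation.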

Without any post-processing, you only get an idea of the relative size of the object and its direction (much like humans do). For 3D information, you have to apply post-processing, for example generating point clouds from the predictions. We do something similar.

The simplest approximation of the distance to the ball uses the Pythagorean theorem (basic right-triangle geometry), because in general you know (1) the height of your robot, (2) the position of the camera mounted on your robot, and (3) the angle at which the camera currently observes the environment.
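As a rough sketch of that right-triangle approximation, assume the ball lies on flat ground exactly on the camera's optical axis, and that the camera angle is given as a downward pitch below the horizontal. The function name `ball_distance` and its parameters are hypothetical, and the ball radius and lens distortion are ignored.

```python
import math


def ball_distance(camera_height_m, pitch_deg):
    """Estimate the distance to a ball seen at the image center.

    camera_height_m: height of the camera above the ground, in meters.
    pitch_deg: downward tilt of the camera below the horizontal, in
               degrees (0 < pitch_deg <= 90). Both parameter conventions
               are assumptions of this sketch.

    Returns (ground_distance, line_of_sight_distance) in meters.
    """
    if not 0 < pitch_deg <= 90:
        raise ValueError("pitch_deg must be in (0, 90]")

    pitch = math.radians(pitch_deg)

    # Right triangle: the camera height is the side opposite the pitch
    # angle, the distance along the ground is the adjacent side.
    ground_distance = camera_height_m / math.tan(pitch)

    # Pythagorean theorem: the hypotenuse is the camera-to-ball distance.
    line_of_sight = math.hypot(camera_height_m, ground_distance)

    return ground_distance, line_of_sight


if __name__ == "__main__":
    ground, direct = ball_distance(camera_height_m=0.5, pitch_deg=45.0)
    print(f"ground: {ground:.3f} m, line of sight: {direct:.3f} m")
```

If the ball is not at the image center, one would additionally need the pixel offset converted to an angle via the camera's field of view, which is beyond this sketch.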

Hope this helps you.

Daniel451 / Towards-Real-Time-Ball-Localization-using-CNNs
