Labeled 3D world coordinate data

From the paper:

Our labels are defined as follows. At pixel p, the calibrated depth D(p) allows us to compute the 3D camera space coordinate x. Using homogeneous coordinates, this camera position can be transformed into the scene’s world coordinate frame as m = Hx. Our labels are simply defined as these scene world positions, m.

The labels are the depth image values back-projected into camera space and then transformed by the camera pose, which is obtained from KinectFusion, a motion tracker, or some other source of ground truth. For the TUM datasets, the pose comes from an external mocap system. The code you provided does the camera-space projection.

KinectFusion does not supply the pixel values. What it supplies is the poses used to transform the camera-space coordinates into world space.
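To make the two steps concrete, here is a minimal sketch in plain Python, not the code from this repository. It assumes a pinhole camera model, depth already converted to meters, and a pose H given as a 4x4 row-major camera-to-world matrix. The function names (`backproject`, `transform`, `label_for_pixel`) and the intrinsics in the usage note are hypothetical, chosen only for illustration.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth into camera space.

    Assumes a pinhole model. Returns a homogeneous point (x, y, z, 1).
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth, 1.0)


def transform(H, x):
    """Apply a 4x4 row-major pose H to a homogeneous point x (m = Hx).

    H is assumed to map camera space to world space.
    """
    m = [sum(H[r][c] * x[c] for c in range(4)) for r in range(4)]
    return tuple(m[:3])


def label_for_pixel(u, v, depth, intrinsics, H):
    """Return the world-space label m for one pixel, or None if depth is invalid."""
    if depth <= 0:
        return None  # no depth reading, so no label for this pixel
    fx, fy, cx, cy = intrinsics
    return transform(H, backproject(u, v, depth, fx, fy, cx, cy))
```

For example, with hypothetical intrinsics `(525.0, 525.0, 320.0, 240.0)` and a pose that is a pure translation by (1, 2, 3), the pixel at the principal point with a depth of 2 m back-projects to (0, 0, 2) in camera space and gets the world label (1, 2, 5). If your pose source reports world-to-camera transforms instead, you would need to invert H first.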

I'm not sure what happened to your question, but I saw it in the email.
