fabiopoiesi opened this issue 4 years ago
I think that after you use RGB + depth to compute the coordinates of the points in the point cloud (in the camera coordinate system), you need to convert these points to the world coordinate system. Here the world coordinate system is the upright-depth coordinate system, as described in `votenet/sunrgbd/sunrgbd_utils.py` in `class SUNRGBD_Calibration(object)`: "upright depth coordinate: tilted depth coordinate by Rtilt such that Z is gravity direction, Z is up-axis, Y is forward, X is right-ward".
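For what it's worth, here is a minimal numpy sketch of that two-step conversion. The function name is mine, the axis flip follows the convention quoted above (camera: X right, Y down, Z forward; depth: X right, Y forward, Z up), and `Rtilt` is the 3x3 tilt rotation shipped with the SUN RGB-D metadata:

```python
import numpy as np

def camera_to_upright_depth(pc_camera, Rtilt):
    """Map points from camera coords (X right, Y down, Z forward)
    to upright-depth coords (X right, Y forward, Z up = gravity).

    pc_camera: (N, 3) array of points in the camera frame.
    Rtilt:     (3, 3) tilt rotation from the SUN RGB-D metadata.
    """
    # Camera -> tilted depth coordinate: depth (X, Y, Z) = camera (X, Z, -Y)
    pc_depth = pc_camera[:, [0, 2, 1]].copy()
    pc_depth[:, 2] *= -1.0
    # Tilted depth -> upright depth: rotate by Rtilt so Z aligns with gravity
    return (Rtilt @ pc_depth.T).T
```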
Hi, I am at the stage of training on my own data.
I collected about 300 3D bboxes from about 50 scenes. My annotations are in the form of 8 corners with respect to the camera frame. Basically I used a RealSense, captured a scene, converted the depth map into a point cloud, and annotated on the point cloud itself. I can convert these 8 corners to the format explained in tips.md; it shouldn't take long.
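In case it helps, a rough sketch of that corner-to-parameter conversion for an upright box (Z up); the corner ordering here is an assumption of this sketch, not a convention taken from tips.md:

```python
import numpy as np

def corners_to_center_size_heading(corners):
    """Convert an (8, 3) upright box (Z up) to (center, size, heading).

    Assumed corner order: 0-3 wrap around the bottom face, 4-7 are the
    matching corners of the top face (so 0-1 is a length edge, 1-2 a
    width edge, and 0-4 a vertical edge).
    """
    center = corners.mean(axis=0)
    length = np.linalg.norm(corners[1] - corners[0])
    width = np.linalg.norm(corners[2] - corners[1])
    height = np.linalg.norm(corners[4] - corners[0])
    # Heading = rotation of the length edge about the Z (up) axis
    edge = corners[1] - corners[0]
    heading = np.arctan2(edge[1], edge[0])
    return center, np.array([length, width, height]), heading
```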
Does votenet want bboxes with respect to the world or the camera frame? (The world frame makes little sense to me, but you never know...) Is there already a script in the votenet repo that does this transformation?
To figure this out I checked the procedure for preparing the SUN RGB-D dataset, but I ended up at Rtilt, which confused me. I couldn't find in the documentation what Rtilt is. The only thing I found is from this paper, which says that it is the transformation between the camera and world coordinate systems. Why is this needed?
Cheers