fabiopoiesi opened this issue 4 years ago
I think that after you use RGB + depth to compute the coordinates of the points in the point cloud (in the camera coordinate system), you need to convert these points to the world coordinate system. Here the world coordinate system is the upright-depth coordinate system, as described in `votenet/sunrgbd/sunrgbd_utils.py` in `class SUNRGBD_Calibration(object)`: "upright depth coordinate: tilted depth coordinate by Rtilt such that Z is gravity direction, Z is up-axis, Y is forward, X is right-ward".
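For what it's worth, here is a minimal numpy sketch of that two-step conversion. The function name is mine, the axis flip follows the convention quoted above (camera: X right, Y down, Z forward; depth: X right, Y forward, Z up), and `Rtilt` is the 3x3 tilt rotation shipped with the SUN RGB-D metadata:

```python
import numpy as np

def camera_to_upright_depth(pc_camera, Rtilt):
    """Map points from camera coords (X right, Y down, Z forward)
    to upright-depth coords (X right, Y forward, Z up = gravity).

    pc_camera: (N, 3) array of points in the camera frame.
    Rtilt:     (3, 3) tilt rotation from the SUN RGB-D metadata.
    """
    # Camera -> tilted depth coordinate: depth (X, Y, Z) = camera (X, Z, -Y)
    pc_depth = pc_camera[:, [0, 2, 1]].copy()
    pc_depth[:, 2] *= -1.0
    # Tilted depth -> upright depth: rotate by Rtilt so Z aligns with gravity
    return (Rtilt @ pc_depth.T).T
```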
Hi, I am at the stage of training on my own data.
I collected about 300 3D bboxes from about 50 scenes. My annotations are in the form of 8 corners with respect to the camera frame. Basically I used a RealSense, captured a scene, converted the depth map into a point cloud, and annotated on the point cloud itself. I can convert these 8 corners to the format explained in tips.md; it shouldn't take long.
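In case it helps, a rough sketch of that corner-to-parameter conversion for an upright box (Z up); the corner ordering here is an assumption of this sketch, not a convention taken from tips.md:

```python
import numpy as np

def corners_to_center_size_heading(corners):
    """Convert an (8, 3) upright box (Z up) to (center, size, heading).

    Assumed corner order: 0-3 wrap around the bottom face, 4-7 are the
    matching corners of the top face (so 0-1 is a length edge, 1-2 a
    width edge, and 0-4 a vertical edge).
    """
    center = corners.mean(axis=0)
    length = np.linalg.norm(corners[1] - corners[0])
    width = np.linalg.norm(corners[2] - corners[1])
    height = np.linalg.norm(corners[4] - corners[0])
    # Heading = rotation of the length edge about the Z (up) axis
    edge = corners[1] - corners[0]
    heading = np.arctan2(edge[1], edge[0])
    return center, np.array([length, width, height]), heading
```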
Does votenet want bboxes with respect to the world or the camera frame? (The world frame makes little sense to me, but you never know...) Is there already a script in the votenet repo that does this transformation?
To figure this out I checked the procedure for preparing the SUN RGB-D dataset, but I ended up at Rtilt, which confused me. I couldn't find in the documentation what Rtilt is. The only thing I found is from this paper, which says that it is the transformation between the camera and world coordinate systems. Why is this needed?
Cheers