SysCV / qd-3dt

Official implementation of Monocular Quasi-Dense 3D Object Tracking, TPAMI 2022
https://eborboihuc.github.io/QD-3DT/
BSD 3-Clause "New" or "Revised" License

Coordinate frame for camera pose #34

Open C-Aniruddh opened 2 years ago

C-Aniruddh commented 2 years ago

Hi everyone,

I am building a data pipeline to run with qd-3dt as follows:

  1. Extract RGB frames from a monocular video (I have the camera intrinsics)
  2. Generate depth maps using a depth detector (packnet-sfm/monodepth2, etc)
  3. Generate camera trajectory pose using RGBD SLAM (ORB-SLAM3)
  4. Pass the camera trajectory and the RGB frames to qd-3dt to get the 3D detections.

The camera trajectory from ORB-SLAM3 is in the TUM format [timestamp, tx, ty, tz, qx, qy, qz, qw], where (tx, ty, tz) is the translation and (qx, qy, qz, qw) is the orientation as a quaternion. The axes of this frame are (z-forward, y-left, x-down).
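For reference, here is a minimal sketch of turning one TUM-format line like the above into a 4x4 homogeneous pose matrix (the form most tracking pipelines consume). This uses only numpy and assumes the Hamilton (x, y, z, w) quaternion convention that ORB-SLAM3 writes; it does not assume anything about qd-3dt's expected frame.

```python
import numpy as np

def tum_line_to_pose(line: str) -> np.ndarray:
    """Parse one TUM trajectory line [t tx ty tz qx qy qz qw]
    into a 4x4 homogeneous pose matrix."""
    vals = [float(v) for v in line.split()]
    _, tx, ty, tz, qx, qy, qz, qw = vals
    # Quaternion -> rotation matrix (Hamilton convention, (x, y, z, w) order).
    R = np.array([
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qz*qw),     2*(qx*qz + qy*qw)],
        [2*(qx*qy + qz*qw),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qx*qw)],
        [2*(qx*qz - qy*qw),     2*(qy*qz + qx*qw),     1 - 2*(qx*qx + qy*qy)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = [tx, ty, tz]
    return T
```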

What coordinate frame does the camera pose need to be in when passed to qd-3dt? I tried rotating the translation vector by 270 degrees in the XZ plane to obtain an (x-forward, y-right, z-down) frame, but it does not seem to work: the vehicle trajectory ends up pointing upwards (screenshot: https://imgur.com/a/cAl3ptD).
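One possible pitfall with rotating only the translation vector is that the rotation part of each pose must be remapped as well, otherwise the trajectory tilts exactly as in the screenshot. A sketch of a full frame change, assuming (and this is an assumption, since qd-3dt is trained on KITTI/nuScenes) that the target is the standard camera convention (x-right, y-down, z-forward): build a permutation matrix C from the old axes (x-down, y-left, z-forward) and conjugate the whole 4x4 pose with it. Whether conjugation or a plain left-multiplication is correct also depends on whether the pose is camera-to-world or world-to-camera, so treat this as a starting point rather than a verified answer.

```python
import numpy as np

# Rows of C express the new axes in old coordinates:
#   new x (right) = -old y (left), new y (down) = old x (down), new z = old z.
C = np.array([
    [0, -1, 0],
    [1,  0, 0],
    [0,  0, 1],
], dtype=float)

def remap_pose(T_old: np.ndarray) -> np.ndarray:
    """Conjugate a 4x4 homogeneous pose so both its rotation and its
    translation are expressed in the new axis convention."""
    M = np.eye(4)
    M[:3, :3] = C
    return M @ T_old @ np.linalg.inv(M)
```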

Has anyone converted the TUM camera trajectory to work with this project?