lucasjinreal closed 2 years ago
Our method needs the camera intrinsics to calculate the 3D box's localization parameters during inference. I have not tested it on internet videos. If you have the video's camera intrinsics, it might be possible to do monocular 3D object detection frame by frame.
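For illustration only, and not taken from this repository's code: a minimal sketch of why the intrinsics are needed, assuming a standard pinhole camera model. The function names `backproject_center` and `approx_intrinsics_from_fov` are hypothetical, and guessing intrinsics from a field of view is only a rough fallback for videos with no calibration; it has not been validated with this method.

```python
import math


def backproject_center(u, v, depth, fx, fy, cx, cy):
    """Lift a pixel (u, v) with a predicted depth to 3D camera coordinates.

    Assumes a pinhole model with focal lengths (fx, fy) and principal
    point (cx, cy), all in pixels. Depth is in metres along the optical axis.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def approx_intrinsics_from_fov(width, height, horizontal_fov_deg):
    """Rough intrinsics guess (assumption): square pixels, centred principal point."""
    fx = (width / 2.0) / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    return (fx, fx, width / 2.0, height / 2.0)


# Example: a projected box centre 70 px right of the principal point at 10 m depth.
fx, fy, cx, cy = 700.0, 700.0, 600.0, 180.0
location = backproject_center(670.0, 180.0, 10.0, fx, fy, cx, cy)
```

The same predicted pixel and depth map to a different 3D location when `fx`, `fy`, `cx`, or `cy` change, which is why wrong or missing intrinsics shift the recovered box position.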
@Xianpeng919 Do the extrinsics matter?
The extrinsics should matter because our method is trained on the KITTI dataset (training split), which includes only about 3k images.
How can I test on internet videos? Is that possible with the KITTI pretrained weights?