Closed lfxx closed 2 years ago
Since the 3D detector is coupled with the camera parameters during training, its generalization is poor, and the model is not suitable for inference on other datasets. If you really need to run inference on another dataset, you need to ensure that its coordinate system is consistent with nuScenes.
OK, would you mind providing a single-sample inference script? test.sh can only run on the whole dataset.
Excuse me, why is the detector coupled with the camera parameters?
In my understanding, the purpose of the camera parameters is to build the projection matrix that maps 3D points to 2D reference points on the image. If you change the camera, the projection matrix changes, but as long as there is no stronger noise, network performance should not be affected.
Maybe my understanding is wrong; I am confused by this and look forward to your reply!
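For reference, the projection step being discussed can be sketched roughly as below. This is only an illustration of the general idea: the combined `lidar2img` matrix name and the pinhole intrinsics are assumptions here, not the exact identifiers used in the BEVFormer code.

```python
# Minimal sketch: project a 3D reference point (ego/lidar frame) into an
# image using a combined 4x4 extrinsic+intrinsic matrix ("lidar2img").
# All names and numbers here are illustrative assumptions.
import numpy as np

def project_to_image(point_3d, lidar2img):
    """Return the 2D pixel coordinates of a 3D point, or None if it is
    behind the camera."""
    pt = np.append(point_3d, 1.0)   # homogeneous coordinates
    cam = lidar2img @ pt            # apply extrinsic + intrinsic in one step
    if cam[2] <= 0:                 # non-positive depth: behind the camera
        return None
    return cam[:2] / cam[2]         # perspective divide -> (u, v)

# Identity extrinsic plus a simple pinhole intrinsic, for illustration only.
lidar2img = np.array([[800.,   0., 640., 0.],
                      [  0., 800., 360., 0.],
                      [  0.,   0.,   1., 0.],
                      [  0.,   0.,   0., 1.]])
uv = project_to_image(np.array([1.0, 0.5, 10.0]), lidar2img)
```

The point of the sketch is that the network only ever sees the 2D reference points this projection produces, so any systematic change in the camera setup changes the input distribution the model was trained on.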
BEVFormer can be trained on nuScenes and evaluated on another dataset with different camera systems, but such cross-dataset testing will produce poor results.
Thanks for your reply!
Poor results in cross-dataset testing are due to scene differences in the other dataset (vehicle/road style, lighting, etc.) and have little to do with the camera parameters, is that right?
Snippet from the paper:
Therefore, we still utilize the camera intrinsic and extrinsic to decide the hit views that one BEV query deserves to interact. This strategy makes that one BEV query usually interacts with only one or two views rather than all views, making it possible to use global attention in the spatial cross-attention.
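The hit-view selection described in that snippet can be sketched as follows: project a BEV query's 3D reference point into every camera and keep only the views where it lands inside the image bounds. The matrix names and image size below are assumptions for illustration, not the repo's exact API.

```python
# Illustrative sketch of "hit view" selection: a BEV query interacts only
# with cameras whose image actually contains its projected reference point.
# Names, matrices, and image size are assumptions, not the repo's exact code.
import numpy as np

def hit_views(ref_point, lidar2img_list, img_w=1600, img_h=900):
    """Return indices of cameras where the 3D reference point projects
    inside the image plane."""
    hits = []
    for cam_idx, mat in enumerate(lidar2img_list):
        p = mat @ np.append(ref_point, 1.0)
        if p[2] <= 1e-5:                       # behind this camera: skip
            continue
        u, v = p[0] / p[2], p[1] / p[2]        # perspective divide
        if 0 <= u < img_w and 0 <= v < img_h:  # inside the image bounds
            hits.append(cam_idx)
    return hits

# Two toy cameras: one facing forward, one facing backward (z flipped).
K = np.array([[800.,   0., 640., 0.],
              [  0., 800., 360., 0.],
              [  0.,   0.,   1., 0.],
              [  0.,   0.,   0., 1.]])
front = K
rear = K @ np.diag([-1., 1., -1., 1.])         # 180-degree rotation about y
views = hit_views(np.array([1.0, 0.5, 10.0]), [front, rear])
```

Because most 3D points fall inside only one or two of the six camera frustums, each BEV query ends up attending to a small subset of views, which is what makes global attention affordable in the spatial cross-attention.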
Hi, thanks for sharing the code. I want to know how I can run inference on my own dataset, e.g., a single frame from six cameras.