fundamentalvision / BEVFormer

[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.
https://arxiv.org/abs/2203.17270
Apache License 2.0

How to run this on my custom dataset #39

Closed lfxx closed 2 years ago

lfxx commented 2 years ago

Hi, thanks for sharing the code. I want to know how I can run inference on my own dataset, e.g., a single frame from six cameras.

zhiqi-li commented 2 years ago

Since the 3D detector is coupled with the camera parameters during training, its generalization is poor, and the model is not suitable for inference on other datasets. If you really need to run inference on another dataset, you need to ensure that its coordinate system is consistent with nuScenes'.

lfxx commented 2 years ago

> Since the 3D detector is coupled with the camera parameters during training, its generalization is poor, and the model is not suitable for inference on other datasets. If you really need to run inference on another dataset, you need to ensure that its coordinate system is consistent with nuScenes'.

OK, would you mind providing a single-sample inference script, since test.sh can only run on the whole dataset?

pmj110119 commented 2 years ago

Excuse me, why is it coupled with the camera parameters?

In my understanding, the purpose of the camera parameters is to obtain the projection matrix that maps to 2D reference points on the images. If you change the camera, the projection matrix changes, but as long as that doesn't introduce stronger noise, the network performance should not be affected.

Maybe my understanding is wrong; I am confused about this and look forward to your reply!
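The projection step described above can be sketched roughly as follows. The intrinsic/extrinsic values are illustrative toy numbers, not real nuScenes calibration, and `project_point` is a hypothetical helper, not a BEVFormer function:

```python
import numpy as np

def project_point(point_3d, extrinsic, intrinsic):
    """Project a 3D point in the ego frame to (u, v) pixel coordinates.

    extrinsic: 3x4 matrix mapping ego-frame points to the camera frame.
    intrinsic: 3x3 pinhole camera matrix.
    Returns None if the point is behind the camera.
    """
    p = extrinsic @ np.append(point_3d, 1.0)  # ego frame -> camera frame
    if p[2] <= 0:                             # behind the image plane
        return None
    uvw = intrinsic @ p                       # camera frame -> image plane
    return uvw[:2] / uvw[2]                   # perspective divide

# Toy pinhole camera looking down +z of the ego frame (identity extrinsic).
intrinsic = np.array([[1000.0,    0.0, 800.0],
                      [   0.0, 1000.0, 450.0],
                      [   0.0,    0.0,   1.0]])
extrinsic = np.eye(4)[:3]  # 3x4: no rotation, no translation

uv = project_point(np.array([2.0, 1.0, 10.0]), extrinsic, intrinsic)
print(uv)  # prints [1000.  550.]
```

Changing the camera only changes `extrinsic`/`intrinsic` here, which is why the question above is whether the network itself should care, as long as the projection stays geometrically valid.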

zhiqi-li commented 2 years ago

BEVFormer can be trained on nuScenes and evaluated on another dataset with a different camera system, but such cross-dataset testing will produce poor results.

pmj110119 commented 2 years ago

Thanks for your reply!

So the poor results in cross-dataset testing are due to scene differences in the other dataset (vehicle/road style, lighting, etc.), and have little to do with the camera parameters. Is this right?

timothylimyl commented 1 year ago

> Thanks for your reply!
>
> So the poor results in cross-dataset testing are due to scene differences in the other dataset (vehicle/road style, lighting, etc.), and have little to do with the camera parameters. Is this right?

Snippet from the paper:

> Therefore, we still utilize the camera intrinsic and extrinsic to decide the hit views that one BEV query deserves to interact. This strategy makes that one BEV query usually interacts with only one or two views rather than all views, making it possible to use global attention in the spatial cross-attention.
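The hit-view selection that snippet describes can be sketched roughly as follows: project the BEV query's 3D reference point with each camera's combined intrinsic/extrinsic ("lidar2img"-style) matrix and keep only the views where it lands inside the image with positive depth. The camera layout, image size, and matrices below are illustrative assumptions, not the actual nuScenes/BEVFormer values:

```python
import numpy as np

H, W = 900, 1600  # assumed image height/width

def hit_views(point_3d, lidar2img_per_cam):
    """Return indices of cameras whose image contains the projected point."""
    hits = []
    hom = np.append(point_3d, 1.0)            # homogeneous ego-frame point
    for i, l2i in enumerate(lidar2img_per_cam):
        uvw = l2i @ hom                       # 3x4 projection to image plane
        if uvw[2] <= 0:                       # behind this camera
            continue
        u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
        if 0 <= u < W and 0 <= v < H:         # inside the image bounds
            hits.append(i)
    return hits

# Toy two-camera rig: one camera faces ego +x, the other faces ego -x.
K = np.array([[1000.0,    0.0, W / 2],
              [   0.0, 1000.0, H / 2],
              [   0.0,    0.0,   1.0]])
R_front = np.array([[0.0, -1.0,  0.0],   # ego +x is the optical axis
                    [0.0,  0.0, -1.0],
                    [1.0,  0.0,  0.0]])
R_back = np.array([[ 0.0, 1.0,  0.0],    # ego -x is the optical axis
                   [ 0.0, 0.0, -1.0],
                   [-1.0, 0.0,  0.0]])
cams = [np.hstack([K @ R, np.zeros((3, 1))]) for R in (R_front, R_back)]

print(hit_views(np.array([10.0, 0.0, 0.0]), cams))  # prints [0]
```

A point ahead of the ego vehicle hits only the front camera, which matches the snippet's observation that a BEV query usually interacts with just one or two views.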