facebookresearch / OrienterNet

Source Code for Paper "OrienterNet: Visual Localization in 2D Public Maps with Neural Matching"

About the PV branch and BEV module #27

Zhenghao97 opened this issue 9 months ago

Zhenghao97 commented 9 months ago

Hi, Sarlin

Recently, many BEV methods for perception tasks, such as BEVFormer, have appeared at various conferences. These methods generally use six or more PV images from different views to enlarge the FoV of the BEV representation. So maybe OrienterNet could also take multiple PV images to improve performance? However, I am not sure how compatible the BEV part of methods like BEVFormer is with the registration part of OrienterNet.

Maybe I could also keep the original OrienterNet method, but then how can I leverage the BEV features from the other PV images? Simply stacking these BEV features does not sound like a good idea.

I would like to hear some of your ideas, thanks!

sarlinpe commented 9 months ago

You could infer a BEV for each image independently and stitch them into a single local map based on the relative poses between the cameras, averaging the features in overlapping regions. This should give similar results to fusing the likelihoods (like in our sequential localization experiments) but be computationally lighter. I'm happy to integrate this into the repo if you have a working example.
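For concreteness, here is a minimal sketch of that stitching, not code from this repo: it warps each per-camera BEV feature grid into a common local map via the known relative 2D poses and averages features where the grids overlap. The function name `stitch_bevs`, the `(R, t)` pose format, and the assumption that each BEV grid is centered on its camera are mine, chosen for illustration.

```python
import torch
import torch.nn.functional as F

def stitch_bevs(bevs, poses, map_size, cell_size):
    """Accumulate per-camera BEV features into one local map (sketch).

    bevs:      list of (C, h, w) feature grids, one per camera,
               assumed centered on the camera (an assumption, not
               OrienterNet's actual BEV layout).
    poses:     list of (R, t) 2D rigid transforms mapping each camera's
               BEV frame into the common map frame; R: (2, 2), t: (2,).
    map_size:  (H, W) of the stitched map, in cells.
    cell_size: metric size of one grid cell.
    """
    C = bevs[0].shape[0]
    H, W = map_size
    acc = torch.zeros(C, H, W)
    count = torch.zeros(1, H, W)
    # Metric (x, y) coordinates of the map cells, centered at the origin.
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32),
        indexing="ij",
    )
    pts = torch.stack([xs - W / 2, ys - H / 2], -1) * cell_size  # (H, W, 2)
    for bev, (R, t) in zip(bevs, poses):
        _, h, w = bev.shape
        # Map cells -> this camera's BEV frame: p_local = R^T (p_world - t),
        # written with row vectors as (p - t) @ R.
        local = (pts - t) @ R
        # Normalize to [-1, 1] for grid_sample.
        cells = local / cell_size
        grid = torch.stack([cells[..., 0] / (w / 2),
                            cells[..., 1] / (h / 2)], -1)
        sampled = F.grid_sample(bev[None], grid[None],
                                align_corners=False)[0]  # (C, H, W)
        # Count only map cells that actually fall inside this BEV.
        valid = (grid.abs().max(-1).values <= 1).float()[None]
        acc += sampled * valid
        count += valid
    # Average features in overlapping regions.
    return acc / count.clamp(min=1)
```

Fusing in feature space like this means the matching against the map runs only once; the likelihood-fusion alternative mentioned above would instead run the matching per camera and combine the resulting log-probability maps after warping them in the same way.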

This would not leverage multi-view information, though. In our latest paper, SNAP, we have a solution to transparently fuse information from one or multiple images into a BEV. The code is not yet public, but it shouldn't be hard to implement.

What kinds of datasets provide calibrated and time-synchronized multi-camera images with ground-truth geolocation?

Zhenghao97 commented 9 months ago

Thanks for your ideas! I will try to develop them on top of OrienterNet.

As far as I know, the nuScenes dataset satisfies all of the above conditions, and the recent surround-view BEV methods run their experiments on it.