bradyz / cross_view_transformers

Cross-view Transformers for real-time Map-view Semantic Segmentation (CVPR 2022 Oral)
MIT License

Question about cross attention module #8

Closed. jianingwangind closed this issue 2 years ago

jianingwangind commented 2 years ago

Thanks for sharing the great work! Have you considered deformable attention? I believe that in the paper you compare queries at each map location to keys at each pixel across all six perspective views, right?

bradyz commented 2 years ago

Great point - you're correct in describing our method. Deformable attention also makes a lot of sense, and in fact it is used by BEVFormer, which is the second-best-performing camera-only method on nuScenes detection.
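
For readers following the thread, the dense attention pattern discussed above (every map-location query compared against keys at every pixel of all six views) can be illustrated with a minimal NumPy sketch. This is not the repository's implementation: the function name `cross_view_attention`, the tensor shapes, and the random inputs are assumptions made purely for illustration, and details such as positional embeddings and multiple heads are omitted.

```python
import numpy as np

def cross_view_attention(map_queries, image_keys, image_values):
    """Dense cross attention from map-view queries to all camera pixels.

    Hypothetical shapes (not taken from the repository):
      map_queries:  (Q, D)     one query per map location
      image_keys:   (N, P, D)  one key per pixel in each of N camera views
      image_values: (N, P, C)  one value per pixel in each view
    """
    n_views, n_pixels, dim = image_keys.shape

    # Flatten views and pixels so each query sees every pixel of every view.
    keys = image_keys.reshape(n_views * n_pixels, dim)
    values = image_values.reshape(n_views * n_pixels, -1)

    # Scaled dot-product scores, shape (Q, N * P).
    scores = map_queries @ keys.T / np.sqrt(dim)

    # Numerically stable softmax over all pixels of all views.
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)

    # Weighted sum of pixel values for each map location, shape (Q, C).
    return weights @ values, weights

# Toy example: six views, a 5x5 map grid, tiny feature maps (all assumed).
rng = np.random.default_rng(0)
num_views, pixels_per_view, dim = 6, 12, 8
queries = rng.normal(size=(25, dim))
keys = rng.normal(size=(num_views, pixels_per_view, dim))
values = rng.normal(size=(num_views, pixels_per_view, dim))

out, weights = cross_view_attention(queries, keys, values)
```

The score matrix here has one entry per (map location, pixel) pair, so its size grows with the product of the two. Deformable attention, as raised in the question, instead has each query attend to a small set of sampled locations, which avoids building that full matrix.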