fundamentalvision / BEVFormer

[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.
https://arxiv.org/abs/2203.17270
Apache License 2.0

Positional Embedding of BEV queries. #193

Open sherylwang opened 1 year ago

sherylwang commented 1 year ago

Hi,

The paper mentions the presence of learnable positional embeddings for BEV queries. However, I couldn't find the usage of positional embedding in the spatial cross attention. Can you please point it out for me?

Best Regards,

sidiangongyuan commented 1 year ago

spatial_cross_attention.py contains the check "if query_pos is not None: query = query + query_pos", the same as in the TSA (temporal self-attention) part.
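To make the pattern concrete, here is a minimal sketch of that conditional addition. It is not the repository's code: it uses plain Python lists instead of tensors, and add_query_pos, bev_query and bev_pos are hypothetical names chosen for illustration. It only shows that the embedding is added when query_pos is passed and skipped when it is None.

```python
from typing import List, Optional

Vector = List[float]


def add_query_pos(
    query: List[Vector],
    query_pos: Optional[List[Vector]] = None,
) -> List[Vector]:
    """Add a positional embedding to each query vector, if one is given.

    Hypothetical helper mirroring the `if query_pos is not None` pattern;
    plain lists stand in for tensors.
    """
    if query_pos is not None:
        # Element-wise sum of each query row with its positional row.
        query = [
            [q + p for q, p in zip(q_row, p_row)]
            for q_row, p_row in zip(query, query_pos)
        ]
    # When query_pos is None, the query is returned unchanged.
    return query


# Two toy BEV queries with two channels each (made-up values).
bev_query = [[1.0, 2.0], [3.0, 4.0]]
bev_pos = [[0.5, 0.5], [0.25, 0.25]]

with_pos = add_query_pos(bev_query, bev_pos)
without_pos = add_query_pos(bev_query, None)

print(with_pos)
print(without_pos)
```

So if the caller never passes query_pos, the branch is silently skipped and the attention runs on the raw queries.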

sherylwang commented 1 year ago

Thanks for the reply. When I run bevformer_base, I find that query_pos is always None in SCA. Is that right?