aharley / simple_bev

A Simple Baseline for BEV Perception

BEVFormer code: num_heads and num_points are swapped #45

Open · vniclas opened this issue 9 months ago

vniclas commented 9 months ago

Hi, thank you for the nice work and for sharing your code!

I believe that your implementation of BEVFormer has a small bug: https://github.com/aharley/simple_bev/blob/be46f0ef71960c233341852f3d9bc3677558ab6d/nets/bevformernet.py#L296

It looks like the values for the parameters n_heads and n_points have been swapped compared to the normal initialization: https://github.com/aharley/simple_bev/blob/be46f0ef71960c233341852f3d9bc3677558ab6d/nets/ops/modules/ms_deform_attn.py#L31

See also the original implementation of BEVFormer, whose constructor reads:

def __init__(self, embed_dims=256, num_heads=8, num_levels=4, num_points=4,

https://github.com/fundamentalvision/BEVFormer/blob/20923e66aa26a906ba8d21477c238567fa6285e9/projects/mmdet3d_plugin/bevformer/modules/decoder.py#L160-L164

as well as the Deformable DETR paper:

"M = 8 and K = 4 are set for deformable attentions by default," where K is the number of sampled keys in each feature level for each attention head and M is the number of attention heads.

I am not sure how much of a difference it makes in practice, but I wanted to warn other people.
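For anyone patching this locally, here is a minimal sketch of the keyword-argument fix. It is not the actual line from bevformernet.py#L296; the d_model and n_levels values below are placeholders, and it assumes the repo's deformable-attention ops are built so the module imports.

```python
# Hypothetical sketch, not the real call site: the point is only that passing the
# counts by keyword keeps them from landing on the wrong parameters.
from nets.ops.modules.ms_deform_attn import MSDeformAttn  # path as in this repo

# Constructor signature in ms_deform_attn.py (Deformable DETR defaults):
#   MSDeformAttn(d_model=256, n_levels=4, n_heads=8, n_points=4)
d_model, n_levels = 256, 1  # placeholder values for illustration only

# 8 attention heads and 4 sampled points per level, matching the
# BEVFormer / Deformable DETR defaults quoted above.
attn = MSDeformAttn(d_model=d_model, n_levels=n_levels, n_heads=8, n_points=4)
```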

aharley commented 9 months ago

Wow thanks. Good catch.