Open zen-d opened 1 year ago
Thanks for your question. First, the depth head is much lighter than the one in BEVDepth. Second, there are other differences between BEVFormer and VoxelFormer. For example, BEVFormer samples features from 4 FPN levels, while we use only 1 level. In addition, BEVFormer has to predict sampling offsets, while VoxelFormer does not. Because of these differences, VoxelFormer runs faster than BEVFormer and has fewer parameters.
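To give a rough feel for the offset-prediction point, here is a minimal sketch that counts the parameters of an offset head. It assumes a Deformable-DETR-style design, where one linear layer maps each query to a 2D offset per head, level, and sampling point. The function name and all sizes (256-dim queries, 8 heads, 4 points) are illustrative assumptions, not values taken from the BEVFormer or VoxelFormer configs.

```python
def offset_head_params(embed_dim: int, num_heads: int,
                       num_levels: int, num_points: int) -> int:
    """Parameter count of a hypothetical linear offset head.

    Assumes a single linear layer (weights + bias) that outputs one
    (x, y) offset per head, per feature level, per sampling point.
    """
    out_features = num_heads * num_levels * num_points * 2
    return embed_dim * out_features + out_features


# Illustrative sizes only (assumed, not read from either model's config).
four_level = offset_head_params(embed_dim=256, num_heads=8, num_levels=4, num_points=4)
one_level = offset_head_params(embed_dim=256, num_heads=8, num_levels=1, num_points=4)

print("offset head, 4 levels:", four_level)
print("offset head, 1 level: ", one_level)
print("no offset prediction:  0")
```

Under these assumptions the offset head grows linearly with the number of feature levels, and a model that does not predict offsets drops this layer in every attention block. The actual savings depend on the real configurations.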
@Lizhuoling Hi, thanks for your great work. Regarding Table 4, I wonder why VoxelFormer looks more lightweight than BEVFormer. If all other architectural specifications were aligned between the two, VoxelFormer would still add a depth prediction net (borrowed from BEVDepth), so I would expect the inference time and parameter count to increase slightly. Please correct me if I am missing something.