XuyangBai / TransFusion

[PyTorch] Official implementation of CVPR2022 paper "TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers". https://arxiv.org/abs/2203.11496
Apache License 2.0

Impact of SMCA #28

Open Divadi opened 2 years ago

Divadi commented 2 years ago

Thank you for open-sourcing your work. I was wondering, did you perform an ablation on the impact SMCA has on the network? Apologies if I missed it in the paper.

Also, I noticed that you make extensive use of positional encodings learned from coordinates and, I think, no sin/cos encodings. Did you ever try the latter, or were the former much better?
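For readers unfamiliar with the distinction, a positional encoding "learned from coordinates" can be as simple as a small MLP that maps each query's continuous (x, y) position to a feature vector, with weights trained along with the rest of the detector. The NumPy sketch below assumes a two-layer ReLU MLP. The function name `learned_pos_encoding`, the width `num_pos_feats`, and the random weights are illustrative placeholders, not the TransFusion module itself.

```python
import numpy as np


def learned_pos_encoding(coords, w1, b1, w2, b2):
    """Map (N, 2) BEV coordinates to (N, D) embeddings with a 2-layer MLP.

    The weights stand in for parameters that would be trained jointly with
    the detector; here they are random arrays for illustration only.
    """
    hidden = np.maximum(coords @ w1 + b1, 0.0)  # linear layer + ReLU
    return hidden @ w2 + b2                     # second linear layer


rng = np.random.default_rng(0)
num_pos_feats = 8  # hypothetical embedding width
w1 = rng.normal(size=(2, num_pos_feats))
b1 = np.zeros(num_pos_feats)
w2 = rng.normal(size=(num_pos_feats, num_pos_feats))
b2 = np.zeros(num_pos_feats)

coords = np.array([[0.0, 0.0], [10.5, 3.0], [100.0, 50.0]])
pos = learned_pos_encoding(coords, w1, b1, w2, b2)
print(pos.shape)  # (3, 8)
```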

XuyangBai commented 2 years ago

Hi @Divadi, the impact of SMCA (more precisely, of the second transformer decoder layer) is ablated in Table 7. I didn't provide a performance comparison between spatially modulated cross-attention and standard cross-attention, but in my experience the latter converges much more slowly and reaches weaker final performance.
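To make the comparison concrete, here is a minimal single-head NumPy sketch of cross-attention with and without a spatial Gaussian prior. It assumes the modulation is applied by adding the log of a 2D circular Gaussian (centered at the query's projected location, with spread set by a radius `radius` and a hyperparameter `sigma`) to the attention logits, which is equivalent to multiplying the attention weights by the Gaussian and renormalizing. All names (`gaussian_mask`, `cross_attention`, `log_bias`) and the toy feature map are hypothetical; the repository may apply the mask differently.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gaussian_mask(center, radius, height, width, sigma=1.0):
    """2D circular Gaussian on an H x W map, flattened row-major to (H*W,).

    center is (x, y) in feature-map cells; the peak value at center is 1.
    """
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    d2 = (xs - center[0]) ** 2 + (ys - center[1]) ** 2
    return np.exp(-d2 / (sigma * radius ** 2)).reshape(-1)


def cross_attention(query, keys, values, log_bias=None):
    """Single-head scaled dot-product attention for one query vector.

    If log_bias is given it is added to the logits before the softmax,
    i.e. the weights are multiplied by exp(log_bias) and renormalized.
    """
    logits = keys @ query / np.sqrt(query.shape[-1])
    if log_bias is not None:
        logits = logits + log_bias
    weights = softmax(logits)
    return weights @ values, weights


rng = np.random.default_rng(0)
H, W, D = 8, 8, 4
keys = rng.normal(size=(H * W, D))    # flattened image features (keys)
values = rng.normal(size=(H * W, D))  # flattened image features (values)
query = rng.normal(size=D)            # one object query

mask = gaussian_mask(center=(2.0, 5.0), radius=1.5, height=H, width=W)
_, plain_w = cross_attention(query, keys, values)
out, smca_w = cross_attention(query, keys, values, log_bias=np.log(mask + 1e-9))

# Share of attention that falls within 3 cells of the projected center.
ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
near = ((xs - 2.0) ** 2 + (ys - 5.0) ** 2 <= 9).reshape(-1)
print(f"attention mass within 3 cells: plain={plain_w[near].sum():.3f}, "
      f"modulated={smca_w[near].sum():.3f}")
```

Because the Gaussian only reweights keys by their distance from the projected center, it concentrates attention near the object from the first training step. One plausible reading of the slower convergence reported above is that plain cross-attention has to learn this locality from data.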

I have only tried learned positional encodings, but sin/cos or Fourier positional encodings are also worth trying.
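For comparison with the learned encoding, below is a NumPy sketch of two fixed alternatives for continuous coordinates: the Transformer-style sin/cos encoding with a geometric ladder of frequencies, and random Fourier features with Gaussian-sampled frequencies (one common reading of "Fourier positional encoding"). Neither has trainable parameters. The function names and defaults (`num_feats`, `temperature`, `num_freqs`, `scale`) are assumptions for illustration; in practice the coordinates would usually be normalized first so the frequencies cover a sensible range.

```python
import numpy as np


def sincos_pos_encoding(coords, num_feats=8, temperature=10000.0):
    """Transformer-style sin/cos encoding applied to each coordinate axis.

    coords: (N, C) array of continuous coordinates.
    Returns (N, C * num_feats): for each axis, num_feats // 2 sines followed
    by num_feats // 2 cosines.
    """
    if num_feats % 2 != 0:
        raise ValueError("num_feats must be even")
    half = num_feats // 2
    # Geometric ladder of frequencies, decreasing from 1 toward 1 / temperature.
    freqs = 1.0 / temperature ** (np.arange(half) / half)
    angles = coords[:, :, None] * freqs                              # (N, C, half)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (N, C, num_feats)
    return enc.reshape(coords.shape[0], -1)


def fourier_features(coords, num_freqs=4, scale=1.0, seed=0):
    """Random Fourier features: [sin(2*pi*x@B), cos(2*pi*x@B)], B ~ N(0, scale^2)."""
    rng = np.random.default_rng(seed)
    B = rng.normal(scale=scale, size=(coords.shape[1], num_freqs))
    proj = 2.0 * np.pi * coords @ B                                  # (N, num_freqs)
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)


coords = np.array([[0.0, 0.0], [10.5, 3.0]])  # e.g. BEV (x, y) positions
pe = sincos_pos_encoding(coords)
ff = fourier_features(coords)
print(pe.shape, ff.shape)  # (2, 16) (2, 8)
```

A fixed encoding like either of these could also be passed through a learned linear layer, which would combine a fixed frequency basis with some of the flexibility of the learned approach.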