DerrickXuNu / CoBEVT

[CoRL2022] CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers
Apache License 2.0

Attempt at transferring to the detection task #5

zhuxinguang33 closed this issue 1 year ago

zhuxinguang33 commented 2 years ago

Thank you for sharing the code. I tried replacing the segmentation head of CoBEVT with a simple one-stage detection head, but it hardly works: the regression branch does not converge. Have you ever tried using CoBEVT for a detection task? Do you think it is a reasonable approach? I would really appreciate a reply. Thank you!

DerrickXuNu commented 2 years ago

Yes. After you get the BEV feature, don't send it directly to the detection head. Use another lightweight encoder to refine it first and see how it goes.
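
For anyone trying this, here is a minimal PyTorch sketch of what "refine, then detect" could look like. It is not code from the CoBEVT repository: `BEVRefiner`, `SimpleDetHead`, the 128-channel BEV feature, the 2 anchors per cell and the 7-value box encoding are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class BEVRefiner(nn.Module):
    """Hypothetical lightweight encoder: a few residual 3x3 conv blocks."""

    def __init__(self, channels: int = 128, num_blocks: int = 2):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
            )
            for _ in range(num_blocks)
        ])
        self.act = nn.ReLU()

    def forward(self, bev: torch.Tensor) -> torch.Tensor:
        # Residual refinement keeps the spatial size of the BEV map.
        for block in self.blocks:
            bev = self.act(bev + block(bev))
        return bev


class SimpleDetHead(nn.Module):
    """Hypothetical one-stage head: per-cell class logits and box offsets."""

    def __init__(self, channels: int = 128, num_anchors: int = 2, box_dim: int = 7):
        super().__init__()
        self.cls = nn.Conv2d(channels, num_anchors, 1)
        self.reg = nn.Conv2d(channels, num_anchors * box_dim, 1)

    def forward(self, feat: torch.Tensor):
        return self.cls(feat), self.reg(feat)


# Stand-in for the fused BEV feature (batch, channels, height, width).
bev_feature = torch.randn(2, 128, 32, 32)
refiner = BEVRefiner(channels=128)
head = SimpleDetHead(channels=128)
cls_logits, box_preds = head(refiner(bev_feature))
```

The refiner sits between the fused BEV feature and the head, so the regression branch sees features adapted for detection rather than the ones tuned for segmentation.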

timegate commented 1 year ago

@zhuxinguang33 Did you try it?

zhuxinguang33 commented 1 year ago

Sorry for the late reply. I made a simple modification: I replaced the original decoder with a transformer decoder like the one in BEVFormer, followed by an SSD head for detection. When I loaded part of the pretrained model and trained only the decoder and the SSD head, the results were AP@0.3 = 0.34, AP@0.5 = 0.26, and AP@0.7 = 0.11. However, end-to-end training performed poorly.
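
In case it helps others reproduce the "load part of the pretrained model, train only the new parts" setup, a rough PyTorch sketch follows. The toy models and the helper names `load_matching_weights` and `freeze_except` are made up for illustration and do not come from the CoBEVT code; only `state_dict`, `load_state_dict(strict=False)` and `requires_grad` are standard PyTorch.

```python
import torch
import torch.nn as nn


def load_matching_weights(model: nn.Module, pretrained: dict) -> list:
    """Copy only the entries whose name and shape match the new model."""
    own = model.state_dict()
    matched = {k: v for k, v in pretrained.items()
               if k in own and v.shape == own[k].shape}
    model.load_state_dict(matched, strict=False)
    return sorted(matched)


def freeze_except(model: nn.Module, trainable_prefixes: tuple) -> None:
    """Keep gradients only for parameters under the given name prefixes."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(trainable_prefixes)


class ToySegModel(nn.Module):
    """Stand-in for the pretrained segmentation model."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(3, 8, 3, padding=1)
        self.seg_head = nn.Conv2d(8, 2, 1)


class ToyDetModel(nn.Module):
    """Stand-in for the detection variant: same encoder, new decoder and head."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(3, 8, 3, padding=1)
        self.decoder = nn.Conv2d(8, 8, 3, padding=1)
        self.det_head = nn.Conv2d(8, 7, 1)


pretrained = ToySegModel().state_dict()
det_model = ToyDetModel()
loaded_keys = load_matching_weights(det_model, pretrained)

# Train only the new decoder and detection head; the encoder stays frozen.
freeze_except(det_model, ("decoder.", "det_head."))
trainable = [n for n, p in det_model.named_parameters() if p.requires_grad]
optimizer = torch.optim.Adam(
    [p for p in det_model.parameters() if p.requires_grad], lr=1e-3
)
```

One thing to watch in a real model: frozen BatchNorm layers still update their running statistics in train mode, so the frozen part may also need to be put in `eval()` mode.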