Atten4Vis / MS-DETR

[CVPR 2024] The official implementation for "MS-DETR: Efficient DETR Training with Mixed Supervision"
Apache License 2.0

Few questions about your implementations #18

Open JGuillaumin opened 2 weeks ago

JGuillaumin commented 2 weeks ago

Hi, first, thank you very much for your work. It is a huge improvement for the DETR family, and your paper is really well explained and written. Also, thank you for publishing your code and models; they were very easy to run.

I have a few questions about the implementation:

ZhaoChuyang commented 2 weeks ago

Hi,

Thank you for your interest in our work. Regarding your questions:

What is the difference between models/ and impl_a/models/?

impl_a is implementation (a) of mixed supervision, as illustrated in Figure 4 (a) of our paper. The main difference lies in deformable_detr.py (L122-L125, L198-L219): for impl_a we did not change the architecture of the decoder layers and instead added auxiliary predictors for the one-to-many predictions. More details are available in Section 3.3 of our paper.

Are the model and the training process compatible with fp16 precision?

We did not test it under fp16 precision. It depends on whether the MS-Deform operators, which we borrowed directly from Deformable-DETR, support fp16 precision. You may want to check the original implementation repository; as far as I know, some third-party implementations support fp16 for the MS-Deform operators.

In DeformableAttention, do you use reference points or bbox references?

We use the reference points.

What is the role of self.im2col_step = 64 in MSDeformAttn?

The im2col_step is probably related to memory-efficient tensor operations in the Deformable-DETR implementation. I did not investigate it in detail.
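
As a rough illustration only, and not something confirmed in this thread: my understanding of the Deformable-DETR CUDA wrapper is that im2col_step sets how many batch samples are handled per kernel launch, using min(batch_size, im2col_step) as the chunk size and requiring the batch size to be divisible by it. The helper name plan_im2col_chunks below is hypothetical and only mimics that chunking logic in plain Python.

```python
def plan_im2col_chunks(batch_size, im2col_step=64):
    """Return (start, end) batch slices, one per assumed kernel launch.

    Assumption: the chunk size is min(batch_size, im2col_step) and the
    batch size must be a multiple of it.
    """
    step = min(batch_size, im2col_step)
    if batch_size % step != 0:
        raise ValueError(
            f"batch_size ({batch_size}) must be divisible by the chunk size ({step})"
        )
    return [(start, start + step) for start in range(0, batch_size, step)]


# A small batch fits in a single chunk; a batch of 128 is split in two.
print(plan_im2col_chunks(2))
print(plan_im2col_chunks(128))
```

Under this reading, the default of 64 has no effect for typical per-GPU batch sizes below 64, since the whole batch is then processed in one chunk.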

Also, the attention weights computed in MSDeformAttn differ from those of the vanilla attention operation in PyTorch: they are computed from Q and the reference points, as described in the Deformable-DETR paper, which you can check for more details.
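
To make the contrast with vanilla attention concrete, here is a minimal sketch of the idea as described in the Deformable-DETR paper, heavily simplified: one head, one feature level, a 1D feature map with scalar values, and clamping at the borders. All function names and the weight matrices w_offset and w_attn are hypothetical and are not taken from the MS-DETR or Deformable-DETR code. The point it illustrates is that the weights come from a projection of the query followed by a softmax, with no query-key dot product, while the reference point plus predicted offsets decide where the values are sampled.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Matrix-vector product: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def sample_1d(values, pos):
    """Linearly interpolate `values` at a fractional index (clamped to the map)."""
    pos = min(max(pos, 0.0), len(values) - 1.0)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(values) - 1)
    frac = pos - lo
    return (1.0 - frac) * values[lo] + frac * values[hi]


def deform_attn_1d(query, reference, values, w_offset, w_attn):
    """Simplified single-head, single-level deformable attention in 1D."""
    offsets = linear(w_offset, query)         # one offset per sampling point
    weights = softmax(linear(w_attn, query))  # from the query only, no keys
    samples = [sample_1d(values, reference + o) for o in offsets]
    output = sum(w * s for w, s in zip(weights, samples))
    return output, weights


# Two sampling points around reference position 1.5 on a 4-element feature map.
query = [1.0, 0.0]
values = [0.0, 10.0, 20.0, 30.0]
w_offset = [[-1.0, 0.0], [1.0, 0.0]]  # hypothetical: offsets of -1 and +1
w_attn = [[0.0, 0.0], [0.0, 0.0]]     # hypothetical: uniform weights
output, weights = deform_attn_1d(query, 1.5, values, w_offset, w_attn)
print(output, weights)
```

The real operator works on 2D multi-scale feature maps with several heads and uses bilinear sampling, so treat this only as a picture of the data flow.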

I hope this clears up your questions. Feel free to reach out if you have other questions.