JGuillaumin opened 4 days ago
Hi,
Thank you for your interest in our work. Regarding your questions:
> What is the difference between `models/` and `impl_a/models/`?
`impl_a` is implementation (a) of mixed supervision, as illustrated in Figure 4 (a) of our paper. The main difference lies in `deformable_detr.py` (L122-L125, L198-L219): for `impl_a` we did not change the architecture of the decoder layers and instead add auxiliary predictors for the one-to-many predictions. More details are available in Section 3.3 of our paper.
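For a rough picture of what "auxiliary predictors without touching the decoder layers" can look like, here is a minimal PyTorch sketch. All names (`DecoderWithAuxHead`, `aux_class_embed`, `hidden_dim`, `num_classes`) are illustrative assumptions, not the repository's actual identifiers:

```python
import torch
import torch.nn as nn

# Hypothetical sketch of implementation (a): keep the decoder layers unchanged and
# attach an extra (auxiliary) predictor for the one-to-many branch.
class DecoderWithAuxHead(nn.Module):
    def __init__(self, hidden_dim=256, num_classes=91):
        super().__init__()
        # primary one-to-one heads (as in Deformable-DETR)
        self.class_embed = nn.Linear(hidden_dim, num_classes)
        self.bbox_embed = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 4),
        )
        # auxiliary heads producing the one-to-many predictions
        self.aux_class_embed = nn.Linear(hidden_dim, num_classes)
        self.aux_bbox_embed = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 4),
        )

    def forward(self, decoder_output):
        # decoder_output: (bs, num_queries, hidden_dim) from the unmodified decoder
        one_to_one = (self.class_embed(decoder_output),
                      self.bbox_embed(decoder_output).sigmoid())
        one_to_many = (self.aux_class_embed(decoder_output),
                       self.aux_bbox_embed(decoder_output).sigmoid())
        return one_to_one, one_to_many


head = DecoderWithAuxHead()
(o2o_logits, o2o_boxes), (o2m_logits, o2m_boxes) = head(torch.randn(2, 300, 256))
print(o2o_logits.shape, o2m_logits.shape)  # torch.Size([2, 300, 91]) twice
```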
> Are the model and the training process compatible with `fp16` precision?
We did not test it under fp16 precision. It depends on whether the MS-Deform operators, which we borrowed directly from Deformable-DETR, support fp16. You can check the original implementation repository; as far as I know, some third-party implementations have added fp16 support for the MS-Deform operators.
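If you want to experiment with it, a minimal mixed-precision training-step sketch with `torch.cuda.amp` could look like the following. The tiny model and data are dummy placeholders, not the detector; whether the custom MS-Deform CUDA operator accepts fp16 inputs still has to be verified against the Deformable-DETR code you use (one common workaround is casting its inputs back to fp32 inside the module's forward):

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a mixed-precision training step with torch.cuda.amp.
device = "cuda"
model = nn.Linear(256, 91).to(device)                 # placeholder for the detector
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

for _ in range(2):                                    # dummy training loop
    x = torch.randn(8, 256, device=device)
    target = torch.randint(0, 91, (8,), device=device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                   # fp16 where safe, fp32 elsewhere
        loss = nn.functional.cross_entropy(model(x), target)
    scaler.scale(loss).backward()                     # scale to avoid fp16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
```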
> In DeformableAttention, do you use reference points or bbox references?
We use the reference points.
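For reference, with 2-d reference points the sampling locations in Deformable-DETR are obtained by adding the predicted offsets, normalized by each level's spatial size. A shape-only sketch (random values, illustrative sizes, names following Deformable-DETR's conventions):

```python
import torch

# Shape-only illustration: sampling locations = per-level reference point
# + predicted offsets normalized by each level's (W, H).
bs, len_q, n_heads, n_levels, n_points = 2, 300, 8, 4, 4

reference_points = torch.rand(bs, len_q, n_levels, 2)                       # (x, y) per level
sampling_offsets = torch.randn(bs, len_q, n_heads, n_levels, n_points, 2)   # predicted from Q
spatial_shapes = torch.tensor([[64, 64], [32, 32], [16, 16], [8, 8]])       # (H, W) per level

offset_normalizer = torch.stack([spatial_shapes[..., 1], spatial_shapes[..., 0]], -1)  # (W, H)
sampling_locations = (
    reference_points[:, :, None, :, None, :]
    + sampling_offsets / offset_normalizer[None, None, None, :, None, :]
)
print(sampling_locations.shape)  # torch.Size([2, 300, 8, 4, 4, 2])
```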
> What is the role of `self.im2col_step = 64` in `MSDeformAttn`?
The `im2col_step` likely relates to memory-efficiency handling in the tensor operations as implemented by Deformable-DETR; I did not investigate it in detail.
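Conceptually (this is an assumption to verify against the Deformable-DETR CUDA sources), `im2col_step` seems to act like a micro-batch size: the custom operator processes the batch in chunks of at most `im2col_step` samples to bound the size of intermediate buffers, roughly like:

```python
import torch

# Conceptual illustration only, not the CUDA kernel itself.
def chunked_op(value, im2col_step=64):
    outputs = []
    for start in range(0, value.shape[0], im2col_step):
        chunk = value[start:start + im2col_step]
        outputs.append(chunk * 2.0)   # placeholder for the real sampling/aggregation
    return torch.cat(outputs, dim=0)

print(chunked_op(torch.randn(8, 100, 256), im2col_step=4).shape)  # torch.Size([8, 100, 256])
```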
Also, the attention weights computed in `MSDeformAttn` differ from the vanilla attention operation in PyTorch: the weights are predicted directly from the query Q (and the sampling locations are derived from the reference points), rather than from a dot product between Q and K, as described in the Deformable-DETR paper; you can check it for more details.
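A small illustrative sketch of that contrast (shapes follow Deformable-DETR's conventions; the layer and variable names here are only for the sketch):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Vanilla attention derives its weights from a Q·K dot product; MSDeformAttn
# predicts them from the query alone with a linear layer and a softmax over
# all (level, point) pairs per head.
d_model, n_heads, n_levels, n_points = 256, 8, 4, 4
bs, len_q = 2, 300

query = torch.randn(bs, len_q, d_model)

# deformable-style weights: a projection of Q only
attention_weights_proj = nn.Linear(d_model, n_heads * n_levels * n_points)
w = attention_weights_proj(query).view(bs, len_q, n_heads, n_levels * n_points)
w = F.softmax(w, dim=-1).view(bs, len_q, n_heads, n_levels, n_points)
print(w.shape)            # torch.Size([2, 300, 8, 4, 4])

# vanilla-style weights, for contrast: softmax over a Q·K similarity
key = torch.randn(bs, 1000, d_model)
vanilla_w = F.softmax(query @ key.transpose(1, 2) / d_model ** 0.5, dim=-1)
print(vanilla_w.shape)    # torch.Size([2, 300, 1000])
```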
I hope this clears things up. Feel free to reach out if you have any other questions.
Hi, first of all, thank you very much for your work. It brings a big improvement to the DETR family, and your paper is really well written and clearly explained. Thank you as well for publishing your code & models; it was very easy to run.
I have a few questions about the implementation:
- What is the difference between `models/` and `impl_a/models/`? (I compared a few files and only identified some typo changes, but I don't want to miss something.)
- Are the model and the training process compatible with `fp16` precision?
- In DeformableAttention, do you use reference points or bbox references? (Is `reference_points` of shape `(bs, len_q, n_levels, 2)` or `(bs, len_q, n_levels, 4)`?)
- What is the role of `self.im2col_step = 64` in `MSDeformAttn`?
- About `class MSDeformAttn(nn.Module)`: from what I understood of the attention mechanism, normally we have, schematically, `attention_weights = softmax(dot_product(proj_q(Q), proj_k(K)))`, but here we have `attention_weights = softmax(proj_q(Q))`, where `proj_q` is `self.attention_weights = nn.Linear(d_model, n_heads * n_levels * n_points)`.