OpenGVLab / DCNv4

[CVPR 2024] Deformable Convolution v4
https://arxiv.org/pdf/2401.06197.pdf
MIT License

Detail of "ViT-B + DCNv4". #9

TsingWei closed 9 months ago

TsingWei commented 10 months ago

The paper makes an interesting observation: "Our observations indicate that substituting the previously used DWConv or Attention with our DCNv4 leads to an increase in inference speed".

Could you provide the implementation details of "substituting the attention with DCNv4"?

YuwenXiong commented 10 months ago

  1. We first remove the class token from the ViT and use average pooling over the remaining tokens to get the final representation for classification, so that the token sequence forms a regular square 2D feature map.
  2. We replace the self-attention module with DCNv4, using the same module as the one defined here: https://github.com/OpenGVLab/DCNv4/blob/main/DCNv4_op/DCNv4/modules/dcnv4.py#L28
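
To make step 1 concrete, here is a minimal pure-Python sketch of the token bookkeeping. It is not the authors' code. It assumes a ViT-B/16 at 224x224 input (so 14 x 14 = 196 patch tokens), and the helper names `grid_side`, `to_grid` and `mean_pool` are hypothetical. In a real model these would be tensor reshapes and a mean over the token dimension. The point is only that 197 tokens (196 patches plus a class token) cannot be laid out as a square grid for a spatial operator, while 196 can.

```python
import math

def grid_side(num_tokens: int) -> int:
    """Return the side length if num_tokens forms a square grid, else raise."""
    side = math.isqrt(num_tokens)
    if side * side != num_tokens:
        raise ValueError(f"{num_tokens} tokens do not form a square grid")
    return side

def to_grid(tokens):
    """Reshape a flat, row-major list of token vectors into a side x side grid."""
    side = grid_side(len(tokens))
    return [tokens[r * side:(r + 1) * side] for r in range(side)]

def mean_pool(tokens):
    """Average all token vectors; stands in for reading out the class token."""
    n, dim = len(tokens), len(tokens[0])
    return [sum(t[d] for t in tokens) / n for d in range(dim)]

# Assumption: 224x224 image, 16x16 patches -> 14 * 14 = 196 patch tokens.
# Each toy token is a 2-dimensional vector instead of a 768-dimensional one.
patch_tokens = [[float(i), 1.0] for i in range(196)]
with_cls = [[0.0, 0.0]] + patch_tokens  # class token prepended: 197 tokens

grid = to_grid(patch_tokens)      # works: a 14 x 14 grid for a spatial operator
pooled = mean_pool(patch_tokens)  # classification feature without a class token
```

Calling `to_grid(with_cls)` raises `ValueError`, which is the reason the class token has to go before a 2D operator such as DCNv4 can take the place of self-attention.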

TsingWei commented 9 months ago

thx