It is interesting that the paper states: "Our observations indicate that substituting the previously used DWConv or Attention with our DCNv4 leads to an increase in inference speed".
Could you provide the implementation details of "substituting the attention with DCNv4"?
We first remove the class token in the ViT and use average pooling over the patch tokens to obtain the final representation for classification, so that we have a regular square 2D feature map on which DCNv4 can operate.
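The class-token removal and average pooling described above can be sketched as follows. This is a minimal illustration with assumed names (`AvgPoolHead` is hypothetical, not the authors' code): the patch tokens are kept as a square 2D map with no class token, and the classification representation is obtained by global average pooling over the spatial dimensions.

```python
import torch
import torch.nn as nn

class AvgPoolHead(nn.Module):
    """Hypothetical ViT classification head without a class token:
    average-pool the 2D patch-token map for the final representation."""

    def __init__(self, embed_dim: int, num_classes: int):
        super().__init__()
        self.norm = nn.LayerNorm(embed_dim)
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C) -- regular square 2D feature map, no class token
        x = self.norm(x)
        x = x.mean(dim=(1, 2))  # global average pooling over spatial dims
        return self.fc(x)

head = AvgPoolHead(embed_dim=192, num_classes=1000)
tokens = torch.randn(2, 14, 14, 192)  # e.g. 14x14 patch tokens from a ViT
logits = head(tokens)
print(logits.shape)  # torch.Size([2, 1000])
```

Because the tokens stay in a square 2D layout, a spatial operator such as DCNv4 can then be applied in place of the attention layer inside each block.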