Hello!
I am really interested in your work, as I think it is necessary to successfully exploit DETR in real-world applications.
At ICLR 2021, an improvement of DETR called "Deformable-DETR" was proposed, with a number of modifications to the transformer part of the network that improve performance and reduce computational complexity.
Are you planning to support Deformable DETR and provide a pretrained model for it as well? I think this could improve the adoption of your pretraining approach, since more people could make use of it.
Thanks for your attention! We noticed the awesome work Deformable-DETR. Here are my opinions on it.
In my opinion, deformable attention is not a global attention mechanism (see more discussion at https://openreview.net/forum?id=gZ9hCDWe6ke&noteId=x1VT5henOtF ). It is more like a sparsely sampled deformable convolution. Deformable attention can replace the self-attention in the encoder and the cross-attention in the decoder. As a result, it converges much faster than DETR and can extend to multi-scale feature maps thanks to the sparse sampling. However, it is hard to replace the self-attention in the decoder with deformable attention, because that part needs global attention to perform an NMS-like mechanism.
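To illustrate the "sparsely sampled" point above: instead of attending to all H×W locations, each query samples only K learned offset locations around a reference point and combines them with learned weights. The sketch below is a toy single-query version (function name, nearest-neighbor sampling instead of the bilinear interpolation used in the real Deformable-DETR, and fixed offsets are all illustrative assumptions, not the actual implementation):

```python
import numpy as np

def deformable_attention_1query(feature_map, ref_point, offsets, weights):
    """Toy sparse deformable sampling for a single query.

    feature_map: (H, W, C) array of encoder features
    ref_point:   (2,) reference location (y, x) in pixel coordinates
    offsets:     (K, 2) sampling offsets (learned in the real model)
    weights:     (K,) attention weights (learned; should sum to 1)
    """
    H, W, C = feature_map.shape
    out = np.zeros(C)
    for (dy, dx), w in zip(offsets, weights):
        # Sample only K points instead of all H*W locations.
        # (Real deformable attention uses bilinear interpolation here;
        # nearest-neighbor is used to keep the sketch short.)
        y = int(np.clip(round(ref_point[0] + dy), 0, H - 1))
        x = int(np.clip(round(ref_point[1] + dx), 0, W - 1))
        out += w * feature_map[y, x]
    return out
```

Because each query only touches K points, the cost is O(K) per query rather than O(H·W), which is why it scales to multi-scale feature maps; but the query never "sees" the other queries' outputs, which is why it cannot substitute for the decoder's global self-attention.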
As we discussed in point 1, the two kinds of attention are sparsely connected in Deformable-DETR. We guess the improvement from pre-training would be very limited for Deformable-DETR (similar to the results of https://arxiv.org/abs/1811.08883). So we may not provide Deformable DETR support. If it does work, the improvement may come from the pre-trained self-attention in the decoder. You can have a try.
Comparisons with Deformable-DETR: as far as we observe, UP-DETR still performs slightly better on large objects with a single-scale feature map, while Deformable-DETR performs better on small and medium objects by making full use of multi-scale features.
Code for Deformable DETR is available: https://github.com/fundamentalvision/Deformable-DETR
Thanks in advance