Amshaker / SwiftFormer

[ICCV'23] Official repository of paper SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications

Subject: Inquiry About Lightweight Feature Extraction with Your Attention Mechanism #16

Open Zhangyuhaoo opened 2 months ago

Zhangyuhaoo commented 2 months ago

I hope this message finds you well. I recently read your impressive paper, "SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications", and I must say I was truly amazed by your work.

I am currently working on a task related to feature point extraction and matching, and my focus is on developing lightweight models. I am particularly interested in whether it would be feasible to replace the standard self-attention mechanisms in backbone networks with the attention mechanism you proposed in your research.

I would be grateful for your insights or suggestions on this approach, and I look forward to your response.

Thank you very much, and best wishes.

Amshaker commented 2 months ago

Hi @Zhangyuhaoo,

Thank you for your kind words regarding SwiftFormer. I am delighted to hear that you found the work impressive.

Regarding your task on feature point extraction and matching with a focus on lightweight models, I believe integrating the additive attention mechanism from SwiftFormer could be a promising approach. SwiftFormer’s additive attention mechanism is designed to be computationally efficient while maintaining performance, making it suitable for real-time mobile vision applications. The extracted feature maps will contain rich spatial information, which is crucial for accurate feature point extraction.
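For readers evaluating the swap, the core idea of the efficient additive attention is to replace the quadratic query-key dot product with a learned global query vector, making the cost linear in the number of tokens. Below is a minimal NumPy sketch of that computation; the weight names and shapes are illustrative assumptions, not the repository's actual parameter names:

```python
import numpy as np

def efficient_additive_attention(x, W_q, W_k, w_a, W_out, b_out):
    """Sketch of linear-complexity additive attention (SwiftFormer-style).

    x: (n, d) token features; W_q, W_k: (d, d) query/key projections;
    w_a: (d,) learned attention vector; W_out, b_out: final linear layer.
    Names and layout are illustrative, not the official implementation.
    """
    d = x.shape[1]
    Q = x @ W_q                      # (n, d) queries
    K = x @ W_k                      # (n, d) keys
    scores = (Q @ w_a) / np.sqrt(d)  # (n,) one scalar score per token
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()             # softmax over tokens
    q_global = alpha @ Q             # (d,) pooled global query vector
    # Element-wise global-query/key interaction, then a linear transform
    out = Q + (q_global * K) @ W_out + b_out
    return out

# Example: 16 tokens of dimension 32
rng = np.random.default_rng(0)
n, d = 16, 32
x = rng.standard_normal((n, d))
out = efficient_additive_attention(
    x,
    rng.standard_normal((d, d)) * 0.1,
    rng.standard_normal((d, d)) * 0.1,
    rng.standard_normal(d) * 0.1,
    rng.standard_normal((d, d)) * 0.1,
    np.zeros(d),
)
print(out.shape)  # (16, 32)
```

Because the token interaction goes through a single pooled query vector rather than an n-by-n attention matrix, both compute and memory scale linearly with the number of tokens, which is what makes it attractive for a lightweight matching backbone.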

However, if the order of the points is critical, you will need to consider how to incorporate positional embeddings effectively into SwiftFormer. This will ensure that the model retains the necessary spatial order information.
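One simple option for injecting that order information, not specific to SwiftFormer, is to add fixed sinusoidal positional embeddings to the token features before the attention blocks. A minimal NumPy sketch (function name and shapes are illustrative):

```python
import numpy as np

def sinusoidal_positional_embedding(n_tokens, dim):
    """Fixed sinusoidal embeddings as in the original Transformer.

    Returns a (n_tokens, dim) array; even channels use sin, odd use cos.
    Assumes dim is even.
    """
    pos = np.arange(n_tokens)[:, None]            # (n, 1) token positions
    i = np.arange(dim // 2)[None, :]              # (1, dim/2) channel index
    freq = 1.0 / (10000.0 ** (2 * i / dim))       # geometric frequency ladder
    pe = np.zeros((n_tokens, dim))
    pe[:, 0::2] = np.sin(pos * freq)
    pe[:, 1::2] = np.cos(pos * freq)
    return pe

# Add position information to token features before the attention blocks
tokens = np.zeros((16, 32))                       # placeholder features
tokens = tokens + sinusoidal_positional_embedding(16, 32)
print(tokens.shape)  # (16, 32)
```

Learned positional embeddings (a trainable `(n_tokens, dim)` parameter added the same way) are an equally common choice; which works better for point matching would need to be verified empirically.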

I hope this helps. Please feel free to reach out if you have any further questions or need additional insights.

Best regards,

Abdelrahman