LeapLabTHU / Agent-Attention

Official repository of Agent Attention (ECCV2024)

Question about the difference between agent attention and anchored stripe attention #6

Closed: Andrew0613 closed this issue 8 months ago

Andrew0613 commented 8 months ago

Thank you for your outstanding work! I wanted to inquire if you are familiar with the concept of anchored stripe attention discussed in the paper titled "Efficient and Explicit Modelling of Image Hierarchies for Image Restoration." It appears that there are striking similarities between these two attention mechanisms. Could you elucidate the key distinctions between them?

[Screenshot 2023-12-27 15:41:21]
tian-qing001 commented 8 months ago

Thank you for your interest in our work. Up until the publication of our paper, we were not aware of anchored stripe attention. Upon investigation, we've identified similarities in the design of anchored stripe attention and our agent attention. However, it's crucial to note that there are fundamental differences between the two:

1. Different motivations and perspectives

The motivation of anchored stripe attention is to reduce computation complexity by exploiting cross-scale image similarity. It introduces a set of lower-dimensional variables, denoted $A$, to summarize image information, reducing the overall computation complexity to $\mathcal{O}(NM)$, where $M = N/s^2$. The complexity therefore remains quadratic in $N$, i.e., $\mathcal{O}(NM) = \mathcal{O}(N^2/s^2)$. Our agent attention instead aims to integrate Softmax attention with linear attention, treating the number of agent tokens $n$ as a fixed hyperparameter, which enables global modeling with linear complexity $\mathcal{O}(N)$. Despite their visual similarity, the two mechanisms differ essentially: as the image resolution varies, the number of anchors in the former scales with $N$, while the latter maintains a constant number of agent tokens (see the sketch after this list). Consequently, the computation complexity of our agent attention is significantly lower than that of anchored stripe attention in high-resolution scenes. Section 5.5 of our paper discusses the linear complexity of agent attention in detail.

Furthermore, viewing agent attention from the perspective of linear attention is valuable. This perspective enables a thorough analysis of its pros and cons, and allows us to adopt effective linear attention enhancements, e.g., the diversity restoration module, to better leverage the model's advantages. Without this perspective and these improvements, agent attention could hardly be used on its own as a superior substitute for Softmax attention.

2. Different model structures

That work employs a combination of anchored stripe attention, window attention, and channel attention for effective image restoration. In contrast, our agent attention module exclusively utilizes a single attention paradigm: pure agent attention.

3. Different universality

Agent attention serves as a versatile, general-purpose module. Extensive experiments on image classification, detection, segmentation, and generation demonstrate that agent attention is a superior alternative to Softmax attention and is adaptable to various ViT models. Notably, when applied to Diffusion models with two adjustments, our modified agent attention accelerates generation and substantially enhances image quality without any additional training. The model in the anchored stripe attention work, by contrast, is tailored to image restoration, which makes it difficult to apply to diverse visual tasks.
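
For concreteness, here is a minimal single-head sketch of the two-stage computation described in point 1 above. This is an illustrative sketch, not our official implementation: the real module is multi-head and additionally uses an agent bias and the depthwise-convolution diversity restoration module mentioned earlier. The helper name `agent_attention` and the exact pooling used to form the agent tokens are assumptions for illustration (the paper obtains agents by pooling the queries).

```python
import torch
import torch.nn.functional as F

def agent_attention(q, k, v, agent):
    """Illustrative single-head agent attention (no agent bias, no DWC).

    q, k, v: (B, N, d) -- queries/keys/values over N tokens
    agent:   (B, n, d) -- n agent tokens, with n << N a fixed hyperparameter

    Both stages cost O(N * n * d), so the whole module is linear in N.
    Anchored stripe attention instead uses M = N / s^2 anchors, giving
    O(N * M) = O(N^2 / s^2), which stays quadratic in N.
    """
    scale = q.shape[-1] ** -0.5
    # Stage 1 (agent aggregation): agents attend to all N tokens -> (B, n, d)
    agg = F.softmax(agent @ k.transpose(-2, -1) * scale, dim=-1) @ v
    # Stage 2 (agent broadcast): queries attend to the n agents  -> (B, N, d)
    return F.softmax(q @ agent.transpose(-2, -1) * scale, dim=-1) @ agg

B, N, n, d = 2, 4096, 49, 64            # e.g., a 64x64 feature map, 49 agents
q, k, v = (torch.randn(B, N, d) for _ in range(3))
agents = F.adaptive_avg_pool1d(q.transpose(1, 2), n).transpose(1, 2)  # pool Q
out = agent_attention(q, k, v, agents)  # (B, N, d)
```

Since $n$ is fixed, quadrupling the resolution (4x more tokens) only quadruples the cost of this sketch, whereas an anchor count tied to $N/s^2$ would increase the cost sixteenfold.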

We will give more credit to this work and provide a detailed comparison in the revised manuscript.

Andrew0613 commented 8 months ago

Thanks for your reply!