Closed abhi-glitchhg closed 2 years ago
I can work on this if no one is already working on it.
That would be great, thanks! I believe we only need to implement an attention class for this paper (please correct me if I'm wrong). One thing to keep in mind is that the attention implementation should have the same inputs/outputs as the other kinds of attention (vanilla, window, etc.) so that it can be used seamlessly with the encoders and decoders present in the library.
Paper: Self-attention Does Not Need O(n²) Memory — https://arxiv.org/abs/2112.05682
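For reference, the core idea of the paper is to process keys/values in chunks while carrying running softmax statistics (a per-query max and normalizer), so the full n×n score matrix is never materialized. A minimal single-head sketch of that trick, in NumPy for brevity (the actual class would follow this library's attention interface; all names here are illustrative, not from the codebase):

```python
import numpy as np

def naive_attention(q, k, v):
    """Standard attention: materializes the full (n, n) score matrix, O(n^2) memory."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def chunked_attention(q, k, v, chunk_size=64):
    """Memory-efficient attention in the style of Rabe & Staats (arXiv:2112.05682).

    Keys/values are consumed chunk_size rows at a time; only an (n, chunk_size)
    score block exists at any moment. Running max/sum statistics keep the
    softmax numerically stable and exactly equal to the naive result.
    """
    n, d = q.shape
    out = np.zeros_like(q)
    running_max = np.full((n, 1), -np.inf)  # per-query max score seen so far
    running_sum = np.zeros((n, 1))          # per-query softmax normalizer so far
    for start in range(0, k.shape[0], chunk_size):
        kc = k[start:start + chunk_size]
        vc = v[start:start + chunk_size]
        scores = q @ kc.T / np.sqrt(d)                      # (n, chunk)
        new_max = np.maximum(running_max, scores.max(axis=-1, keepdims=True))
        # Rescale the accumulators to the new max, then fold in this chunk.
        correction = np.exp(running_max - new_max)          # 0 on the first chunk
        exp_scores = np.exp(scores - new_max)
        out = out * correction + exp_scores @ vc
        running_sum = running_sum * correction + exp_scores.sum(axis=-1, keepdims=True)
        running_max = new_max
    return out / running_sum
```

The chunked version should match the naive one to numerical precision, which makes it easy to test against the existing vanilla attention:

```python
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((128, 16)) for _ in range(3))
assert np.allclose(naive_attention(q, k, v),
                   chunked_attention(q, k, v, chunk_size=32), atol=1e-6)
```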