Closed abhi-glitchhg closed 2 years ago
I can work on this if no one is already working on it.
That would be great, thanks! I believe we only need to implement an attention class for this paper (please correct me if I'm wrong). One thing to keep in mind is that the attention implementation should have the same inputs/outputs as the other kinds of attention (vanilla, window, etc.) so that it can be used seamlessly with the encoders and decoders present in the library.
Paper: Self-attention Does Not Need O(n²) Memory — https://arxiv.org/abs/2112.05682
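For reference, the core idea of the paper is to process keys/values in chunks while carrying running softmax statistics (a per-query max and normalizer), so the full n×n score matrix is never materialized. A minimal single-head sketch of that trick, in NumPy for brevity (the actual class would follow this library's attention interface; all names here are illustrative, not from the codebase):

```python
import numpy as np

def naive_attention(q, k, v):
    """Standard attention: materializes the full (n, n) score matrix, O(n^2) memory."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def chunked_attention(q, k, v, chunk_size=64):
    """Memory-efficient attention in the style of Rabe & Staats (arXiv:2112.05682).

    Keys/values are consumed chunk_size rows at a time; only an (n, chunk_size)
    score block exists at any moment. Running max/sum statistics keep the
    softmax numerically stable and exactly equal to the naive result.
    """
    n, d = q.shape
    out = np.zeros_like(q)
    running_max = np.full((n, 1), -np.inf)  # per-query max score seen so far
    running_sum = np.zeros((n, 1))          # per-query softmax normalizer so far
    for start in range(0, k.shape[0], chunk_size):
        kc = k[start:start + chunk_size]
        vc = v[start:start + chunk_size]
        scores = q @ kc.T / np.sqrt(d)                      # (n, chunk)
        new_max = np.maximum(running_max, scores.max(axis=-1, keepdims=True))
        # Rescale the accumulators to the new max, then fold in this chunk.
        correction = np.exp(running_max - new_max)          # 0 on the first chunk
        exp_scores = np.exp(scores - new_max)
        out = out * correction + exp_scores @ vc
        running_sum = running_sum * correction + exp_scores.sum(axis=-1, keepdims=True)
        running_max = new_max
    return out / running_sum
```

The chunked version should match the naive one to numerical precision, which makes it easy to test against the existing vanilla attention:

```python
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((128, 16)) for _ in range(3))
assert np.allclose(naive_attention(q, k, v),
                   chunked_attention(q, k, v, chunk_size=32), atol=1e-6)
```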