The Linformer layer looks very interesting w.r.t. managing the computational complexity of transformers, since it reduces self-attention cost from quadratic to linear in sequence length. Having such an implementation would prove useful for upcoming projects involving point cloud transformers or vision transformers. This could probably be accomplished by subclassing existing PyTorch attention layers and adding in the intermediate low-rank projection of the keys and values.
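As a rough illustration of the idea above (not an actual implementation from this project), here is a minimal single-head sketch: learned projection matrices compress the keys and values along the sequence dimension from `n` down to a fixed rank `k`, so the attention map is `(n, k)` instead of `(n, n)`. All names (`LinformerSelfAttention`, `proj_k`, `proj_v`) are hypothetical, and a real version would subclass or wrap `nn.MultiheadAttention` rather than reimplement attention from scratch.

```python
import torch
import torch.nn as nn


class LinformerSelfAttention(nn.Module):
    """Hypothetical single-head self-attention with a Linformer-style
    low-rank projection. Keys/values are compressed along the sequence
    dimension (n -> k), giving O(n * k) attention instead of O(n^2)."""

    def __init__(self, dim: int, seq_len: int, k: int = 64):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        # Learned projections over the sequence dimension
        # (E and F in the Linformer paper); fixed input length assumed.
        self.proj_k = nn.Parameter(torch.randn(seq_len, k) / seq_len ** 0.5)
        self.proj_v = nn.Parameter(torch.randn(seq_len, k) / seq_len ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, dim), with n equal to the configured seq_len
        q, key, v = self.to_q(x), self.to_k(x), self.to_v(x)
        # Compress sequence dimension: (batch, n, dim) -> (batch, k, dim)
        key = torch.einsum('bnd,nk->bkd', key, self.proj_k)
        v = torch.einsum('bnd,nk->bkd', v, self.proj_v)
        # Attention map is (batch, n, k) rather than (batch, n, n)
        attn = torch.softmax(q @ key.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v  # (batch, n, dim)
```

One caveat of this scheme: the projection matrices are tied to a fixed `seq_len`, which matters for variable-length point clouds.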
It would probably be better to go with a Performer implementation, which also achieves linear complexity while making fewer assumptions about the structure of the attention matrix (Linformer assumes it is approximately low-rank, and its projections fix the sequence length).
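For comparison, the Performer idea can be sketched as kernelized attention: approximate the softmax kernel with positive random features so attention factorizes and never materializes the `(n, n)` matrix. This is only an illustrative sketch (the function name and feature count are made up, and the actual FAVOR+ mechanism uses orthogonal random features and redraws them during training):

```python
import torch


def performer_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                        num_features: int = 256) -> torch.Tensor:
    """Sketch of softmax attention approximated with positive random
    features (FAVOR+-style). q, k, v: (batch, n, dim). Cost is linear
    in n because the (n, n) attention matrix is never formed."""
    d = q.shape[-1]
    q, k = q * d ** -0.25, k * d ** -0.25  # fold in the 1/sqrt(d) scaling
    w = torch.randn(num_features, d)       # plain Gaussian features here;
                                           # the paper uses orthogonal ones

    def phi(x: torch.Tensor) -> torch.Tensor:
        # Positive random-feature map for the softmax kernel
        return torch.exp(x @ w.t() - x.pow(2).sum(-1, keepdim=True) / 2) \
            / num_features ** 0.5

    qp, kp = phi(q), phi(k)                            # (batch, n, m)
    kv = torch.einsum('bnm,bnd->bmd', kp, v)           # (batch, m, dim)
    out = torch.einsum('bnm,bmd->bnd', qp, kv)         # unnormalized output
    denom = torch.einsum('bnm,bm->bn', qp, kp.sum(1))  # softmax normalizer
    return out / denom.unsqueeze(-1).clamp(min=1e-6)
```

Because the feature map is data-independent, the same function works for any sequence length, which sidesteps Linformer's fixed-length projections.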