Implement the Linformer model from the following paper: https://arxiv.org/pdf/2006.04768.pdf. The implementation will be based on the open-source implementation: https://github.com/tatp22/linformer-pytorch.
The repo does a good job of abstracting the various components into linear attention functions etc., so we should be able to adapt these functions into our codebase fairly easily. It also appears that they have reproduced some of the benchmarks reported for Linformer, which is a good sign, though sanity checks will still be required. A sketch of the core linear-attention mechanism we would be adapting follows below.
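As a reference point, here is a minimal single-head sketch of the Linformer attention idea from the paper: learned projections compress the keys and values along the sequence axis from n to k, so the attention map is (n × k) rather than (n × n). This is not the tatp22/linformer-pytorch API; the class name `LinformerSelfAttention`, the parameter names (`seq_len`, `k`), and the initialization scheme are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class LinformerSelfAttention(nn.Module):
    """Single-head Linformer-style attention (illustrative sketch, not the repo's API).

    Keys and values are projected along the sequence dimension from n to k,
    so attention costs O(n*k) instead of O(n^2) (Wang et al., 2020).
    """

    def __init__(self, dim: int, seq_len: int, k: int = 256):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        # Learned low-rank projections E, F: (seq_len -> k); init is an assumption
        self.proj_k = nn.Parameter(torch.randn(seq_len, k) / seq_len ** 0.5)
        self.proj_v = nn.Parameter(torch.randn(seq_len, k) / seq_len ** 0.5)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        q, k_, v = self.to_q(x), self.to_k(x), self.to_v(x)
        # Compress the sequence axis of keys/values: (b, n, d) -> (b, k, d)
        k_ = torch.einsum('bnd,nk->bkd', k_, self.proj_k)
        v = torch.einsum('bnd,nk->bkd', v, self.proj_v)
        # Attention scores over the compressed axis: (b, n, k), not (b, n, n)
        scores = torch.einsum('bnd,bkd->bnk', q, k_) * self.scale
        attn = scores.softmax(dim=-1)
        out = torch.einsum('bnk,bkd->bnd', attn, v)
        return self.to_out(out)


# The kind of shape sanity check mentioned above
attn = LinformerSelfAttention(dim=64, seq_len=512, k=128)
x = torch.randn(2, 512, 64)
assert attn(x).shape == (2, 512, 64)
```

If we adapt the repo's functions, a check like the assertion above (output shape preserved, memory scaling with k rather than n) would be a reasonable first sanity test before comparing against the paper's benchmarks.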