There are simply better alternatives that don't involve explicitly biasing the attention weight matrix, and they will be more performant while providing similar or better accuracy.
What alternatives do you recommend? It seems like the NAT repo is using relative positional biases?
I see in the README it says: