Closed jaekor91 closed 3 years ago
@FabianFuchsML -- I have a question about Eq. 11 in https://arxiv.org/pdf/2006.10503.pdf

Here, the attention weights a_ij are defined to be shared across the different l-type tensors. To preserve equivariance, we could also use a separate a_ij for each degree l. Are there theoretical or practical reasons to prefer the former over the latter?

Thank you in advance!

---

You are correct: equivariance is also maintained if you use separate attention weights for each type. There are many ways to do this, and we went for the simplest one. It could very well be that other choices would perform better!
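For intuition, here is a minimal numeric sketch (illustrative only, not the paper's code) of why invariant attention weights preserve equivariance: since each a_ij is a scalar, weighting and summing type-1 (vector) features commutes with any rotation R. The same argument goes through unchanged if a separate invariant weight is used for each degree l.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 4                          # number of neighbours j
v = rng.normal(size=(n, 3))    # type-1 (l=1) features v_j, one 3-vector each
a = rng.random(n)
a = a / a.sum()                # invariant scalar attention weights a_ij

# An arbitrary rotation R (here, about the z-axis)
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

# Aggregate then rotate: R @ sum_j a_ij * v_j
out_then_rotate = R @ (a[:, None] * v).sum(axis=0)

# Rotate then aggregate: sum_j a_ij * (R @ v_j)
rotate_then_out = (a[:, None] * (v @ R.T)).sum(axis=0)

# Because a_ij is a scalar, the two orders agree -- equivariance holds
# whether a_ij is shared across degrees or defined per degree l.
assert np.allclose(out_then_rotate, rotate_then_out)
```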