FabianFuchsML / se3-transformer-public

Code for the SE(3)-Transformers paper: https://arxiv.org/abs/2006.10503

Shared attention weight? #8

Closed jaekor91 closed 3 years ago

jaekor91 commented 3 years ago

@FabianFuchsML -- I have a question about Eq 11 in https://arxiv.org/pdf/2006.10503.pdf

Here, attention weights a_ij are defined to be shared across different l-type tensors. To preserve equivariance, we could also use separate a_ij for each l. Are there theoretical or practical reasons to prefer the former over the latter?
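For context, Eq. 11 (paraphrased from my reading of the paper, notation may differ slightly) applies a single per-edge scalar $\alpha_{ij}$ to the value messages of every output degree $\ell$:

$$
f_{\mathrm{out},i}^{\ell} \;=\; W_V^{\ell\ell}\, f_{\mathrm{in},i}^{\ell} \;+\; \sum_{k \ge 0} \;\sum_{j \in \mathcal{N}_i \setminus i} \alpha_{ij}\, W_V^{\ell k}(x_j - x_i)\, f_{\mathrm{in},j}^{k}
$$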

Thank you in advance!

FabianFuchsML commented 3 years ago

You are correct: equivariance is also maintained if you use separate attention weights for each type. There are many valid ways to set this up, and I think we just went for the simplest one. It could very well be that other choices would perform better!
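To make the "both variants are equivariant" point concrete, here is a minimal sketch (not the repository's actual code) of the aggregation step, assuming hypothetical per-edge attention scalars `alpha` and per-degree value messages `values`. Because the attention weight is an invariant scalar, multiplying a type-l feature by it does not change how that feature transforms, so sharing one scalar across all degrees or using a separate scalar per degree both preserve SE(3) equivariance.

```python
import torch

def aggregate_messages(alpha, values, shared=True):
    """Weight per-degree value messages by edge attention and sum over neighbours.

    alpha:  shared=True  -> tensor of shape (n_edges,), one scalar per edge,
                            reused for every degree l (as in Eq. 11)
            shared=False -> dict {l: tensor of shape (n_edges,)}, a separate
                            scalar per edge and per degree l
    values: dict {l: tensor of shape (n_edges, 2*l + 1, channels)},
            type-l value messages on the edges pointing into one node
    Returns dict {l: tensor of shape (2*l + 1, channels)}, the aggregated
            per-degree output for that node.
    """
    out = {}
    for l, v in values.items():
        a = alpha if shared else alpha[l]  # invariant scalar(s) for this degree
        # Multiplying a type-l tensor by an invariant scalar keeps it type-l,
        # so both the shared and the per-degree variant stay equivariant.
        out[l] = torch.einsum('e,edc->dc', a, v)
    return out
```

The shared variant is simply the special case where the same softmax output is broadcast to all degrees; a per-degree variant would need its own softmax (and its own query/key channels) for each l, which adds parameters and compute.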