Closed nihalsid closed 1 year ago
DiGress makes predictions for all pairs of nodes, so it makes sense to use all pairs of nodes to perform the message-passing as well. For small molecules, considering all pairs is very common, and actually yields the best performance.
If you work with larger graphs, you might want to replace the transformer layer to reduce GPU memory usage, but I don't know how that would affect performance.
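For illustration, here is a minimal sketch of attention over all node pairs with invalid (padding) nodes masked out, in the spirit of the layer discussed above. This is not the actual DiGress implementation; the function name and shapes are hypothetical.

```python
import torch
import torch.nn.functional as F

def full_pair_attention(x, node_mask):
    """Hypothetical sketch: self-attention over all node pairs.

    x:         (batch, n_nodes, d) node features
    node_mask: (batch, n_nodes) boolean, True for valid nodes
    """
    d = x.size(-1)
    # Attention scores between every pair of nodes (full connectivity)
    scores = torch.einsum('bid,bjd->bij', x, x) / d ** 0.5
    # Mask out attention *to* invalid (padding) nodes
    scores = scores.masked_fill(~node_mask.unsqueeze(1), float('-inf'))
    attn = F.softmax(scores, dim=-1)
    out = torch.einsum('bij,bjd->bid', attn, x)
    # Zero the outputs *of* invalid nodes as well
    return out * node_mask.unsqueeze(-1)
```

Note the quadratic cost in the number of nodes, which is why full-pair attention is fine for small molecules but becomes the bottleneck on larger graphs.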
Thanks for the quick response!
Hi, thanks for the great work!
Looking at the NodeEdgeBlock layer, it seems that attention is applied assuming full connectivity in the graph (except for invalid nodes, which are masked out by node_mask). Is this the case? If so, is there a reason why you didn't go with a graph attention layer implementation that takes the connectivity into account?
Thanks! Yawar