about GRAPH ATTENTION NETWORKS FOR SPEAKER VERIFICATION

I'm trying to reproduce the work in GRAPH ATTENTION NETWORKS FOR SPEAKER VERIFICATION. With the ResNet-F model and a conventional GAT, I get an EER of 3.0 (the paper reports 2.09). One possible reason for the higher EER is the element-wise multiplication, which makes the attention weights symmetric. Could you tell me how effective the element-wise multiplication is? Thanks a lot.
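To make the symmetry point concrete, here is a minimal NumPy sketch of the two scoring schemes as I understand them. It is a simplification and not the paper's exact formulation: I leave out the learned projection of the node features and any extra non-linearity, and the names `concat_scores`, `mul_scores`, `a` and `w` are my own. It only compares the raw (pre-softmax) scores of a conventional concatenation-based GAT with scores built from the element-wise product of a node pair.

```python
import numpy as np

def concat_scores(h, a):
    """Conventional GAT-style raw scores: e_ij = LeakyReLU(a^T [h_i || h_j])."""
    d = h.shape[1]
    src = h @ a[:d]                      # contribution of node i, shape (n,)
    dst = h @ a[d:]                      # contribution of node j, shape (n,)
    e = src[:, None] + dst[None, :]      # e_ij = src_i + dst_j
    return np.where(e > 0, e, 0.2 * e)   # LeakyReLU with negative slope 0.2

def mul_scores(h, w):
    """Element-wise multiplication scores: e_ij = w^T (h_i * h_j)."""
    return (h * w) @ h.T                 # sum_k w_k * h_ik * h_jk

# Random node features and attention vectors (illustrative values only).
rng = np.random.default_rng(0)
n, d = 5, 8
h = rng.standard_normal((n, d))
a = rng.standard_normal(2 * d)
w = rng.standard_normal(d)

e_cat = concat_scores(h, a)
e_mul = mul_scores(h, w)

print("concat scores symmetric:  ", np.allclose(e_cat, e_cat.T))
print("multiply scores symmetric:", np.allclose(e_mul, e_mul.T))
```

In this sketch the product-based scores satisfy e_ij = e_ji by construction, while the concatenation-based scores generally do not. Note that a row-wise softmax over neighbours would in general break exact symmetry of the normalized weights, since each row has its own normalizer, so the symmetry I mean is in the raw scores.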