Hi,
Thanks for your interest in our work. The motivation for attention-based feature fusion is inspired by previous GSC works [1][2]. The intuition behind it is that such an attention model can better learn the residual knowledge necessary for inferring the final GSC score.
The activation function in Eq. (4) is applied for non-linearity. Both sigmoid and tanh are possible depending on the dataset; there is not much difference between the two in most cases. The graph-level embedding pooling follows [1], and we apply it directly in our paper.
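For concreteness, here is a minimal PyTorch sketch of a gated fusion with a skip connection in the spirit of the description above. It is not the paper's exact Eq. (4): the single weight `W`, the module name, and the element-wise gating form are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Illustrative gated attention fusion with a skip connection.

    Not the paper's exact Eq. (4): the single linear weight `W` and the
    element-wise gating form are assumptions. The '+ h_ij' skip connection
    means the gated branch only has to learn a residual correction.
    """
    def __init__(self, dim, phi="sigmoid"):
        super().__init__()
        self.W = nn.Linear(dim, dim)
        # sigmoid or tanh gating; in practice the choice is dataset-dependent.
        self.phi = torch.sigmoid if phi == "sigmoid" else torch.tanh

    def forward(self, h_ij):
        return self.phi(self.W(h_ij)) * h_ij + h_ij
```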
[1] SimGNN: A Neural Network Approach to Fast Graph Similarity Computation. WSDM.
[2] Learning-based Efficient Graph Similarity Computation via Multi-Scale Convolutional Set Matching. AAAI.
Thanks for explaining.
Hi,
I am a little confused about Eq. (4). Could you explain why the attention layer is designed as in Eq. (4), and what motivates the skip connection, i.e., '+h_{ij}', in Eq. (4)?
An additional question: in the code implementation (e.g., line 137), \varphi is set to tanh, but in the paper it is described as a sigmoid gating. Besides, in the paper the graph-level embedding h is computed from the original node features X, but in the code implementation it is computed from the attention-transformed features (line 151).
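For reference, a minimal sketch of the SimGNN-style attention pooling from [1] that this question refers to; weight names and shapes are illustrative assumptions, and the usage lines at the end only illustrate the two variants being asked about (pooling the original features X versus pooling attention-transformed features).

```python
import torch
import torch.nn as nn

class SimGNNStylePooling(nn.Module):
    """Illustrative SimGNN-style attention pooling (see [1] above).

    Weight names and shapes are assumptions; the intent is only to show
    the two variants being asked about.
    """
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)

    def forward(self, H):                                 # H: (num_nodes, dim)
        context = torch.tanh(self.W(H.mean(dim=0)))       # global graph context
        scores = torch.sigmoid(H @ context)               # per-node attention
        return (scores.unsqueeze(-1) * H).sum(dim=0)      # graph-level embedding h

# Hypothetical usage: pooling the original features X (as written in the paper)
# versus pooling attention-transformed features (as the code appears to do).
pool = SimGNNStylePooling(dim=16)
X = torch.randn(5, 16)
h_from_X = pool(X)
h_from_transformed = pool(torch.tanh(nn.Linear(16, 16)(X)) + X)
```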