canqin001 / Efficient_Graph_Similarity_Computation

[NeurIPS-2021] Slow Learning and Fast Inference: Efficient Graph Similarity Computation via Knowledge Distillation

About Eq.(4) #1

Closed JhuoW closed 2 years ago

JhuoW commented 2 years ago

Hi,

I am a little confused about Eq. (4). Could you explain why the attention layer is designed as in Eq. (4), and what the motivation for the skip connection, i.e., '+h_{ij}', in Eq. (4) is?

An additional question concerns the code implementation: at line 137, \varphi is set to tanh, whereas in the paper it is a sigmoid gating. Also, in the paper the graph-level embedding h is computed from the original node features X, but in the code it is computed from the attention-transformed features (line 151).

canqin001 commented 2 years ago

Hi,

Thanks for your interest in our work. The attention-based feature fusion is inspired by previous GSC works [1][2]. The intuition is that such an attention model can better learn the residual knowledge needed to infer the final GSC score, which is also the role of the skip connection '+h_{ij}'.

The activation function in Eq. (4) is applied for non-linearity. Both sigmoid and tanh are possible choices depending on the dataset, and there is not much difference between the two in most cases. The graph-level embedding pooling follows [1] and was applied directly in our paper.
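
For illustration, here is a minimal PyTorch sketch of a gated layer with a residual term in the spirit of Eq. (4). The joint embedding `h_ij`, the single linear projection, and the `dim`/`gate` arguments are assumptions made for the example, not the exact implementation in this repository.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Sketch of an Eq.(4)-style gating layer with a residual ('+ h_ij') term.
    The exact form in the paper may differ; shapes and the single linear
    projection are illustrative assumptions."""

    def __init__(self, dim, gate="sigmoid"):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        # Either non-linearity can serve as the gate phi: the paper states
        # sigmoid, the released code uses tanh; in practice they behave similarly.
        self.phi = torch.sigmoid if gate == "sigmoid" else torch.tanh

    def forward(self, h_ij):
        # Gate the joint embedding, then add the skip connection so the
        # attention branch only has to learn a residual correction.
        gate = self.phi(self.proj(h_ij))
        return gate * h_ij + h_ij

# Example usage (hypothetical sizes):
# fuse = GatedFusion(dim=64, gate="tanh")
# out = fuse(torch.randn(32, 64))
```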

[1] SimGNN: A Neural Network Approach to Fast Graph Similarity Computation. WSDM 2019.
[2] Learning-Based Efficient Graph Similarity Computation via Multi-Scale Convolutional Set Matching. AAAI 2020.
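
For context, the pooling in [1] weights each node embedding with a sigmoid attention score derived from a learnable global context before summing. A self-contained sketch (the class name and tensor shapes are assumptions for illustration, not this repository's code):

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Sketch of SimGNN-style attention pooling [1]: a learnable global
    context scores each node, and the graph embedding is the weighted sum."""

    def __init__(self, dim):
        super().__init__()
        self.W = nn.Parameter(torch.empty(dim, dim))
        nn.init.xavier_uniform_(self.W)

    def forward(self, node_emb):
        # node_emb: (num_nodes, dim) for a single graph.
        context = torch.tanh(node_emb.mean(dim=0) @ self.W)   # global context vector
        scores = torch.sigmoid(node_emb @ context)            # per-node attention weights
        return (scores.unsqueeze(-1) * node_emb).sum(dim=0)   # graph-level embedding
```

Whether the pooling is fed the original node features X or the attention-transformed features (as at line 151) is a design choice on top of this scheme.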

JhuoW commented 2 years ago

Thanks for explaining.