Xi-yuanWang / GLASS

GLASS: GNN with Labeling Tricks for Subgraph Representation Learning

Query about GlassConv Layer #2

Closed: shhs29 closed 2 years ago

shhs29 commented 2 years ago

Hi,

I was taking a look at the implementation and saw the use of GLASSConv layers in the model. Could you help me understand what this layer is used for and what the purpose of the trans and comb functions in this layer is?

Thanks a lot in advance, Shweta Ann Jacob

Xi-yuanWang commented 2 years ago

Dear Shweta,

GLASS assigns each node a zero-one label. A straightforward way to use the zero-one labels is to append them to the node features and process the result with ordinary message-passing layers. However, we found that directly using different parameters for nodes with label 0 and label 1 works better. comb and trans are module lists whose [0] elements serve nodes with label 0 and whose [1] elements serve nodes with label 1.
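For concreteness, a minimal sketch of that layout (names follow the description above; the dimensions are illustrative assumptions, not the repository's actual values):

```python
import torch.nn as nn

hidden = 64  # illustrative width
# trans[0]/comb[0] serve label-0 nodes; trans[1]/comb[1] serve label-1 nodes
trans = nn.ModuleList([nn.Linear(hidden, hidden) for _ in range(2)])
comb = nn.ModuleList([nn.Linear(2 * hidden, hidden) for _ in range(2)])
```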

Take the modules for label 0 as an example. Given input node features x, trans[0] transforms x for the nodes with label 0 (line 159). GLASSConv then passes messages (line 164). Next we apply an operation similar to a residual connection: we concatenate the input node features x with the node embeddings produced by message passing (line 167). This concatenation raises the dimension, so we use comb[0] to reduce it back (line 170; comb means combining the input features and the embeddings).

The modules for nodes with label 1 work in the same way: transforming node features (line 158), message passing (line 164), residual connection (line 167), and dimension reduction (line 169).
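Putting the steps together, a minimal self-contained sketch of this flow, ignoring for now the mixing discussed next (the class name, the dense adjacency `adj`, and the boolean `mask` marking label-1 nodes are illustrative assumptions, not the repository's exact code):

```python
import torch
import torch.nn as nn

class PerLabelConvSketch(nn.Module):
    def __init__(self, hidden: int):
        super().__init__()
        self.trans = nn.ModuleList([nn.Linear(hidden, hidden) for _ in range(2)])
        self.comb = nn.ModuleList([nn.Linear(2 * hidden, hidden) for _ in range(2)])

    def forward(self, x, adj, mask):
        # per-label transform: trans[1] for label-1 nodes, trans[0] otherwise
        h = torch.where(mask.unsqueeze(-1), self.trans[1](x), self.trans[0](x))
        h = adj @ h                    # message passing over the (dense) adjacency
        h = torch.cat([x, h], dim=-1)  # residual-style concat, doubles the width
        # per-label dimension reduction back to `hidden`
        return torch.where(mask.unsqueeze(-1), self.comb[1](h), self.comb[0](h))

x = torch.randn(5, 64)
adj = torch.eye(5)
mask = torch.tensor([0, 1, 0, 0, 1], dtype=torch.bool)
out = PerLabelConvSketch(64)(x, adj, mask)  # shape (5, 64)
```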

Moreover, we find that the number of label-0 nodes is much larger than the number of label-1 nodes, so the label-1 modules would be used less frequently and would not be trained well. Therefore, we mix the outputs of the two sets of modules (line 161) so that both sets are used at the same frequency.
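In code, this replaces the hard per-label switch in the sketch above with a soft mixture. A sketch, assuming z_ratio is a scalar in (0.5, 1] (values here are illustrative):

```python
import torch
import torch.nn as nn

hidden, z_ratio = 64, 0.8  # illustrative values
trans = nn.ModuleList([nn.Linear(hidden, hidden) for _ in range(2)])
x = torch.randn(5, hidden)
mask = torch.tensor([0, 1, 0, 0, 1], dtype=torch.bool)  # True = label 1

# Each node mostly uses its own label's modules but keeps a (1 - z_ratio)
# contribution from the other label's modules, so both parameter sets
# receive gradients from every node.
x0, x1 = trans[0](x), trans[1](x)
mixed = torch.where(mask.unsqueeze(-1),
                    z_ratio * x1 + (1 - z_ratio) * x0,  # label-1 nodes
                    z_ratio * x0 + (1 - z_ratio) * x1)  # label-0 nodes
```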

Sincerely, Xiyuan Wang

shhs29 commented 2 years ago

Hi Xiyuan,

Thanks a lot for the quick reply!

I had a couple of other questions regarding this approach.

  1. Is there any reason why using separate parameters for nodes with label 0 and label 1 works better than using a single transformation layer?
  2. Is there a good ratio for mixing the outputs of the two modules (z_ratio)? I have not seen this idea implemented before, so I am wondering if there is any literature you could point me towards to understand it better.
  3. Is there any advantage to using two separate combination layers instead of one?

Kindly let me know if any of these questions need further clarification.

Thanks a lot in advance, Shweta Ann Jacob

Xi-yuanWang commented 2 years ago

Dear Shweta,

I think Q1 and Q3 are really the same question: why does using separate parameters for nodes with label 0 and label 1 work better? Our heuristic is that separate parameters make the model more sensitive to the labels. Let $x$ denote an $n$-dimensional node feature and $l \in \{0, 1\}$ the node label. With shared parameters, the layer computes $W [x; l] + b = W_{:, :n}\, x + b + l \cdot W_{:, n}$, so the label-0 and label-1 functions differ only by a constant bias $W_{:, n}$. With separate parameters, the two labels get entirely different functions.
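A quick numeric check of this identity (illustrative, not code from the repository):

```python
import torch

# A single shared linear layer applied to [x; l] differs between l=0 and
# l=1 only by the constant column W[:, n].
n, m = 4, 3
W = torch.randn(m, n + 1)
b = torch.randn(m)
x = torch.randn(n)

for l in (0.0, 1.0):
    lhs = W @ torch.cat([x, torch.tensor([l])]) + b
    rhs = W[:, :n] @ x + b + l * W[:, n]
    assert torch.allclose(lhs, rhs)
```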

Q2: z_ratio is a hyperparameter; you can find the tuned values in config/*. This trick is based entirely on our own observations.

Sincerely, Xiyuan Wang

shhs29 commented 2 years ago

Hi Xiyuan,

Thanks a lot for the detailed answer!

Closing this issue as it is resolved.