amazon-science / disttgl


Question regarding static node memory #3

Open jasperzhong opened 9 months ago

jasperzhong commented 9 months ago

I couldn't find the code for generating static node memory. In the paper, it mentions that "we use learnable node embeddings pre-trained with the same task as the static node memory due to its simplicity" and "We pre-train the static node history with the same GNN architecture but only with static information using DGL [19]."

What does "static information" mean? Does it mean training without timestamps? If so, how can the same GNN architecture (e.g., TGN) be used? And since "all graph events can be used to supervise the training process," why does this not lead to information leakage?

tedzhouhk commented 7 months ago

Hi Jasper, for the static node memory we simply replace the node memory with a learned embedding table, pre-trained on the same task with the same architecture. We release the pre-trained embedding table as learned_node_feats.pt in each dataset folder. Note that during pre-training we use the same train/val split. One might argue that future information within the training set is leaked (through the learned embedding table). However, I think this is usually not a concern, because (1) learned model weights behave similarly, and (2) nothing from the test set is leaked. Thanks.
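
For readers trying to picture this, here is a minimal PyTorch sketch of the idea described above: a per-node learnable embedding table standing in for the dynamic node memory, trained end-to-end by the task loss and then saved as a frozen feature tensor. This is not the repo's actual code; `num_nodes`, `memory_dim`, and the variable names are illustrative assumptions, and only the file name `learned_node_feats.pt` comes from the comment above.

```python
import torch
import torch.nn as nn

# Hypothetical sizes (not from the repo).
num_nodes, memory_dim = 10_000, 100

# Static node memory: one learnable vector per node. Unlike a dynamic
# node memory, it is updated by backpropagation from the task loss,
# not by message-passing memory updates at each graph event.
static_memory = nn.Embedding(num_nodes, memory_dim)

# During pre-training, the table is looked up wherever the dynamic
# memory would have been read, and gradients flow into its rows.
node_ids = torch.tensor([0, 5, 42])
node_states = static_memory(node_ids)  # shape: (3, memory_dim)

# After pre-training, the learned table is detached and saved; at
# training time it is loaded back as a frozen per-node feature tensor
# (the repo ships this file per dataset as learned_node_feats.pt).
torch.save(static_memory.weight.detach(), "learned_node_feats.pt")
learned_feats = torch.load("learned_node_feats.pt")  # (num_nodes, memory_dim)
```

Because the saved tensor is detached and loaded as plain features, it is not further updated during the main training run, which matches the "static" framing in the paper.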