d-ailin / GDN

Implementation code for the paper "Graph Neural Network-Based Anomaly Detection in Multivariate Time Series" (AAAI 2021)
MIT License
467 stars 140 forks

Graph Structure Learning #70

Closed KennyNH closed 1 year ago

KennyNH commented 1 year ago

Thanks for your wonderful work.

Why should the global embeddings be detached? https://github.com/d-ailin/GDN/blob/9853899da860682669a134e4af315d036aab4eca/models/GDN.py#LL145C10-L145C58

It seems that the graph structure doesn't receive any gradient information and relies only on the global embeddings. Is it unnecessary to pass gradient information to the graph structure, or are there difficulties in doing that?
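For context, a minimal sketch of the pattern being asked about (this is an illustration, not the repo's actual code; the function name and shapes are hypothetical): node embeddings are detached, pairwise cosine similarities are computed, and a hard top-k selection produces a binary adjacency matrix, so no gradient flows back into the embeddings through the graph structure itself.

```python
import torch

def build_topk_graph(embeddings, topk=2):
    # Detach mirrors the line in question: the graph structure
    # is built from embeddings but carries no gradient.
    emb = embeddings.detach()
    norm = emb / emb.norm(dim=1, keepdim=True)
    sim = norm @ norm.t()                      # pairwise cosine similarity
    sim.fill_diagonal_(-float("inf"))          # exclude self-loops
    # Hard top-k: keep the k most similar neighbours per node,
    # yielding a binary (non-differentiable) adjacency matrix.
    idx = sim.topk(topk, dim=1).indices
    adj = torch.zeros_like(sim)
    adj.scatter_(1, idx, 1.0)
    return adj

emb = torch.randn(5, 8, requires_grad=True)
adj = build_topk_graph(emb, topk=2)
```

Because of the `detach()` and the hard top-k, `adj` is a constant with respect to the embeddings during backpropagation.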

d-ailin commented 1 year ago

Thanks for the interest.

The graph structure (the adjacency matrix A) is defined as a binary matrix (after top-k), so gradients cannot fully back-propagate through it. However, since the attention uses the local and global embeddings, the attention part does receive gradients during training. It could also be interesting to define A in a continuous manner as a weighted graph, in a way that allows backpropagation through A. In our earlier trials, though, directly enabling propagation through A with the current model architecture made training unstable (if I remember correctly). I still think it could be possible to design a continuous A after some other modifications.

KennyNH commented 1 year ago

Thanks for your reply.