Graph-COM / GSAT

[ICML 2022] Graph Stochastic Attention (GSAT) for interpretable and generalizable graph learning.
https://arxiv.org/abs/2201.12987
MIT License

Infoloss before or after sampling #10

simoons95 closed this issue 1 year ago

simoons95 commented 1 year ago

Hello, me again!

I read in your paper that your infoloss should be based on the distribution of the subgraphs knowing the original graph and the parameters.
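For reference, here is a sketch of the per-edge form such a loss commonly takes, assuming each edge $(u,v)$ is kept independently with probability $p_{uv}$ and the prior keeps each edge with probability $r$. The symbols $p_{uv}$, $r$, and $E$ are chosen here for illustration and may not match the paper's notation; constant terms are omitted:

$$
\mathcal{L}_{\text{info}} = \sum_{(u,v) \in E} \left[ p_{uv} \log \frac{p_{uv}}{r} + (1 - p_{uv}) \log \frac{1 - p_{uv}}{1 - r} \right]
$$

Written this way, the loss depends only on the probabilities $p_{uv}$, not on any particular sampled subgraph.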

However, your code, in order, 1) computes this distribution as logits, 2) samples with the Gumbel-softmax trick, and 3) applies the infoloss to the sampled subgraph. From my understanding, you should rather 1) compute the distribution as logits, 2) transform the logits into probabilities, using the same temperature as in the Gumbel-softmax code, 3) apply the infoloss to that distribution, and 4) apply the Gumbel-softmax trick to the logits for use in other parts of the code.

Mathematically, I think what you do brings a lot of noise into the back-propagated gradients of the infoloss, and I would expect the loss to be cleaner and more efficient if you followed the order I propose. That is, apply the infoloss on (att_log_logits / temp).sigmoid() (with temp set to 1 in your code) rather than on self.sampling(att_log_logits, epoch, training).
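To make the difference between the two orders concrete, here is a minimal sketch in plain Python. It is not the repository's code: lists stand in for tensors, the helper names `concrete_sample` and `info_loss`, the prior `r=0.7`, and the example logits are all hypothetical, and the sampler is assumed to be a binary-concrete relaxation (logistic noise added to the logits before the sigmoid).

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def concrete_sample(logits, temp=1.0, rng=random):
    """Relaxed Bernoulli sample per edge: logistic noise added to each logit."""
    samples = []
    for logit in logits:
        u = min(max(rng.random(), 1e-10), 1.0 - 1e-10)  # keep the logs finite
        noise = math.log(u) - math.log(1.0 - u)
        samples.append(sigmoid((logit + noise) / temp))
    return samples


def info_loss(att, r=0.7, eps=1e-6):
    """Mean KL(Bernoulli(a) || Bernoulli(r)) over edges; eps avoids log(0)."""
    terms = [
        a * math.log(a / r + eps) + (1.0 - a) * math.log((1.0 - a) / (1.0 - r) + eps)
        for a in att
    ]
    return sum(terms) / len(terms)


logits = [-1.5, 0.2, 0.8, 2.0]  # hypothetical per-edge attention logits
temp = 1.0

# Proposed order: loss on the probabilities, which is deterministic.
p = [sigmoid(l / temp) for l in logits]
loss_on_p = info_loss(p)

# Current order: loss on a sample, which changes with the noise draw.
alpha = concrete_sample(logits, temp, random.Random(0))
loss_on_alpha = info_loss(alpha)

print(f"loss on p: {loss_on_p:.4f}, loss on sampled alpha: {loss_on_alpha:.4f}")
```

Re-running the sampled variant with different seeds gives a different loss value each time, while the loss on `p` stays fixed for fixed logits. That variation is the extra noise I am referring to.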

What do you think? Have I missed something? I would love to read your opinion on the matter.

PS: Thanks again for your paper and for your responsiveness on my previous issues!

siqim commented 1 year ago

Hi again!

This is another very good question! We did this intentionally in the GSAT paper, and we explained it a little bit here. What you suggest is mathematically correct; our implementation was more of an empirical choice to add regularization. If I remember correctly, I found that using $\alpha$ in the info loss yielded better performance on the spurious-motif datasets.

But in our follow-up work, LRI, we ran more experiments on more realistic datasets and found that using $\alpha$ or $p$ in the info loss does not seem to make a significant difference, so we adopted the mathematically correct way in the implementation of that follow-up work, i.e., here.

Thanks again for your suggestions! I guess I need to add some documentation to the code to make this point clear in the implementation of GSAT :)

Best, Siqi

simoons95 commented 1 year ago

Indeed, I missed the information. Thank you for your fast and clear answer!