distillpub-reviewers opened 3 years ago
Thank you for the very detailed review! We are planning rewrites of a few sections + additional interactive visual descriptions addressing your comments. We are waiting for the other reviews before we finalize this, however.
The following peer review was solicited as part of the Distill review process.
The reviewer chose to waive anonymity. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.
Distill is grateful to Chaitanya K. Joshi for taking the time to review this article.
General Comments
On this exposition, I felt that the main 'story' this article wants to tell is about how GNN research evolved from global spectral to local spatial graph convolutions. This is a super interesting topic, as someone who has also seen this evolution in the community.
Some areas that the authors may improve upon to make this article much stronger and more convincing to the reader:
Detailed Remarks
The introduction is well written, but makes it seem like there was no ML on graphs before GNNs. It may be worth mentioning graph kernels and random-walk-based methods, for example.
Comment on Figures upon browsing through the entire article: most are static... which is okay. They are neat and well made. But Distill does have a tradition of interactive figures, IMO...
The Challenges of Computation on Graphs
Maybe it's better to say 'in x section, we'll explore how the ...' because the current sentence made me click and jump over to the new section without reading anything before it. I am not sure if this is the intended user experience.
In this section, the concept of iteratively building the embeddings comes a bit out of the blue to me as a reader. Maybe before introducing notation, it will be helpful to say in plain English what a GNN does, e.g., GNNs iteratively build representations for each node in the graph (via xyz process).
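To make that suggestion concrete, a single round of message passing could be sketched in a few lines of numpy. This is my own illustration, not code from the article; the names `A`, `h`, and `W` are all hypothetical.

```python
import numpy as np

# A 4-node path graph, given by its adjacency matrix.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

h = np.eye(4)              # initial node features: one-hot identities
W = np.full((4, 4), 0.5)   # a toy weight matrix shared across all nodes

# Each node aggregates its neighbors' features, applies the shared weights
# and a nonlinearity. Repeating this over layers is what "iteratively
# building" node representations means.
h_next = np.maximum(A @ h @ W, 0.0)   # ReLU(A h W)
print(h_next.shape)  # (4, 4): one updated feature vector per node
```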
Extending Convolutions to Graphs
The figure here could definitely do with interactivity! It would add to the coolness of the presentation if, e.g. I can see which neighbors contribute to the update of which node by hovering over it. Just a suggestion.
I would suggest reversing the order of these two points for more impact: ConvNets are great for images --> images are grid graphs --> convolution on graphs.
Spectral Convolutions
Disclaimer: I am not an expert on Spectral methods, so I am reading this section as someone who is trying to also learn about this topic from a fresh perspective. My comments are reflective of this.
It would be helpful to say what \hat x_i is here. I was confused for a moment. In general, this particular section does have many linear algebra terms being thrown at the reader, so I'd suggest either adding more footnotes to solidify and build intuitions/analogies or provide useful links.
This is an exciting statement and I think you want your reader to go like 'uh-huh' and nod their head upon reading this. Maybe it is useful to expand this a bit more or write it more clearly.
I suggest being more explicit: this gives us the adjacency matrix A and the graph Laplacian L, allowing us...
I understand that h represents the node features and k is the k-th layer. But it will again be helpful to readers to be very explicit here: 'Finally, we can define the node feature vectors at the k-th iteration/layer for each node i as: ...'
Again, as a reader, the sudden mention of the filter is confusing. It would be good to introduce this idea that each layer/iteration k has associated with it a convolutional filter w^k.
Each h^k is a vector of real numbers (followed by mathematical notation).
Also, at this point, thinking as a reader who is very new to the mathematics, an obvious question in my mind is: why can't I just convolve/multiply the natural representations of the feature and weight? Why do I need to bother with the spectral representations?
Maybe it is good to explain this key idea here.
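One way to explain it: a spectral convolution is pointwise multiplication in the graph-Fourier domain, x -> U diag(\hat w) U^T x, and this is what ties the filter to the graph structure. The following sketch is my own illustration (the low-pass filter `w_hat` is a hypothetical choice, not from the article):

```python
import numpy as np

# Same 3-node path graph and Laplacian eigenbasis as before.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
eigvals, U = np.linalg.eigh(L)

x = np.array([1.0, -1.0, 1.0])        # an oscillatory ("high-frequency") signal
w_hat = np.exp(-eigvals)              # a low-pass filter: damp large eigenvalues
x_filtered = U @ (w_hat * (U.T @ x))  # convolve: scale each spectral component

# Naively multiplying features by weights node-by-node ignores which nodes are
# connected; filtering in the spectrum respects the graph, at the cost of
# needing the full eigenbasis U.
print(x_filtered)
```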
From Global to Local Convolutions
It may be interesting to mention why/how ChebNet also advances upon Spectral Net by enabling inductive learning. I believe this is a recent line of work by Ron Levie et al. worth looking into.
Modern Spatial Convolutions
Maybe the colors are too similar to each other here?
Interactive Graph Neural Networks
Interesting visualization! I really enjoyed this one!
In particular, I was able to identify a key issue with some spatial GNNs through the visualization: oversmoothing.
E.g. for GCN and GAT, if one keeps pressing 'Update All Nodes', the node feature values eventually converge to very similar numbers. This was interesting to me and maybe worth pointing out to the reader, as there are several recent papers on this topic of oversmoothing (and on the expressive power of GNNs).
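The effect I observed in the visualization is easy to reproduce in a few lines of numpy (my own sketch, not the article's code): repeatedly averaging over neighborhoods, which GCN-style updates roughly do, drives all node features toward the same value.

```python
import numpy as np

# A 4-node path graph with self-loops added, then row-normalized so that each
# update replaces a node's feature with the mean over its closed neighborhood.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)
P = A_hat / A_hat.sum(axis=1, keepdims=True)

h = np.array([1.0, 0.0, 0.0, 0.0])  # one node starts out distinctive
for _ in range(50):                 # "pressing Update All Nodes" many times
    h = P @ h

print(h.std())  # close to 0: the features have oversmoothed to near-identical values
```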
Game of Life
To be honest, these experiments aren't very convincing to me... or, I don't know what I have learnt from reading through them. Sure, the goal was to compare whether GNNs are as good as CNNs, but the results are not very 'satisfying' beyond saying that GNNs didn't learn the game and fail similarly to CNNs.
Do you think it is possible to change the message passing scheme of GCN to something more powerful, such as adding attention mechanisms or principled neighborhood aggregation functions? Essentially, Game of Life seems to require the model to count neighborhoods very exactly, and GCNs do struggle with this... PNA was designed for counting over neighborhoods and could genuinely be worth exploring here.
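The counting issue can be made concrete with a toy sketch (my own, not the article's code): when all node features are identical, as for Game of Life cells, mean-style aggregation discards degree information while sum aggregation keeps it.

```python
import numpy as np

# A small graph whose nodes have degrees 2, 2, 3, 1.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
h = np.ones(4)            # identical features everywhere (e.g. all cells "alive")

deg = A.sum(axis=1)
sum_agg = A @ h           # equals the degree: counts live neighbors exactly
mean_agg = (A @ h) / deg  # equals 1 for every node: the count is lost

print(sum_agg)   # [2. 2. 3. 1.]
print(mean_agg)  # [1. 1. 1. 1.]
```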
Thus, it would be much more convincing to the reader that GNNs are useful if you can demonstrate that this message passing framework is flexible enough to overcome the limitations of grid CNNs and enables the modeller to add problem-specific inductive biases. This would really reinforce all the theory we have been reading and get the reader excited about this new class of models.
Now, I also get why the authors may have chosen a non-conventional graph dataset -- Game of Life. All their examples are on grid graphs so they may have liked to continue the analogy. Another option could be to consider images such as classical MNIST or CIFAR10 -- these have recently been used to compare GNNs to CNNs, too. See work by Boris Knyazev et al. on this.
My comments from the Google Form
On diagrams, I liked the interactive diagrams a lot in how they helped me understand the concepts. Their graphic design may be iterated upon for better incorporating design best practices.
On writing and readability, there are several math heavy sections -- I feel that the authors could help readers by being very explicit when using a symbol defined many paragraphs ago, or provide more intuitive understanding of a couple mathematical concepts related to spectral convolutions.
Distill employs a reviewer worksheet as a help for reviewers.
The first three parts of this worksheet ask reviewers to rate a submission along certain dimensions on a scale from 1 to 5. While the scale meaning is consistently "higher is better", please read the explanations for our expectations for each score—we do not expect even exceptionally good papers to receive a perfect score in every category, and expect most papers to be around a 3 in most categories.
Any concerns or conflicts of interest that you are aware of?: No known conflicts of interest
What type of contributions does this article make?: Exposition on an emerging research direction
Comments on Scientific Integrity
The notebooks to reproduce experiments are appreciated here!