Peer Review #1 - Githubissues

The following peer review was solicited as part of the Distill review process.

The reviewer chose to waive anonymity. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.

Distill is grateful to Chaitanya K. Joshi for taking the time to review this article.

General Comments

"On this exposition, I felt that the main 'story' this article wants to tell is about how GNN research evolved from global spectral to local spatial graph convolutions. This is a super interesting topic, as someone who has also seen this evolution in the community.

Some areas that authors may improve upon or make this article much more stronger and convincing to the reader:

More details on spatial GNNs: maybe something more on expressive power and neighborhood aggregation, e.g. GIN and PNA papers. Can we rank the 4 GNNs they've shown based on their expressive power? Can we show which models can distinguish which neighborhoods?
Then, building upon the previous point, it would be more convincing to the reader if the authors can highlight either the flexibility or advantages of GNNs over CNNs via their experiments with the Game of Life (or something else). See my detailed comments on this."

Detailed Remarks

One idea is to look at tools that have worked well in other domains: neural networks have shown immense predictive power in a variety of learning tasks.

The introduction is well written, but makes it seem like there was no ML on graphs before GNNs. Maybe worth mentioning graph kernels and random walk based methods, for e.g.

Comment on Figures upon browsing through entire article: most are static...which is okay. They are neat and well made. But distill does have a tradition for interactive figures, IMO...

The Challenges of Computation on Graphs

See Learning GNN Parameters for how the learnt embeddings can be used for these tasks!

Maybe it's better to say 'in x section, we'll explore how the ...' because the current sentence made me click and jump over to the new section without reading anything before it. I am not sure if this is the intended user experience.

Different GNN variants are distinguished by the way these representations are computed...In case the embeddings are computed iteratively, we use the notation h_v^{(k)}h v (k) to indicate the embedding after the k^{\text{th}}k th iteration. We can think of each iteration as the equivalent of a 'layer' in standard neural networks.

In this section, the concept of iteratively building the embeddings comes a bit out of the blue to me as a reader. Maybe before introducing notation, it will be helpful to say in plain english what a GNN does, e.g., GNNs iteratively build representations for each node in the graph (via xyz process).

Extending Convolutions to Graphs

The figure here could definitely do with interactivity! It would add to the coolness of the presentation if, e.g. I can see which neighbors contribute to the update of which node by hovering over it. Just a suggestion.

Noting that:

I would suggest reversing the order of these two points for more impact: ConvNets are great for images --> images are grid graphs --> convolution on graphs.

Spectral Convolutions

Disclaimer: I am not an expert on Spectral methods, so I am reading this section as someone who is trying to also learn about this topic from a fresh perspective. My comments are reflective of this.

As these nn eigenvectors form a basis for \mathbb{R}^nR n , any feature function xx can be represented as a linear combination of these eigenvectors:

It would be helpful to say what \hat x_i is here. I was confused for a moment. In general, this particular section does have many linear algebra terms being thrown at the reader, so I'd suggest either adding more footnotes to solidify and build intuitions/analogies or provide useful links.

We can represent any vector w \in \mathbb{R^n}w∈R n in the spectral basis: in particular, the weights of filters we want to convolve with!

This is an exciting statement and I think you want your reader to go like 'uh-huh' and nod their head upon reading this. Maybe it is useful to expand this a bit more or write it more clearly.

First, we fix an ordering of the nodes in GG. This gives us AA and LL, allowing us to compute U_mU m

I suggest being more explicit: this gives us the adjacency matrix A and the graph laplacian L, allowing us...

Finally we define h^k

I understand that h represents the node features and k is the k-th layer. But it will again be heplful to readers to be very explicit here. 'Finally, we can define the node feature vectors at the k-th iteration/layer for each node i as: ...'

Convert current filter \color{#177245} w^{(k)}w (k) to its spectral representation \color{#0047AB} \hat{w}^{(k)} w^(k).

Again, as a reader, the sudden mention of the filter is confusing. It would be good to introduce this idea that each layer/iteration k has associated with it a convolutional filter w^k.

each h^{(k)} \in \mathbb{R}^{d_k}h (k) ∈R d k

Each h^k is a vector of real numbers (followed by mathematical notation).

Also, at this point, thinking as a reader who is very new to the mathematics, an obvious question in my mind is: why can't I just convolve/multiply the natural representations of the feature and weight? Why do I need to bother with the spectral representations?

Maybe it is good to explain this key idea here.

From Global to Local Convolutions

The learned filters are specific to the input graphs. This is because they are represented in terms of the input graph Laplacian LL. This means they don't generalize to unseen graphs, preventing inductive learning! ChebNet [4] was one of the first approaches to deal with these difficulties... In conclusion, ChebNet allows us to learn localized filters, and learn them efficiently!

It may be interesting to mention why/how ChebNet also advances upon Spectral Net by enabling inductive learning. I believe this is a recent line of work by Ron Levie et al. worth looking into.

Modern Spatial Convolutions

Click on the different tabs below to see the definitions of the different models.

Maybe the colors are too similar to eachother here?

Interactive Graph Neural Networks

Interesting visualization! I really enjoyed this one!

In particular, I was able to identify a key issue with some spatial GNNs through the visualization: oversmoothing.

E.g. for GCN and GAT, if one keeps pressing 'Update All Nodes', the node feature values eventually arrive to very similar numbers. This was interesting to me and maybe worth pointing to the reader, as there are several recent papers on this topic of oversmoothing (and building upon expressive power of GNNs).

Game of Life

These experiments suggest that the GCN models don't learn the rules of the Game of Life in the form they are given above, minimizing the benefits of their inductive bias.

To be honest, these experiments aren't very convincing to me...or, I don't know what I have learnt from reading through them. Sure, the goal was to compare if GNNs are as good as CNNs, but the results are not very 'satisfying' beyond saying that GNNs didn't learn the game and fail similar to CNNs.

Do you think it is possible to change the message passing scheme of GCN to something more powerful such as adding attention mechanisms or principled neighborhood aggregation functions? Essentially, Game of Life seems to require the model to be able to count neighborhoods very exactly, whereas GCNs do suffer from this... PNA was designed for counting neighborhoods and could genuinely be worth exploring here.

Thus, it would be much more convincing to the reader that GNNs are useful if you can demonstrate that this message passing framework is flexible to overcome the limitations of grid CNNs and enables the modeller to add problem-specific inductive biases. This would really reinforce all the theory we have been reading and get the reader excited about this new class of models.

Now, I also get why the authors may have chosen a non-conventional graph dataset -- Game of Life. All their examples are on grid graphs so they may have liked to continue the analogy. Another option could be to consider images such as classical MNIST or CIFAR10 -- these have recently been used to compare GNNs to CNNs, too. See work by Boris Knyazev et al. on this.

My comments from the Google Form

On diagrams, I liked the interactive diagrams a lot in how they helped me understand the concepts. Their graphic design may be iterated upon for better incorporating design best practices.

On writing and readability, there are several math heavy sections -- I feel that the authors could help readers by being very explicit when using a symbol defined many paragraphs ago, or provide more intuitive understanding of a couple mathematical concepts related to spectral convolutions.

On this exposition, I felt that the main 'story' this article wants to tell is about how GNN research evolved from global spectral to local spatial graph convolutions. This is a super interesting topic, as someone who has also seen this evolution in the community.

Some areas that authors may improve upon or make this article much more stronger and convincing to the reader:

More details on spatial GNNs: maybe something more on expressive power and neighborhood aggregation, e.g. GIN and PNA papers. Can we rank the 4 GNNs they've shown based on their expressive power? Can we show which models can distinguish which neighborhoods?
Then, building upon the previous point, it would be more convincing to the reader if the authors can highlight either the flexibility or advantages of GNNs over CNNs via their experiments with the Game of Life (or something else). See my detailed comments.

Distill employs a reviewer worksheet as a help for reviewers.

The first three parts of this worksheet ask reviewers to rate a submission along certain dimensions on a scale from 1 to 5. While the scale meaning is consistently "higher is better", please read the explanations for our expectations for each score—we do not expect even exceptionally good papers to receive a perfect score in every category, and expect most papers to be around a 3 in most categories.

Any concerns or conflicts of interest that you are aware of?: No known conflicts of interest What type of contributions does this article make?: Exposition on an emerging research direction

Advancing the Dialogue	Score
How significant are these contributions?	4/5

Outstanding Communication	Score
Article Structure	3/5
Writing Style	3/5
Diagram & Interface Style	3/5
Impact of diagrams / interfaces / tools for thought?	4/5
Readability	3/5

Comments on Readability