FluxML / GeometricFlux.jl

Geometric Deep Learning for Flux
https://fluxml.ai/GeometricFlux.jl/stable/
MIT License

layers shouldn't store the graph #199

Open CarloLucibello opened 3 years ago

CarloLucibello commented 3 years ago

The current design, where layers store graph references, is a bit odd, since the two should be decoupled. Other GNN libraries such as DGL and PyTorch Geometric pass the graph as an input to the layer in the forward pass.

I think the interface for the forward pass of any layer should consist of these two methods:

(m::GraphConv)(g::FeaturedGraph) = m(g, node_feature(g))  # or a similar method that returns a FeaturedGraph
(m::GraphConv)(g::FeaturedGraph, x) = ....

One then constructs a model as follows:

using Flux, GeometricFlux
using Flux: @functor, dropout

struct GNN
    conv1
    conv2
    dense

    function GNN()
        new(GCNConv(1024=>512, relu),
            GCNConv(512=>128, relu),
            Dense(128, 10))
    end
end

@functor GNN

function (net::GNN)(g, x)
    x = net.conv1(g, x)    # the graph is passed explicitly to each graph layer
    x = dropout(x, 0.5)
    x = net.conv2(g, x)
    x = net.dense(x)
    return x
end

net = GNN()

Storing the graph inside the layer is convenient when one wants to create a model using Flux.Chain, which gives the nice construction

net = Chain(GCNConv(g, 1024=>512, relu),
            Dropout(0.5),
            GCNConv(g, 512=>128, relu),
            Dense(128, 10))

but this ties the model to a specific graph, which is generally not what one wants when working with GNNs. We don't necessarily have to give up a concise model constructor though: we can think of a GraphChain constructor that handles the graph propagation nicely (here is a PyG example).
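For illustration only, here is a minimal sketch of what such a GraphChain could look like. The names GraphChain and _apply are hypothetical, and it assumes the graph-free constructors and the layer(g, x) forward method proposed above; a real implementation would dispatch on an abstract graph-layer type rather than listing concrete layers like GCNConv.

using Flux, GeometricFlux
using Flux: @functor

struct GraphChain{T<:Tuple}
    layers::T
end
GraphChain(layers...) = GraphChain(layers)

@functor GraphChain

# Graph-aware layers receive the graph along with the features; plain Flux layers don't.
_apply(l, g, x) = l(x)
_apply(l::GCNConv, g, x) = l(g, x)   # assumes the proposed (layer)(g, x) method

function (c::GraphChain)(g, x)
    for l in c.layers
        x = _apply(l, g, x)
    end
    return x
end

# Usage, mirroring the Chain example above (graph-free constructors as proposed):
# net = GraphChain(GCNConv(1024=>512, relu),
#                  Dropout(0.5),
#                  GCNConv(512=>128, relu),
#                  Dense(128, 10))
# ŷ = net(g, x)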

yuehhua commented 3 years ago

Storing the graph in layers is for the case where the same graph is used throughout the training process. Keeping a static graph in the layer avoids redundantly converting graph structures or repeatedly computing the Laplacian matrix for GCNConv. The static graph can be seen as part of the layer. If the graph is given to the layer, it is not a learnable layer parameter.
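As a concrete sketch of the precomputation in question (the function name and dense-matrix form are illustrative, not GeometricFlux's implementation): GCN uses the symmetrically normalized adjacency with self-loops, closely related to the normalized Laplacian mentioned above, and a layer that holds a static graph can compute it once at construction time.

using LinearAlgebra

# Symmetric GCN normalization with self-loops: Ahat = D^(-1/2) (A + I) D^(-1/2).
# Computed once at construction when the graph is stored in the layer, or on every
# forward pass when the graph is passed as input (unless it is cached elsewhere).
function gcn_normalize(A::AbstractMatrix)
    As = A + I                        # add self-loops
    d = vec(sum(As, dims=2))          # degree vector
    Dinv = Diagonal(1 ./ sqrt.(d))
    return Dinv * As * Dinv
end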

CarloLucibello commented 3 years ago

It feels more natural to me to consider the graph as input data rather than part of the model. Also, layers shouldn't modify the internalized graph: it is unsafe, since it affects other layers that internalized the same graph. But most importantly, there is a very strong argument for removing the graph from layers: now that FeaturedGraph is a @functor (as it should be, so that it can be sent to the GPU), when collecting params as in

ps = Flux.params(model)

one also collects the graph's internal arrays and updates them when calling Flux.Optimise.update!(opt, ps, gs), which is clearly wrong.
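A minimal self-contained illustration of the mechanism (FakeGraph and StaticConv are made-up stand-ins, not GeometricFlux types):

using Flux
using Flux: @functor

struct FakeGraph             # stand-in for FeaturedGraph
    A::Matrix{Float32}
end
@functor FakeGraph

struct StaticConv            # stand-in for a layer that internalizes the graph
    graph::FakeGraph
    weight::Matrix{Float32}
end
@functor StaticConv

layer = StaticConv(FakeGraph(rand(Float32, 4, 4)), rand(Float32, 3, 3))
ps = Flux.params(layer)
length(ps)  # 2: the graph's adjacency matrix is collected alongside the weight,
            # so an optimiser update would also modify the stored graph data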

yuehhua commented 3 years ago

For example, GCNConv needs a normalized Laplacian matrix to work with. Usually, I don't expect the user to give a normalized Laplacian matrix to a layer; users usually give an adjacency matrix or one of many other forms of a graph. So, storing the normalized Laplacian avoids re-converting the adjacency matrix to a normalized Laplacian matrix. Once the matrix is computed, it remains fixed in the layer, and the stored matrix should remain fixed during the training process. From this viewpoint, a layer digests the input graph, and the layer is specialized by fusing with the graph. Accordingly, a GCNConv layer has two behaviors: one is training on a static graph, which is stored in a specialized layer, and the other is training on variable graphs, which are given by the training datasets.
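A rough sketch of these two behaviors (SketchGCNConv and its fields are illustrative, not the actual GeometricFlux internals):

# A layer that either fuses with a fixed, pre-normalized graph at construction time
# or accepts a (pre-normalized) graph matrix on every call.
struct SketchGCNConv{C,W}
    Ahat::C      # cached normalized matrix for the static-graph case, or `nothing`
    weight::W
end

# static graph: normalization was done once at construction; forward takes only features
(l::SketchGCNConv)(x::AbstractMatrix) = l.weight * x * l.Ahat

# variable graphs: the normalized matrix comes with the training data on each call
(l::SketchGCNConv)(Ahat::AbstractMatrix, x::AbstractMatrix) = l.weight * x * Ahat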