DGLGraph.local_var documentation confusion

NianzuMa commented 4 years ago

📚 Documentation

This concerns the API reference.

In the API documentation for local_var, I found this statement in the Note:

However, inplace operations do change the shared tensor values, so will be reflected to the original graph.

Could you give an example of an in-place operation that changes the shared tensor values? It is not clear to me, and I do not know how this function can be used safely when writing customized models.

Additional question: in the GATConv code, the forward function uses graph = graph.local_var(). Does this mean that this implementation of GATConv will not tune the graph node features during training?

For example, if I am using node features and want to tune them during training, do I need to implement another GATConv rather than use dgl.nn.pytorch.GATConv?

Thank you very much for your answer.

yzh119 commented 4 years ago

Let me describe it more clearly:

>>> import dgl
>>> import torch as th
>>> def func1(g):
...     g.ndata['y'] = th.ones(3, 4)
...     g.ndata['z'] = g.ndata['x'] + g.ndata['y']
... 
>>> def func2(g):
...     g = g.local_var()
...     g.ndata['y'] = th.ones(3, 4)
...     g.ndata['z'] = g.ndata['x'] + g.ndata['y']
... 
>>> g = dgl.DGLGraph()
>>> g.add_nodes(3)
>>> g.ndata['x'] = th.zeros(3, 4)
>>> func2(g)
>>> g
DGLGraph(num_nodes=3, num_edges=0,
         ndata_schemes={'x': Scheme(shape=(4,), dtype=torch.float32)}
         edata_schemes={})
>>> func1(g)
>>> g
DGLGraph(num_nodes=3, num_edges=0,
         ndata_schemes={'x': Scheme(shape=(4,), dtype=torch.float32), 'y': Scheme(shape=(4,), dtype=torch.float32), 'z': Scheme(shape=(4,), dtype=torch.float32)}
         edata_schemes={})

func2 uses g.local_var(), so the data frame of the original graph is not polluted. func1 does not use local_var(), so after calling it the graph gains two extra attributes, y and z, that should only be used internally within func1.

We designed this method because we want the data frame of the input graph to remain unpolluted after calling into a GNN module.

yzh119 commented 4 years ago

Sorry, I did not notice that you mentioned in-place operations. An in-place operation is something like:

>>> import torch as th
>>> import dgl
>>> g = dgl.DGLGraph()
>>> g.add_nodes(5)
>>> g.ndata['x'] = th.rand(5, 3, requires_grad=True)
>>> g.ndata['x'][1] = th.rand(3)  # THIS IS INPLACE OPERATION.
>>> y = g.ndata['x'].sum()
>>> y.backward()
>>> g.ndata['x'].grad
RuntimeError: leaf variable has been moved into the graph interior

That is, an operation that tries to assign values to a sub-tensor (not allowed in TensorFlow), which breaks the computation graph in PyTorch.
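
A minimal torch-only sketch of the usual workaround, assuming the goal is to overwrite one row while keeping gradients flowing to the original tensor: clone first, then write into the clone. The shapes mirror the example above; the variable names are illustrative and not part of DGL.

```python
import torch

x = torch.rand(5, 3, requires_grad=True)   # leaf tensor, like g.ndata['x'] above

# Out-of-place alternative: clone() is differentiable, and the clone is not a
# leaf tensor, so writing into it is allowed.
x_mod = x.clone()
x_mod[1] = torch.zeros(3)

y = x_mod.sum()
y.backward()

# Row 1 was overwritten by a constant, so it receives zero gradient;
# every other row receives a gradient of ones.
grad = x.grad
```

The original tensor x keeps its values, and its gradient reflects only the rows that actually contributed to y.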

Feel free to use local_var in fine-tuning.
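
On the fine-tuning question: local_var() only controls which feature names remain visible on the graph object afterwards; it does not detach tensors from autograd. A torch-only sketch of that idea follows. It assumes, as an analogy rather than a description of DGL's implementation, that the local graph's feature storage acts like a shallow copy of a dict; `layer` is a hypothetical stand-in for a GNN module.

```python
import torch

feat = torch.rand(3, 4, requires_grad=True)   # learnable node features
frame = {}                                    # stand-in for g.ndata

def layer(frame, h):
    local = dict(frame)       # analogy for g.local_var(): shallow copy
    local["h"] = h * 2.0      # temporary feature, visible only locally
    return local["h"].sum()

loss = layer(frame, feat)
loss.backward()

leaked = "h" in frame         # the temporary key did not leak to the caller
grad = feat.grad              # gradients still reach the input features
```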

NianzuMa commented 4 years ago

Thank you for your answer.

I tried a few more things:

>>> import dgl
>>> import torch
>>> 
>>> 
>>> def func_1(g):
...     g.ndata["y"] = torch.ones(3, 4)
...     g.ndata["z"] = g.ndata["x"] + g.ndata["y"]
... 
>>> 
>>> def func_2(g):
...     g = g.local_var()
...     g.ndata["y"] = torch.ones(3, 4)
...     g.ndata["z"] = g.ndata["x"] + g.ndata["y"]
... 
>>> g = dgl.DGLGraph()
>>> g.add_nodes(3)
>>> g.ndata["x"] = torch.zeros(3, 4)
>>> 
>>> func_2(g)
>>> print(g)
DGLGraph(num_nodes=3, num_edges=0,
         ndata_schemes={'x': Scheme(shape=(4,), dtype=torch.float32)}
         edata_schemes={})
>>> 
>>> func_1(g)
>>> print(g)
DGLGraph(num_nodes=3, num_edges=0,
         ndata_schemes={'x': Scheme(shape=(4,), dtype=torch.float32), 'y': Scheme(shape=(4,), dtype=torch.float32), 'z': Scheme(shape=(4,), dtype=torch.float32)}
         edata_schemes={})
>>> print(g.ndata["x"])
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

>>> def func_3(g):
...     g = g.local_var()
...     # In-place write: local_var() does not protect against this, so the
...     # shared tensor of the original graph is modified.
...     g.ndata["x"][1] = torch.rand(1, 4)
... 
>>> 
>>> func_3(g)
>>> print(g.ndata["x"])
tensor([[0.0000, 0.0000, 0.0000, 0.0000],
        [0.6031, 0.4591, 0.4806, 0.7303],
        [0.0000, 0.0000, 0.0000, 0.0000]])

func_3 is an exact example of the statement in the documentation:

However, inplace operations do change the shared tensor values, so will be reflected to the original graph.

This is an in-place operation: even with local_var(), it still changes the shared tensor values, so the original graph is changed. It is a dangerous operation that local_var() does not protect against. I think we should include this in the documentation. Thanks.
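
To make the distinction concrete without needing DGL, here is a plain-Python analogy. It assumes that local_var() behaves roughly like a shallow copy of the feature dictionary: new keys stay local, but the values are shared. The `local_copy` helper and the list-of-lists "tensor" are hypothetical stand-ins, not DGL code.

```python
# Plain-Python analogy (not DGL code): model the node-feature frame as a dict
# and local_var() as a shallow copy of that dict. The values are shared.

def local_copy(frame):
    """Hypothetical stand-in for g.local_var(): copies the dict, not the values."""
    return dict(frame)

def add_feature(frame):
    local = local_copy(frame)
    local["y"] = [[1.0] * 4 for _ in range(3)]   # new key: stays local

def overwrite_row(frame):
    local = local_copy(frame)
    local["x"][1] = [9.0] * 4                    # in-place write on a shared value

def overwrite_row_safely(frame):
    local = local_copy(frame)
    x = [row[:] for row in local["x"]]           # copy first (like tensor.clone())
    x[1] = [7.0] * 4
    local["x"] = x                               # rebinding the key stays local

frame = {"x": [[0.0] * 4 for _ in range(3)]}

add_feature(frame)
keys_after_add = sorted(frame)            # 'y' did not leak

overwrite_row_safely(frame)
row_after_safe = list(frame["x"][1])      # original row is unchanged

overwrite_row(frame)
row_after_inplace = list(frame["x"][1])   # original row was modified
```

Under this model, the safe pattern inside a function is to copy the tensor, modify the copy, and assign the copy back under the same key.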

NianzuMa commented 4 years ago

Another more specific question regarding GAT.

In the example of GAT, GAT is defined in this way:

class GAT(nn.Module):
    def __init__(self,
                 g,
                 num_layers,
                 in_dim,
                 num_hidden,
                 num_classes,
                 heads,
                 activation,
                 feat_drop,
                 attn_drop,
                 negative_slope,
                 residual):
        super(GAT, self).__init__()
        self.g = g
        self.num_layers = num_layers
        self.gat_layers = nn.ModuleList()
        self.activation = activation
        # input projection (no residual)
        self.gat_layers.append(GATConv(
            in_dim, num_hidden, heads[0],
            feat_drop, attn_drop, negative_slope, False, self.activation))
        # hidden layers
        for l in range(1, num_layers):
            self.gat_layers.append(GATConv(
                num_hidden * heads[l-1], num_hidden, heads[l],
                feat_drop, attn_drop, negative_slope, residual, self.activation))
        # output projection
        self.gat_layers.append(GATConv(
            num_hidden * heads[-2], num_classes, heads[-1],
            feat_drop, attn_drop, negative_slope, residual, None))

    def forward(self, inputs):
        h = inputs
        for l in range(self.num_layers):
            h = self.gat_layers[l](self.g, h).flatten(1)
        # output projection
        logits = self.gat_layers[-1](self.g, h).mean(1)
        return logits

Then, to train on the Cora dataset:

args.dataset = "cora"
data = load_data(args)
features = torch.FloatTensor(data.features)
...  # many lines of code omitted here
logits = model(features)

Question (1): The node features here are used statically, without fine-tuning, right?

My idea for making the node features fine-tunable is to define a node embedding in GAT as below:

class GAT(nn.Module):
    def __init__(self,
                 g,
                 num_layers,
                 in_dim,
                 num_hidden,
                 num_classes,
                 heads,
                 activation,
                 feat_drop,
                 attn_drop,
                 negative_slope,
                 residual):
        super(GAT, self).__init__()
        self.g = g
        self.num_layers = num_layers
        self.gat_layers = nn.ModuleList()
        self.activation = activation
        # input projection (no residual)
        self.gat_layers.append(GATConv(
            in_dim, num_hidden, heads[0],
            feat_drop, attn_drop, negative_slope, False, self.activation))
        # hidden layers
        for l in range(1, num_layers):
            self.gat_layers.append(GATConv(
                num_hidden * heads[l-1], num_hidden, heads[l],
                feat_drop, attn_drop, negative_slope, residual, self.activation))
        # output projection
        self.gat_layers.append(GATConv(
            num_hidden * heads[-2], num_classes, heads[-1],
            feat_drop, attn_drop, negative_slope, residual, None))
        # -------------------------------------- check below  node embedding  ------------------------------------------
        self.node_embedding = nn.Embedding(g.number_of_nodes(), args.node_embed_size)
        # ... omit code here, initialize the embedding as the original node features

    def forward(self, node_ids):
        h = self.node_embedding(node_ids)  # -------------------------------------- check here ----------------------------
        for l in range(self.num_layers):
            h = self.gat_layers[l](self.g, h).flatten(1)
        # output projection
        logits = self.gat_layers[-1](self.g, h).mean(1)
        return logits

Question (2): If I do it as above (see the # ----- check ----- markers in the code), the node features can be fine-tuned, right? Thank you very much for your answer.

BarclayII commented 4 years ago

~~The answer to both of your questions is yes.~~

EDIT: The answer to the first question is yes.
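
To illustrate why the answer to question (1) is yes: a plain input tensor is not registered as a parameter, so the optimizer never sees it. A small torch-only sketch, with `nn.Linear` as a hypothetical stand-in for the GAT layers:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
features = torch.rand(3, 4)          # plain tensor, requires_grad is False
model = nn.Linear(4, 2)              # stand-in for the GAT layers
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

before = features.clone()
loss = model(features).sum()
loss.backward()
optimizer.step()

features_unchanged = torch.equal(features, before)   # inputs are never updated
features_grad = features.grad                        # no gradient is tracked
weight_has_grad = model.weight.grad is not None      # layer weights do train
```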

As for the second question: the embedding will be fine-tuned, but the code does not look right. For full-graph training, you can do something like:

class GAT(nn.Module):
    def __init__(self,
                 g, ...):
        super(GAT, self).__init__()
        self.g = g
        ...
        # -------------------------------------- check below  node embedding  ------------------------------------------
        # change the initialization to anything you like
        self.node_embedding = nn.Parameter(torch.randn(g.number_of_nodes(), args.node_embed_size))
        # ... omit code here, initialize the embedding as the original node features

    def forward(self):
        h = self.node_embedding  # -------------------------------------- check here ----------------------------
        for l in range(self.num_layers):
            h = self.gat_layers[l](self.g, h).flatten(1)
        # output projection
        logits = self.gat_layers[-1](self.g, h).mean(1)
        return logits
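
One possible way to initialize the learnable embedding from the original node features, sketched with torch only. It assumes the features are available as a float tensor of shape (num_nodes, feat_dim); `TinyModel` and the `nn.Linear` layer are hypothetical stand-ins for the GAT model and its GATConv layers.

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    """Hypothetical stand-in for the GAT model above."""
    def __init__(self, features):
        super().__init__()
        # Learnable copy of the original features; clone() so that training
        # does not modify the caller's tensor in place.
        self.node_embedding = nn.Parameter(features.clone())
        self.layer = nn.Linear(features.shape[1], 2)   # stand-in for GATConv

    def forward(self):
        return self.layer(self.node_embedding)

torch.manual_seed(0)
features = torch.rand(3, 4)
original = features.clone()
model = TinyModel(features)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

model().sum().backward()
optimizer.step()

# The parameter has moved away from its initial values; the input has not.
embedding_changed = not torch.equal(model.node_embedding.detach(), original)
features_intact = torch.equal(features, original)
```

For minibatch training, where rows are looked up by node ID, `nn.Embedding.from_pretrained(features, freeze=False)` is the analogous option.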

If your graph cannot fit into GPU memory, you will need minibatch training and neighborhood sampling to do it correctly. You can probably refer to our sampling-based GAT on OGB-products in https://github.com/dmlc/dgl/blob/master/examples/pytorch/ogb/ogbn-products/gat/main.py.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you

github-actions[bot] commented 2 years ago

This issue is closed due to lack of activity. Feel free to reopen it if you still have questions.

dmlc / dgl

DGLGraph.local_var documentation confusion #1743
