Closed NianzuMa closed 2 years ago
Let me describe it more clearly:
>>> import dgl
>>> import torch as th
>>> def func1(g):
...     g.ndata['y'] = th.ones(3, 4)
...     g.ndata['z'] = g.ndata['x'] + g.ndata['y']
...
>>> def func2(g):
...     g = g.local_var()
...     g.ndata['y'] = th.ones(3, 4)
...     g.ndata['z'] = g.ndata['x'] + g.ndata['y']
...
>>> g = dgl.DGLGraph()
>>> g.add_nodes(3)
>>> g.ndata['x'] = th.zeros(3, 4)
>>> func2(g)
>>> g
DGLGraph(num_nodes=3, num_edges=0,
ndata_schemes={'x': Scheme(shape=(4,), dtype=torch.float32)}
edata_schemes={})
>>> func1(g)
>>> g
DGLGraph(num_nodes=3, num_edges=0,
ndata_schemes={'x': Scheme(shape=(4,), dtype=torch.float32), 'y': Scheme(shape=(4,), dtype=torch.float32), 'z': Scheme(shape=(4,), dtype=torch.float32)}
edata_schemes={})
func2 uses g.local_var(), so the data frame of the original graph is not polluted.
func1 does not use local_var(), so after calling it the graph gains two extra attributes, y and z, that should only be used inside func1.
We designed this method because we do not want the data frame of the input graph to be polluted after it is passed into a GNN module.
Sorry, I did not notice that you mentioned inplace. An inplace operation is something like:
>>> import torch as th
>>> import dgl
>>> g = dgl.DGLGraph()
>>> g.add_nodes(5)
>>> g.ndata['x'] = th.rand(5, 3, requires_grad=True)
>>> g.ndata['x'][1] = th.rand(3) # THIS IS INPLACE OPERATION.
>>> y = g.ndata['x'].sum()
>>> y.backward()
>>> g.ndata['x'].grad
RuntimeError: leaf variable has been moved into the graph interior
That is, it tries to assign values to a sub-tensor (an operation that TensorFlow does not allow), which breaks the computation graph in PyTorch.
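One way to avoid the in-place write is to build a new tensor instead of assigning into a slice of the leaf tensor. The following is only a minimal sketch in plain PyTorch, not DGL's recommended approach; the names mask, new_row and x_updated are made up for illustration.

```python
import torch

# Leaf tensor that requires gradients, as in the example above.
x = torch.rand(5, 3, requires_grad=True)
new_row = torch.rand(3)  # illustrative replacement values for row 1

# Out-of-place update: select new_row for row 1 and x everywhere else.
# x itself is never written to, so the autograd graph stays valid.
mask = torch.zeros(5, 1, dtype=torch.bool)
mask[1] = True
x_updated = torch.where(mask, new_row, x)

y = x_updated.sum()
y.backward()

# Rows taken from x receive gradient 1; the replaced row receives 0.
print(x.grad)
```

The result could then be stored on the graph under a feature name, leaving the original leaf tensor untouched.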
Feel free to use local_var in fine-tuning.
Thank you for your answer.
I tried something more there.
>>> import dgl
>>> import torch
>>>
>>>
>>> def func_1(g):
...     g.ndata["y"] = torch.ones(3, 4)
...     g.ndata["z"] = g.ndata["x"] + g.ndata["y"]
...
>>>
>>> def func_2(g):
...     g = g.local_var()
...     g.ndata["y"] = torch.ones(3, 4)
...     g.ndata["z"] = g.ndata["x"] + g.ndata["y"]
...
>>> g = dgl.DGLGraph()
>>> g.add_nodes(3)
>>> g.ndata["x"] = torch.zeros(3, 4)
>>>
>>> func_2(g)
>>> print(g)
DGLGraph(num_nodes=3, num_edges=0,
ndata_schemes={'x': Scheme(shape=(4,), dtype=torch.float32)}
edata_schemes={})
>>>
>>> func_1(g)
>>> print(g)
DGLGraph(num_nodes=3, num_edges=0,
ndata_schemes={'x': Scheme(shape=(4,), dtype=torch.float32), 'y': Scheme(shape=(4,), dtype=torch.float32), 'z': Scheme(shape=(4,), dtype=torch.float32)}
edata_schemes={})
>>> print(g.ndata["x"])
tensor([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
>>> def func_3(g):
...     g = g.local_var()
...     # In-place write: local_var() does not protect against this, so the
...     # shared tensor, and therefore the original graph, is modified.
...     g.ndata["x"][1] = torch.rand(1, 4)
...
>>>
>>> func_3(g)
>>> print(g.ndata["x"])
tensor([[0.0000, 0.0000, 0.0000, 0.0000],
[0.6031, 0.4591, 0.4806, 0.7303],
[0.0000, 0.0000, 0.0000, 0.0000]])
func_3 is exactly an example of this statement in the documentation:
However, inplace operations do change the shared tensor values, so will be reflected to the original graph.
This is an inplace operation: even with local_var(), it changes the shared tensor values, so the original graph is modified. It is a dangerous operation that local_var() does not protect against. I think we should include this example in the documentation. Thanks.
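The sharing behaviour can be illustrated without DGL. The sketch below is only an analogy: a plain dict of tensors stands in for the graph's node frame, and a shallow dict copy stands in for local_var(). It does not reflect how DGL implements frames internally, and the function names are hypothetical. It shows that cloning the tensor before writing keeps the original values intact.

```python
import torch

# Hypothetical stand-in for a node frame: feature name -> tensor.
frame = {"x": torch.zeros(3, 4)}

def write_in_place(frame):
    local = dict(frame)            # new dict, but the same tensor objects
    local["x"][1] = torch.ones(4)  # in-place write hits the shared tensor

def write_on_clone(frame):
    local = dict(frame)
    x = local["x"].clone()         # private copy of the data
    x[1] = torch.ones(4)           # in-place write on the copy only
    local["x"] = x                 # rebinding affects only the local dict

write_on_clone(frame)
print(frame["x"])  # still all zeros

write_in_place(frame)
print(frame["x"])  # row 1 is now all ones
```

Under the same assumption, the pattern inside a function that calls local_var() would be to clone a feature before modifying it in place and then reassign it.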
Another, more specific question regarding GAT.
In the GAT example, the model is defined like this:
class GAT(nn.Module):
    def __init__(self,
                 g,
                 num_layers,
                 in_dim,
                 num_hidden,
                 num_classes,
                 heads,
                 activation,
                 feat_drop,
                 attn_drop,
                 negative_slope,
                 residual):
        super(GAT, self).__init__()
        self.g = g
        self.num_layers = num_layers
        self.gat_layers = nn.ModuleList()
        self.activation = activation
        # input projection (no residual)
        self.gat_layers.append(GATConv(
            in_dim, num_hidden, heads[0],
            feat_drop, attn_drop, negative_slope, False, self.activation))
        # hidden layers
        for l in range(1, num_layers):
            self.gat_layers.append(GATConv(
                num_hidden * heads[l-1], num_hidden, heads[l],
                feat_drop, attn_drop, negative_slope, residual, self.activation))
        # output projection
        self.gat_layers.append(GATConv(
            num_hidden * heads[-2], num_classes, heads[-1],
            feat_drop, attn_drop, negative_slope, residual, None))

    def forward(self, inputs):
        h = inputs
        for l in range(self.num_layers):
            h = self.gat_layers[l](self.g, h).flatten(1)
        # output projection
        logits = self.gat_layers[-1](self.g, h).mean(1)
        return logits
With this definition, training on the Cora dataset looks like:
args.dataset = "cora"
data = load_data(args)
features = torch.FloatTensor(data.features)
...  # many lines of code omitted here
logits = model(features)
Question (1): The node features here are used statically, without fine-tuning, right?
My idea for making the node features fine-tunable is to define a node embedding in GAT, as below:
class GAT(nn.Module):
    def __init__(self,
                 g,
                 num_layers,
                 in_dim,
                 num_hidden,
                 num_classes,
                 heads,
                 activation,
                 feat_drop,
                 attn_drop,
                 negative_slope,
                 residual):
        super(GAT, self).__init__()
        self.g = g
        self.num_layers = num_layers
        self.gat_layers = nn.ModuleList()
        self.activation = activation
        # input projection (no residual)
        self.gat_layers.append(GATConv(
            in_dim, num_hidden, heads[0],
            feat_drop, attn_drop, negative_slope, False, self.activation))
        # hidden layers
        for l in range(1, num_layers):
            self.gat_layers.append(GATConv(
                num_hidden * heads[l-1], num_hidden, heads[l],
                feat_drop, attn_drop, negative_slope, residual, self.activation))
        # output projection
        self.gat_layers.append(GATConv(
            num_hidden * heads[-2], num_classes, heads[-1],
            feat_drop, attn_drop, negative_slope, residual, None))
        # -------------------------------------- check below node embedding ------------------------------------------
        self.node_embedding = nn.Embedding(g.number_of_nodes(), args.node_embed_size)
        # ... omit code here, initialize the embedding as the original node features

    def forward(self, node_ids):
        h = self.node_embedding(node_ids)  # -------------------------------------- check here ----------------------------
        for l in range(self.num_layers):
            h = self.gat_layers[l](self.g, h).flatten(1)
        # output projection
        logits = self.gat_layers[-1](self.g, h).mean(1)
        return logits
Question (2): If I do it as above (see the "# ----------- check ------" markers in the code), the node features can be fine-tuned, right? Thank you very much for your answer.
The answer to both of your questions is yes.
EDIT: The answer to the first question is yes.
As for the second question: although the embedding will be fine-tuned, the code does not look right. For full-graph training, you can do something like:
class GAT(nn.Module):
    def __init__(self,
                 g, ...):
        super(GAT, self).__init__()
        self.g = g
        ...
        # -------------------------------------- check below node embedding ------------------------------------------
        # change the initialization to anything you like
        self.node_embedding = nn.Parameter(torch.randn(g.number_of_nodes(), args.node_embed_size))
        # ... omit code here, initialize the embedding as the original node features

    def forward(self):
        h = self.node_embedding  # -------------------------------------- check here ----------------------------
        for l in range(self.num_layers):
            h = self.gat_layers[l](self.g, h).flatten(1)
        # output projection
        logits = self.gat_layers[-1](self.g, h).mean(1)
        return logits
If your graph cannot fit into GPU memory, you will need minibatch training and neighborhood sampling to do it correctly. You can probably refer to our sampling-based GAT on OGB-products in https://github.com/dmlc/dgl/blob/master/examples/pytorch/ogb/ogbn-products/gat/main.py.
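For the full-graph case, the idea of a learnable feature matrix initialized from the original node features can be sketched in plain PyTorch. This is a rough illustration under assumptions, not the DGL example code: the class name LearnableFeatures is hypothetical, and a single nn.Linear stands in for the stack of GATConv layers so that the snippet runs without a graph.

```python
import torch
import torch.nn as nn

class LearnableFeatures(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, init_features, num_classes):
        super().__init__()
        # Registering a clone as nn.Parameter makes the features trainable
        # while leaving the original feature tensor untouched.
        self.node_embedding = nn.Parameter(init_features.clone())
        # Stand-in for the GAT layers (assumption, not the real model).
        self.layers = nn.Linear(init_features.shape[1], num_classes)

    def forward(self):
        return self.layers(self.node_embedding)

torch.manual_seed(0)
features = torch.zeros(3, 4)  # pretend these are the original node features
model = LearnableFeatures(features, num_classes=2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model().sum()  # dummy loss, just to produce gradients
loss.backward()
optimizer.step()

# The embedding received a gradient and is part of model.parameters().
print(model.node_embedding.grad.shape)
```

Because the parameter is created from a clone, the optimizer step changes only model.node_embedding and not the features tensor it was initialized from.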
This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you
This issue is closed due to lack of activity. Feel free to reopen it if you still have questions.
📚 Documentation
This concerns the API reference.
In the API documentation for local_var, I found this statement in the Note. Could you give an example of which inplace operations will change the shared tensor values? It is not clear to me, and I do not know how this function can be used safely when writing a custom model.
Additional question: in the forward function of the gatconv code, it uses graph = graph.local_var(). Does this mean that this implementation of gatconv will not tune the graph node features during training? For example, if I am using node features and want to tune them during training, do I need to implement another gatconv rather than use the version from dgl.nn.pytorch.GATConv?
Thank you very much for your answer.