Closed jermainewang closed 5 years ago
Can we use `edges.g.data`/`nodes.g.data` in UDFs instead of another argument `graph_data`? `edges.g.data` should have the same shape as `edges.data[...]`, with the corresponding gdata at each row.
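To make the shape requirement concrete, here is a minimal sketch in plain Python, with lists standing in for tensors. The helper `expand_to_edges` and its arguments are hypothetical and not part of DGL; the sketch assumes the edges of a batched graph are stored graph by graph.

```python
def expand_to_edges(gdata, num_edges_per_graph):
    """Repeat each graph's feature once per edge of that graph.

    gdata: one feature per graph in the batch.
    num_edges_per_graph: edge count of each graph, in the same order.
    Returns one row per edge, aligned with the batched edge data.
    """
    if len(gdata) != len(num_edges_per_graph):
        raise ValueError("need exactly one graph-level feature per graph")
    rows = []
    for feat, n_edges in zip(gdata, num_edges_per_graph):
        rows.extend([feat] * n_edges)
    return rows


# Two graphs with 2 and 3 edges: the result has 5 rows.
edge_gdata = expand_to_edges([[0.1, 0.2], [0.9, 0.8]], [2, 3])
```

Every edge row then carries the feature of the graph it belongs to, which is what lets a UDF treat it like any other per-edge field.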
The main reason to add gdata is the BatchedGraph case; otherwise the user can handle graph data themselves. Non-ndarray data therefore makes no sense, since we cannot batch it. I'm for restricting gdata to ndarrays only. We can suggest a namespace where users store non-ndarray information, such as the SMILES string of a molecule graph. However, DGL won't do anything with that field.
The initializer is a bit weird. We need an initializer because we might do partial updates on a node/edge frame. That doesn't seem to be the case for graph-level data.
One case where gdata could be helpful is something like Deep Graph Infomax, where the user wants interaction between graph-level data and node-level data when multiple graphs are batched together.
My two cents: I love the idea of graph-level data and agree with all four points made by @jermainewang. If we use `gdata` in message-passing UDFs, I recommend we also design a set of builtin operations (`g_op_v`, `g_op_u`, `u_op_g` and `v_op_g`) and dedicated kernels for them.
As for compatibility, for now we can add two sets of UDFs, `gather` and `broadcast`, for graph-level data and leave the interface of the other UDFs unchanged.
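As a rough illustration of what `broadcast` and `gather` could look like, here is a sketch in plain Python with lists standing in for tensors. The signatures are assumptions made for this example, not an existing or proposed DGL API, and the elementwise addition is only one plausible reading of a `v_op_g`-style operation.

```python
def broadcast(gdata, num_nodes_per_graph):
    """Copy each graph's value to every node of that graph."""
    out = []
    for value, n in zip(gdata, num_nodes_per_graph):
        out.extend([value] * n)
    return out


def gather(ndata, num_nodes_per_graph, reduce_fn=sum):
    """Reduce node values graph by graph (sum pooling by default)."""
    out, start = [], 0
    for n in num_nodes_per_graph:
        out.append(reduce_fn(ndata[start:start + n]))
        start += n
    return out


sizes = [2, 3]                          # a batch of two graphs
node_feats = [1.0, 2.0, 3.0, 4.0, 5.0]  # one value per node, graph by graph
per_node_bias = broadcast([10.0, 20.0], sizes)
# Combine each node value with the value of the graph it belongs to.
shifted = [v + g for v, g in zip(node_feats, per_node_bias)]
pooled = gather(shifted, sizes)
```

Passing `max` or another reducer as `reduce_fn` gives other pooling variants without changing the message/reduce UDF signatures.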
For message/reduce functions, there are some cases where gdata could be useful: we need to survey whether such operations in existing papers can all be decomposed as: if so, we do not need to let gdata appear in message/reduce functions.
`gdata` is not only useful for DGI: considering node pooling operations, we can write most of them as a combination of `g_op_v`/`v_op_g` operations.
Following up on this, if we do something like `[setattr(g, 'g_data_field', data) for g, data in zip(graphs, all_data)]` where each `data` is a torch Tensor, could you elaborate how batching works? In other words, if we do `batched = dgl.batch(graphs)`, what happens to the `g_data_field`? Are they automatically concatenated? If not, how can we access the individual `g_data_field` of each graph from `batched`?
EDIT: re-raised in #1449
Summary of discussion:
* Graph-level data could be implemented without introducing new APIs.
* The implementation effort depends on the specific scenario but does not pose a significant difficulty.
* To emulate the idea of `gdata`, the user could directly `setattr` on a graph object.
* Batching graph-level data is a simple concatenation.
* When returning a batched graph and its accompanying graph-level data from a data loader, we suggest directly returning a tuple (e.g., `(bg, labels)`). The practice is standard.

Overall, there is no advantage to using a dedicated graph-level data API except for syntax concerns at the moment. Thus we reject this RFC.
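The tuple-returning practice from the summary can be sketched as a collate function. This is a minimal sketch, not DGL code: `collate` and its `batch_fn` parameter are names chosen for this example, labels are plain lists, and with real DGL graphs one would pass `dgl.batch` as `batch_fn` and concatenate tensors instead of extending a list.

```python
def collate(samples, batch_fn):
    """Merge a list of (graph, label) pairs into one (batched_graph, labels) pair.

    batch_fn: callable that merges a list of graphs (dgl.batch in real use).
    Labels are lists here; with tensors, use concatenation instead.
    """
    graphs, labels = zip(*samples)
    merged_labels = []
    for label in labels:
        merged_labels.extend(label)
    return batch_fn(list(graphs)), merged_labels


# Stand-in graphs (strings) and a stand-in batch function for illustration.
bg, labels = collate([("g0", [0]), ("g1", [1]), ("g2", [0])], batch_fn=tuple)
```

Because the labels are concatenated in the same order as the graphs, row `i` of `labels` still belongs to the `i`-th graph in the batch.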
`graph.local_scope()` does not seem to work with the approach above. What is the expected workaround for clearing the global data without the API?
Moved from #714. Currently, we have node-level and edge-level data (e.g. `ndata` and `edata`). The RFC proposes to add a dictionary for graph-level data (using torch syntax as an example):
I could foresee the following considerations:
`dgl.batch` needs to handle graph-level data too. The behavior should follow how node/edge-level data are batched. For example, we need to throw errors if graph-level data under the same key cannot be stacked (or the key is missing for some graphs). A similar change needs to be made to `dgl.unbatch`.
Compared with our previous suggestion to use function closure:
Also, the change will break backward compatibility. Any good idea?
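The batching behavior described above (stack values under the same key, raise an error on a missing key or on values that cannot be stacked) can be sketched in plain Python. `batch_gdata` and `unbatch_gdata` are hypothetical helpers written for this example, with lists standing in for tensors; they are not what `dgl.batch` or `dgl.unbatch` do today.

```python
def batch_gdata(gdata_list):
    """Merge per-graph dicts of feature rows into one dict of stacked rows."""
    keys = set(gdata_list[0])
    for i, gdata in enumerate(gdata_list):
        if set(gdata) != keys:
            raise KeyError(
                f"graph {i} has keys {sorted(gdata)}, expected {sorted(keys)}"
            )
    batched = {}
    for key in keys:
        rows = [gdata[key] for gdata in gdata_list]
        # Rows of different lengths cannot be stacked into one array.
        if len({len(row) for row in rows}) != 1:
            raise ValueError(f"rows under key {key!r} differ in length")
        batched[key] = rows
    return batched


def unbatch_gdata(batched, num_graphs):
    """Inverse of batch_gdata: split stacked rows back into per-graph dicts."""
    return [
        {key: rows[i] for key, rows in batched.items()}
        for i in range(num_graphs)
    ]


per_graph = [{"temp": [0.5, 1.0]}, {"temp": [2.0, 3.0]}]
batched = batch_gdata(per_graph)
restored = unbatch_gdata(batched, num_graphs=2)
```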
Alternatives
Note that graph-level data could be maintained outside of DGLGraph because they are dense tensors and the most common operations are stacking/unstacking them. UDFs can also access them through a function closure, which is quite neat TBH. On the other hand, adding graph-level data requires changing the UDF signatures and potentially breaks user code.
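The closure approach can be illustrated with a small sketch. In DGL, a message UDF receives an edge batch and returns a dict of messages; here a `SimpleNamespace` stands in for that edge batch and mimics only its `src` field, and `make_message_fn`, the `"h"` and `"m"` field names, and the scalar `graph_feat` are all assumptions for the sake of the example.

```python
from types import SimpleNamespace


def make_message_fn(graph_feat):
    """Build a message function that captures graph-level data in a closure."""
    def message_fn(edges):
        # edges.src["h"] holds one source-node value per edge in this sketch.
        return {"m": [h * graph_feat for h in edges.src["h"]]}
    return message_fn


# Stand-in for the edge batch a framework would pass to the UDF.
fake_edges = SimpleNamespace(src={"h": [1.0, 2.0, 3.0]})
message_fn = make_message_fn(graph_feat=0.5)
messages = message_fn(fake_edges)
```

The UDF keeps its usual single-argument signature, so existing user code is unaffected; the cost is that the graph-level value lives outside the graph object and has to be batched by the user.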