dmlc / dgl

Python package built to ease deep learning on graph, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0
13.18k stars 2.99k forks

The assignment behavior of DGLGraph.dstdata is wrong in some cases #7452

Open yfismine opened 4 weeks ago

yfismine commented 4 weeks ago

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

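import dgl
from openhgnn.dataset.gtn_dataset import ACM4GTNDataset

# Convert the heterogeneous ACM graph into a block (bipartite-structured graph).
graph = dgl.to_block(ACM4GTNDataset()[0])
print(graph)
print(graph.dstdata["label"])
# Assign a new attribute through dstdata, then query it from both sides.
graph.dstdata["label_copy"] = graph.dstdata["label"]
print(graph.dstdata["label_copy"])
print(graph.srcdata["label_copy"])
# Assign the same attribute through dstnodes["paper"].data and query dstdata again.
graph.dstnodes["paper"].data["label_copy"] = next(iter(graph.dstdata["label"].values()))
print(graph.dstdata["label_copy"])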

Expected behavior

As the output shows, an attribute assigned directly through dstdata cannot be found when querying dstdata; surprisingly, it appears in srcdata instead. I checked the code of HeteroNodeDataView and found that the ntid passed to self._graph._set_n_repr is obtained through self._graph.get_ntype_id(ntype). However, get_ntype_id computes ntid = self._srctypes_invmap.get(ntype, self._dsttypes_invmap.get(ntype, None)), so the lookup gives priority to the source node types. The bug is therefore triggered whenever ntype appears among both the source and the destination node types.
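The lookup precedence described above can be illustrated without DGL. The following is only a minimal sketch: the two tables, the numeric ids, the node type names, and the helper functions (lookup_src_first, lookup_dst_only, assign) are hypothetical stand-ins, not DGL's actual data structures. It shows why a source-first lookup sends a value meant for a destination type into the source-side frame when the type name exists on both sides.

```python
# Hypothetical node-type tables for a block whose types appear on both sides.
# The ids are made up; only the overlap between the two tables matters.
SRC_INVMAP = {"author": 0, "paper": 1, "subject": 2}
DST_INVMAP = {"author": 3, "paper": 4, "subject": 5}


def lookup_src_first(ntype):
    """Mirror of the quoted lookup: the source table is consulted first."""
    return SRC_INVMAP.get(ntype, DST_INVMAP.get(ntype, None))


def lookup_dst_only(ntype):
    """Destination-only lookup, which is what a dstdata view would need."""
    return DST_INVMAP.get(ntype, None)


def assign(frames, lookup, key, values_by_type):
    """Store one value per node type in the frame chosen by `lookup`."""
    for ntype, value in values_by_type.items():
        frames[lookup(ntype)][key] = value


labels = {"paper": [0, 1, 2]}

# Source-first lookup: the value lands in the source frame (id 1).
buggy_frames = {i: {} for i in range(6)}
assign(buggy_frames, lookup_src_first, "label_copy", labels)

# Destination-only lookup: the value lands in the destination frame (id 4).
fixed_frames = {i: {} for i in range(6)}
assign(fixed_frames, lookup_dst_only, "label_copy", labels)

print("label_copy" in buggy_frames[4])  # False
print("label_copy" in fixed_frames[4])  # True
```

If this reading is right, a fix along these lines would have the dstdata view resolve type ids from the destination table only; whether that matches the maintainers' preferred fix is an assumption.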

Environment

frozenbugs commented 3 weeks ago

What's your use case of using to_block()?

yfismine commented 3 weeks ago

> What's your use case of using to_block()?

[image: DGLGraph]

frozenbugs commented 2 weeks ago

to_block creates a bipartite-structured block for message passing. It is more of an internal utility method than a public API of DGL; we expose it just in case users want to experiment with it. Can you be more specific about why you need to use to_block to create a bipartite-structured block? Can DGLGraph fit your needs?

yfismine commented 2 weeks ago

> to_block creates a bipartite-structured block for message passing. It is more of an internal utility method than a public API of DGL; we expose it just in case users want to experiment with it. Can you be more specific about why you need to use to_block to create a bipartite-structured block? Can DGLGraph fit your needs?

The assignment behaves normally when to_block is not used. I ran into this problem while using DistDGL for neighbor sampling: the generated blocks carry no attributes, so I have to pull the data through DistGraph with something like batch_labels = g.ndata["labels"][seeds].long().to(device). I want to unify the training code for the stand-alone and distributed settings, so I re-implemented sample_blocks of NeighborSampler and assign attributes to the block inside it.
