dmlc / dgl

Python package built to ease deep learning on graphs, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0

Allow HeteroGraphConv to be applied on a subset of the etypes in a graph. #3003

Closed NiklasBeierl closed 3 years ago

NiklasBeierl commented 3 years ago

🐛 Bug

Right now, applying HeteroGraphConv to a graph while only having created modules for a subset of the graph's etypes causes a KeyError to be raised. For the torch model, the corresponding lines are here: https://github.com/dmlc/dgl/blob/5be937a7fbfca0db08a1507744906d61e47a340b/python/dgl/nn/pytorch/hetero.py#L176 and line 189 below. To me this seems like an unnecessary restriction that makes experimenting with heterogeneous graphs needlessly complicated if you want to ignore certain edge types for whatever reason.

You can work around it by calling: `graph.remove_edges(graph.edge_ids(*graph.edges(etype="ignore_me"), etype="ignore_me"), etype="ignore_me")` ~If you attempt this workaround please note that remove_edges is a bit of a troublemaker on batched graphs as of writing this. See: https://github.com/dmlc/dgl/issues/2310#issuecomment-858456712~

To Reproduce

Steps to reproduce the behavior:

  1. Create a heterograph with at least two etypes.
  2. Create a HeteroGraphConv module with modules for only a subset of the etypes.
  3. Try to apply the module -> :boom: (a minimal sketch of these steps follows below)
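
A minimal sketch of these steps (the toy graph, feature sizes, and etype names below are made up for illustration; the KeyError is the behavior described in this report):

import torch
import dgl
import dgl.nn as dglnn

# 1. Heterograph with two etypes, "A" and "B", over a single node type "N".
#    Each relation forms a cycle so that no node has zero in-degree.
graph = dgl.heterograph({
    ("N", "A", "N"): ([0, 1, 2], [1, 2, 0]),
    ("N", "B", "N"): ([0, 1, 2], [2, 0, 1]),
})

# 2. HeteroGraphConv with a module for only one of the two etypes.
conv = dglnn.HeteroGraphConv({"A": dglnn.GraphConv(4, 4)})

# 3. Applying it raises KeyError('B'), because the forward loop visits every
#    etype of the graph, including "B", which has no module.
feats = {"N": torch.randn(graph.num_nodes("N"), 4)}
out = conv(graph, feats)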

Expected behavior

As far as I understand, HeteroGraphConv should simply ignore the edge types for which no module exists. Speaking in terms of code, adding:

if etype not in self.mods:
    continue

below this line and this line should fix it in the torch implementation, and the code for the other implementations looks reasonably similar.
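
For context, the relevant loop looks roughly like this (paraphrased from memory, not a verbatim copy of hetero.py; the forward method contains two nearly identical loops, one for block inputs and one for full graphs, which is why two lines are linked above):

# Paraphrased fragment of HeteroGraphConv.forward (torch backend), with the
# proposed two-line guard marked. g, inputs, outputs and self.mods are roughly
# the names used in the linked file.
for stype, etype, dtype in g.canonical_etypes:
    rel_graph = g[stype, etype, dtype]
    if etype not in self.mods:          # proposed guard: skip etypes
        continue                        # that have no module
    if rel_graph.number_of_edges() == 0:
        continue
    dstdata = self.mods[etype](rel_graph, (inputs[stype], inputs[dtype]))
    outputs[dtype].append(dstdata)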

Environment

mufeili commented 3 years ago

I agree it's more reasonable to iterate over the modules in self.mods instead. Would you like to open a PR to fix this?

NiklasBeierl commented 3 years ago

> I agree it's more reasonable to iterate over the modules in self.mods instead. Would you like to open a PR to fix this?

I'd love to. Sorry for the late reply, had to finish a thesis.

But first we need to clarify something:

Right now, the model works as long as there exists a module in the model for every etype in the graph. Note that the model also "works" (does not raise an exception) if there is a module for an etype but the etype is NOT in the graph.

Solution 1

With the two lines of code I suggested above, the model would work with any combination of etypes in the graph and etypes in the model. It would perform convolution on the intersection of the graph's etypes and the model's etypes.

Solution 2

Now, when you suggest iterating over self.mods, I think you want the model to assume the graph has an etype for every module in the model? That is imo more correct than my proposed fix, because it will fail if the etypes in the graph are a proper subset of the etypes in the model. (The user defined a module but didn't provide corresponding data.) Again, this seems correct to me, but we have to be aware that it might break some previously "working" code.

What do you think? Should I implement 1 or 2?
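
To make the difference concrete, solution 2 would drive the loop by the model's modules instead, roughly like this (hypothetical sketch, not existing DGL code):

# Hypothetical solution 2: iterate over the model's modules, so a module whose
# etype is missing from the graph fails loudly instead of being skipped.
for etype, module in self.mods.items():
    if etype not in g.etypes:
        raise KeyError(
            "HeteroGraphConv has a module for etype {!r}, but the graph "
            "has no such etype".format(etype))
    stype, _, dtype = g.to_canonical_etype(etype)
    rel_graph = g[stype, etype, dtype]
    dstdata = module(rel_graph, (inputs[stype], inputs[dtype]))
    outputs[dtype].append(dstdata)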

mufeili commented 3 years ago

> What do you think? Should I implement 1 or 2?

Nice catch. The caveat is that solution 2 will break backward compatibility. What do you think? @BarclayII @jermainewang

jermainewang commented 3 years ago

I think the edge_type_subgraph API may be what you should be looking for. It allows you to get a subgraph that contains only the specified relations. You can then apply HeteroGraphConv on it.

I am also worried that the change may complicate HeteroGraphConv. For example, this may cause some node types to not have any updated features, in which case whether to generate zero-valued node features for them is an extra decision to make.
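
A short sketch of the edge_type_subgraph suggestion above (toy graph, made-up etype names):

import torch
import dgl
import dgl.nn as dglnn

g = dgl.heterograph({
    ("N", "A", "N"): ([0, 1, 2], [1, 2, 0]),
    ("N", "B", "N"): ([0, 1, 2], [2, 0, 1]),
})
feats = {"N": torch.randn(3, 4)}

# Keep only the relation(s) you have modules for, then apply HeteroGraphConv.
sub_g = g.edge_type_subgraph(["A"])
conv = dglnn.HeteroGraphConv({"A": dglnn.GraphConv(4, 4)})
out = conv(sub_g, feats)   # no KeyError: the subgraph only has etype "A"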

NiklasBeierl commented 3 years ago

> I think the edge_type_subgraph API may be what you should be looking for. It allows you to get a subgraph that contains only the specified relations. You can then apply HeteroGraphConv on it.

Oh thanks, I was not aware of that one. Do you happen to know if that causes

> I am also worried that the change may complicate HeteroGraphConv. For example, this may cause some node types to not have any updated features, in which case whether to generate zero-valued node features for them is an extra decision to make.

Another nice catch. This is actually a strong argument against my suggestion. I am going to open a PR to add a note to the HeteroGraphConv docs for anyone else who has my problem / use case.

yunshiuan commented 3 years ago

> I think the edge_type_subgraph API may be what you should be looking for. It allows you to get a subgraph that contains only the specified relations. You can then apply HeteroGraphConv on it.
>
> I am also worried that the change may complicate HeteroGraphConv. For example, this may cause some node types to not have any updated features, in which case whether to generate zero-valued node features for them is an extra decision to make.

Thanks for the wonderful discussion! May I know more about what you meant by "this may cause some node types to not have any updated features"?

I ran into this exact problem but can't solve it with edge_type_subgraph(). I was trying to predict the presence of a certain type of edge (type A) from the other edge types (types B, C, D). In this case, the type A edges should not occur in the input computation graph and should not be processed by HeteroGraphConv(). If I use edge_type_subgraph(), the edges I want to predict will also be incorrectly removed. This causes problems when I want to use EdgeDataLoader to sample positive graphs. Therefore, I am thinking about using the workaround that the OP proposed. Thanks!

NiklasBeierl commented 3 years ago

Hey @yunshiuan,

I think the edge case that @jermainewang is describing is this: imagine a node in the graph that is only connected to the rest of the graph by edges of one specific type. In the example below, node 3 only has one incoming edge, which is of type B.

>>> import dgl
Using backend: pytorch
>>> node_data = {
... ("N", "A", "N"): ([1, 1], [0, 2]),
... ("N", "B", "N"): ([2, 2], [0, 3])
... }
>>> graph = dgl.heterograph(node_data)

Now, if we ignore the B edges, node 3 becomes a node with degree 0 (it has no neighbors). If you take a look at many of the graph convolution modules in dgl.nn, you will see that this is a problem for a lot of them. A search for zero_in_degree on this page will give you a good impression. It usually ends up being a divide-by-zero situation; c_ij in the GraphConv formula is a good example.

Now to your troubles with edge_type_subgraph(): I am not sure what you mean by the edge you want to predict being "incorrectly" removed. Do you mean that you need to keep it to later calculate your loss? I haven't done any link prediction yet, but shouldn't you be able to feed a copy of your graph without type A edges (obtained from edge_type_subgraph()) into the module, while keeping your original graph stored in another variable to compare against?

>>> without_a = graph.edge_type_subgraph(["B"]) # Use this for training
>>> without_a
Graph(num_nodes=4, num_edges=2,
      ndata_schemes={}
      edata_schemes={})
>>> graph # Stays the same, still has both `A` and `B` edges in it
Graph(num_nodes={'N': 4},
      num_edges={('N', 'A', 'N'): 2, ('N', 'B', 'N'): 2},
      metagraph=[('N', 'N', 'A'), ('N', 'N', 'B')])

Now perhaps you still need to deal with 0-degree nodes (edge_type_subgraph does not take care of that). You can allow them if your GConv module has an allow_zero_in_degree argument, or remove them beforehand.
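
Continuing the REPL session above, both options might look roughly like this (allow_zero_in_degree is an actual GraphConv argument; everything else is just one possible way to do it):

>>> import torch
>>> import dgl.nn as dglnn
>>> # Option 1: tell GraphConv that zero-in-degree nodes are expected.
>>> conv = dglnn.HeteroGraphConv({"B": dglnn.GraphConv(4, 4, allow_zero_in_degree=True)})
>>> # Option 2: find the zero-in-degree nodes yourself and decide what to do
>>> # with them (e.g. drop them with dgl.remove_nodes, or mask them out later).
>>> isolated = (without_a.in_degrees() == 0).nonzero(as_tuple=True)[0]
>>> isolated
tensor([1, 2])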

yunshiuan commented 3 years ago

@NiklasBeierl Thank you for your detailed response! That clarifies my question about the problems with your fix. Regarding my troubles with edge_type_subgraph(), the solution you proposed works if I am feeding the entire graph into the GNN layers.

However, if I want to train the GNN with mini-batches using EdgeDataLoader (https://docs.dgl.ai/en/0.6.x/api/python/dgl.dataloading.html#dgl.dataloading.pytorch.EdgeDataLoader), your solution would not work. Note that EdgeDataLoader returns an iterator of input_nodes, pos_pair_graph, neg_pair_graph, blocks. pos_pair_graph should include type A edges because they're the links I want to predict, but blocks (the message flow graphs, MFGs) should NOT include type A edges because I want to exclude them from message passing. Since it's not possible to apply edge_type_subgraph() on MFGs, there's no way I can get rid of the type A edges in blocks when feeding them into the GNN.
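
For concreteness, a rough sketch of the loop being described, assuming the 0.6-era EdgeDataLoader API (the toy graph, etype names, and all hyperparameters here are made up):

import torch
import dgl
import dgl.dataloading

# Toy heterograph: "A" is the relation to predict, "B" carries message passing.
g = dgl.heterograph({
    ("N", "A", "N"): ([0, 1, 2], [1, 2, 0]),
    ("N", "B", "N"): ([0, 1, 2], [2, 0, 1]),
})

sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataloader = dgl.dataloading.EdgeDataLoader(
    g,
    {"A": torch.arange(g.num_edges("A"))},   # seed edges: the ones to predict
    sampler,
    negative_sampler=dgl.dataloading.negative_sampler.Uniform(5),
    batch_size=2,
    shuffle=True,
)

for input_nodes, pos_pair_graph, neg_pair_graph, blocks in dataloader:
    # pos_pair_graph / neg_pair_graph hold the type "A" edges to score: wanted.
    # blocks are the MFGs used for message passing; they can still contain
    # type "A" edges, and edge_type_subgraph() cannot be applied to a block,
    # which is exactly the problem described above.
    pass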