Closed svenvanderburg closed 2 years ago
This seems to be happening during the clustering step.
Maybe this is not the reason for the error, but edge_features
should be None
and not an empty list (see here)
For some reason the resulting torch_geometric dataset has a an empty tensor for:
Then pool_edge
fails with above error message:
https://github.com/DeepRank/deeprank-gnn-2/blob/d393f5411cfd68a7bd1f6082efc5344f041456c7/deeprank_gnn/community_pooling.py#L206
I'm guessing something is wrong with the dataset that we don't catch properly. We do check here: https://github.com/DeepRank/deeprank-gnn-2/blob/d393f5411cfd68a7bd1f6082efc5344f041456c7/deeprank_gnn/community_pooling.py#L190 whether we have internal edges, but we don't check if it is an empty tensor. I have no clue in which case we can expect this to be an empty tensor.
@NicoRenaud @DTRademaker @cbaakman any clue?
I guess if we have only one cluster left there will be no edges anymore. I would check the number of clusters left
I created a tests that replicates this in #92.
The problem is that the dataset doesn't have internal edges defined, then we add empty tensors here (and in the lines below): https://github.com/DeepRank/deeprank-gnn-2/blob/f78f01fefaa9f30e72c119b174e09f56fb2c4bee/deeprank_gnn/DataSet.py#L331
We check whether the internal edge dataset/key exist here: https://github.com/DeepRank/deeprank-gnn-2/blob/d393f5411cfd68a7bd1f6082efc5344f041456c7/deeprank_gnn/community_pooling.py#L190
But we don't check whether the tensor is empty. And apparently this chokes pool_edge
.
I tried checking whether the tensor is not empty:
has_internal_edges = hasattr(data, "internal_edge_index") and data['internal_edge_index'].shape[1] > 0
But then I just later get:
tests/test_nn.py:37: in _model_base_test
nn = NeuralNet(
deeprank_gnn/NeuralNet.py:117: in __init__
self.load_model(dataset, Net, dataset_eval)
deeprank_gnn/NeuralNet.py:168: in load_model
PreCluster(dataset, method=self.cluster_nodes)
deeprank_gnn/DataSet.py:92: in PreCluster
data.internal_edge_index, data.num_nodes, method=method
../../miniforge3/envs/3dvac3.9/lib/python3.9/site-packages/torch_geometric/data/data.py:362: in __getattr__
return getattr(self._store, key)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = {'x': tensor([[12.0000, 0.0000, 3.1585, ..., -4.0000, -3.0000, 6.0000],
[15.0000, 0.0000, 63.0627, ..., ...e+00, -8.4491e+00],
[ 8.7118e+00, 5.7578e+00, -1.3459e+01],
[-4.9571e+00, 1.0465e+01, -1.1262e+01]])}
key = 'internal_edge_index'
def __getattr__(self, key: str) -> Any:
if key == '_mapping':
self._mapping = {}
return self._mapping
try:
return self[key]
except KeyError:
> raise AttributeError(
f"'{self.__class__.__name__}' object has no attribute '{key}'")
E AttributeError: 'GlobalStorage' object has no attribute 'internal_edge_index'
../../miniforge3/envs/3dvac3.9/lib/python3.9/site-packages/torch_geometric/data/storage.py:52: AttributeError
It seems like the support for graphs without internal edges is just completely broken, since we add empty tensors in case they don't exist, but then the code chokes on this.
@cbaakman you seemed to have changed some code related to this recently, any ideas on fixing this?
In the last class diagram, we merged the "edges" and "internal edges" collections into "edges". We could let the clustering software use "edges" instead.
OK, so the whole training code still needs to updated for the fact that the internal edges are now just merged with edges? Did you already think about how to change that?
OK, so the whole training code still needs to updated for the fact that the internal edges are now just merged with edges? Did you already think about how to change that?
I suppose we change this function, to use the edge_index and edge_attr instead. https://github.com/DeepRank/deeprank-gnn-2/blob/f78f01fefaa9f30e72c119b174e09f56fb2c4bee/deeprank_gnn/community_pooling.py#L159
OK, and that's it? For the rest the internal edges are not needed? And will it still make sense from deep learning perspective?
OK, and that's it? For the rest the internal edges are not needed? And will it still make sense from deep learning perspective?
You'd be clustering based on proximity between nodes. It would make less sense than clustering only on internal edges, which usually represent covalent bonds. If clustering is this important then we do maybe need to use only the edges with the covalent feature set to True.
When running NeuralNet.train(), @DTRademaker gets:
The code to reproduce this:
(I am still working on getting the actual dataset)