awslabs / dgl-lifesci

Python package for graph neural networks in chemistry and biology
Apache License 2.0

UnlabeledSMILES errors on forward pass when passing a valid single-atom SMILES #176

Open sudoPete opened 2 years ago

sudoPete commented 2 years ago

There appears to be an edge case in the UnlabeledSMILES data loader: for a SMILES string with a single heavy atom, the edge feature field is never initialized, so accessing the edge features raises a KeyError.

    import dgl
    from dgllife.data import UnlabeledSMILES
    from dgllife.utils import CanonicalAtomFeaturizer, CanonicalBondFeaturizer

    data = UnlabeledSMILES(
        ['C'],
        node_featurizer=CanonicalAtomFeaturizer(),
        edge_featurizer=CanonicalBondFeaturizer(),
    )
    smiles, graphs = map(list, zip(*data))
    bg = dgl.batch(graphs)
    bg.set_n_initializer(dgl.init.zero_initializer)
    bg.set_e_initializer(dgl.init.zero_initializer)
    node_feats = bg.ndata.pop('h')
    edge_feats = bg.edata.pop('e')  # raises KeyError: 'e'

    _collections_abc.py", line 795, in pop
        value = self[key]
    dgl/view.py", line 181, in __getitem__
        return self._graph._get_e_repr(self._etid, self._edges)[key]
    KeyError: 'e'

mufeili commented 2 years ago

This happens because the molecule has a single atom and therefore no chemical bonds. As a result, the molecular graph has no edges and hence no edge features. As a workaround, try the solution below, which adds self loops.

from functools import partial

from dgllife.data import UnlabeledSMILES
from dgllife.utils import mol_to_bigraph, CanonicalAtomFeaturizer, CanonicalBondFeaturizer

# Add a self loop to every atom so that even a single-atom molecule has edges,
# and let the bond featurizer produce features for those self loops.
data = UnlabeledSMILES(['C'],
                       mol_to_graph=partial(mol_to_bigraph, add_self_loop=True),
                       node_featurizer=CanonicalAtomFeaturizer(),
                       edge_featurizer=CanonicalBondFeaturizer(self_loop=True))
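To see why the self loops help, here is a dependency-free toy model of the mechanism described above. It is a sketch under stated assumptions, not the dgllife implementation: the helpers `toy_bigraph_edges` and `toy_edge_featurizer` are hypothetical, and the single 0/1 self-loop flag stands in for real bond features. The point it illustrates is that edge features are produced one row per edge, so a graph with zero edges yields no `'e'` field at all, while adding one self loop per atom guarantees at least one edge to featurize.

```python
# Hypothetical toy model; not the dgllife code, only the same idea.
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def toy_bigraph_edges(num_atoms: int, bonds: List[Edge],
                      add_self_loop: bool = False) -> List[Edge]:
    """Turn each bond into two directed edges; optionally add a self loop per atom."""
    edges: List[Edge] = []
    for u, v in bonds:
        edges.extend([(u, v), (v, u)])
    if add_self_loop:
        edges.extend((i, i) for i in range(num_atoms))
    return edges


def toy_edge_featurizer(edges: List[Edge],
                        field: str = "e") -> Dict[str, List[List[float]]]:
    """Emit one feature row per edge; with zero edges, emit no field at all."""
    if not edges:
        return {}  # nothing to featurize, so the 'e' key is never created
    # Stand-in feature: 1.0 marks a self loop, 0.0 an edge derived from a bond.
    return {field: [[1.0 if u == v else 0.0] for u, v in edges]}


# Methane 'C': one heavy atom, zero bonds, so no edges and no 'e' field.
plain = toy_edge_featurizer(toy_bigraph_edges(1, []))
try:
    plain.pop("e")
except KeyError as err:
    print("KeyError:", err)

# With self loops the single atom gets one edge, so the 'e' field exists.
looped = toy_edge_featurizer(toy_bigraph_edges(1, [], add_self_loop=True))
print(looped["e"])
```

The first `print` reproduces the same `KeyError: 'e'` as the traceback in the report; the second shows a one-row feature matrix for the single self-loop edge.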