awslabs / dgl-lifesci

Python package for graph neural networks in chemistry and biology
Apache License 2.0
696 stars 144 forks

Discrepancy between graphs constructed from smiles_to_bigraph and from dgl.graph #223

Closed skystreet8 closed 7 months ago

skystreet8 commented 7 months ago

Hi everyone, I updated my DGL version from 1.0.0 to 1.1.2 and now get an error when calling g.adj().coalesce(), where g is a DGLGraph object constructed by the smiles_to_bigraph function:

from dgllife.utils import WeaveAtomFeaturizer, CanonicalBondFeaturizer, smiles_to_bigraph
smi = 'CCOCC'
atom_types = ['C', 'N', 'O', 'S', 'F', 'Si', 'P', 'Cl', 'Br', 'Mg', 'Na', 'Ca', 'Fe', 'As', 'Al', 'I', 'B', 'V', 'K', 'Tl', 'Yb', 'Sb', 'Sn', 'Ag', 'Pd', 'Co', 'Se', 'Ti', 'Zn', 'H', 'Li', 'Ge', 'Cu', 'Au', 'Ni', 'Cd', 'In', 'Mn', 'Zr', 'Cr', 'Pt', 'Hg', 'Pb', 'W', 'Ru', 'Nb', 'Re', 'Te', 'Rh', 'Ta', 'Tc', 'Ba', 'Bi', 'Hf', 'Mo', 'U', 'Sm', 'Os', 'Ir', 'Ce', 'Gd', 'Ga', 'Cs']
node_featurizer = WeaveAtomFeaturizer(atom_types=atom_types)
edge_featurizer = CanonicalBondFeaturizer(self_loop=True)
g = smiles_to_bigraph(smi, node_featurizer=node_featurizer, edge_featurizer=edge_featurizer, add_self_loop=True, canonical_atom_order=False)
g.adj().coalesce()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/work02/home/guojs/miniconda3/envs/localretro-py39/lib/python3.9/site-packages/dgl/sparse/sparse_matrix.py", line 461, in coalesce
    return SparseMatrix(self.c_sparse_matrix.coalesce())
RuntimeError: expected scalar type Long but found Int

My DGLLife version is 0.3.2, and this version works fine with DGL 1.0.0.

I located the bug in the file utils\mol_to_graph.py, at line 153:

g = dgl.graph(([], []), idtype=torch.int32)

should probably be:

g = dgl.graph(([], []), idtype=torch.int64)

at line 175:

g.add_edges(torch.IntTensor(src_list), torch.IntTensor(dst_list))

should probably be:

g.add_edges(torch.LongTensor(src_list), torch.LongTensor(dst_list))

After applying the above changes the bug seems to be fixed. I haven't tested the changes with other versions of DGL. I encountered the same issue on both Win 11 and Ubuntu 18.04.2 platforms. My pytorch version is 1.13.1.

mufeili commented 7 months ago

Thank you for the report. A DGLGraph can use int32 or int64 for its data storage. It seems that .adj().coalesce() only supports int64. You can convert a DGLGraph from int32 to int64 with g.long().
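
As a minimal sketch of the workaround described above, the conversion can be wrapped in a small helper. The name as_int64_graph is hypothetical and not part of DGL or DGL-LifeSci; the sketch assumes only that the graph exposes idtype and long(), as a DGLGraph does. Calling g.long() directly should be enough on its own; the explicit check mainly documents the intent.

```python
def as_int64_graph(g):
    """Return a graph whose index storage is int64.

    Assumes `g` behaves like a DGLGraph: it exposes `idtype`
    (torch.int32 or torch.int64) and `long()`.
    This helper is hypothetical, not part of DGL or DGL-LifeSci.
    """
    # Compare by name so the helper needs no torch import:
    # str(torch.int64) == "torch.int64".
    if str(g.idtype).endswith("int64"):
        return g  # already int64, nothing to convert
    return g.long()  # convert int32 index storage to int64


# Usage with the graph from the report (requires dgl and dgllife):
#   g = smiles_to_bigraph(smi, ...)   # built with int32 indices in DGLLife 0.3.2
#   adj = as_int64_graph(g).adj().coalesce()
```

This leaves mol_to_graph.py untouched, so graphs built by smiles_to_bigraph keep their int32 storage everywhere except where an int64-only operation such as .adj().coalesce() is needed.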

skystreet8 commented 7 months ago

Thanks for your reply! It solved my issue perfectly!