deepchem / moleculenet

Moleculenet.ai Datasets And Splits
MIT License

SIDER dataset: numbers of samples disagree #38

Open · LanceKnight opened this issue 3 years ago

LanceKnight commented 3 years ago

Hello,

The MoleculeNet documentation states that the SIDER dataset contains 1427 drugs, but the original SIDER paper says it has 1430 drugs. I can't find an explanation for this discrepancy anywhere. Can someone help explain it? Thanks!

rbharath commented 3 years ago

Good question! If I had to guess offhand, perhaps RDKit fails to process a few of the molecules? I'm not entirely sure.
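One way to test this guess, assuming you have SMILES strings for the original SIDER drug list, is to run each one through RDKit and collect the ones it rejects. RDKit's `Chem.MolFromSmiles` returns `None` when a SMILES string fails to parse or sanitize. The helper name `find_unparseable` and the example inputs below are hypothetical; this is a minimal sketch, not how the MoleculeNet file was actually built.

```python
from typing import Any, Callable, Iterable, List, Optional


def find_unparseable(
    smiles: Iterable[str], parse: Callable[[str], Optional[Any]]
) -> List[str]:
    """Return the SMILES strings that `parse` rejects by returning None."""
    return [s for s in smiles if parse(s) is None]


if __name__ == "__main__":
    try:
        from rdkit import Chem  # assumption: RDKit is installed
    except ImportError:
        print("RDKit is not installed; try `pip install rdkit`.")
    else:
        # Hypothetical inputs; substitute SMILES for the original SIDER drugs.
        examples = [
            "CCO",             # ethanol, parses cleanly
            "C1CC",            # unclosed ring bond, fails to parse
            "C(C)(C)(C)(C)C",  # five-bonded carbon, fails sanitization
        ]
        failed = find_unparseable(examples, Chem.MolFromSmiles)
        print(f"{len(failed)} of {len(examples)} rejected by RDKit: {failed}")
```

If about three of the 1430 structures came back as failures, that would be consistent with the 1430 vs. 1427 gap, though it would not by itself prove that this is where the difference comes from.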