deepchem / moleculenet

Moleculenet.ai Datasets And Splits
MIT License

Using RD-Filters on MoleculeNet datasets #23

Open rbharath opened 3 years ago

rbharath commented 3 years ago

A number of the MoleculeNet datasets contain PAINS compounds and other problematic compounds that detract from their usefulness as benchmarking datasets. Let's use this issue to brainstorm the set of datasets that we want to filter. I think we can use @PatWalters's rd_filters library (https://github.com/PatWalters/rd_filters) to help us filter datasets.

Off the top of my head, I think we can start by applying the PAINS filters to the

  1. Chembl
  2. Chembl25
  3. HIV
  4. PCBA

datasets. We should discuss here to see if this makes sense though.
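
To make the discussion concrete, here is a rough sketch of what a PAINS pass over a list of SMILES could look like. It does not call rd_filters itself; as a stand-in it uses the PAINS catalog that ships with RDKit (`FilterCatalog`), so the flagged set may differ from what rd_filters reports. The helper names `make_rdkit_pains_flagger` and `split_by_flag` are hypothetical, and treating unparseable SMILES as flagged is an assumption of this sketch.

```python
from typing import Callable, Iterable, List, Tuple


def make_rdkit_pains_flagger() -> Callable[[str], bool]:
    """Build a SMILES -> bool predicate backed by RDKit's bundled PAINS catalog.

    RDKit is imported lazily so split_by_flag below can be used without it.
    """
    from rdkit import Chem
    from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogParams

    params = FilterCatalogParams()
    params.AddCatalog(FilterCatalogParams.FilterCatalogs.PAINS)
    catalog = FilterCatalog(params)

    def is_pains(smiles: str) -> bool:
        mol = Chem.MolFromSmiles(smiles)
        # Assumption: SMILES that fail to parse are flagged for review
        # instead of being silently kept.
        return mol is None or catalog.HasMatch(mol)

    return is_pains


def split_by_flag(
    smiles_list: Iterable[str],
    is_flagged: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Partition SMILES into (kept, flagged), preserving input order."""
    kept: List[str] = []
    flagged: List[str] = []
    for smiles in smiles_list:
        if is_flagged(smiles):
            flagged.append(smiles)
        else:
            kept.append(smiles)
    return kept, flagged
```

With RDKit installed, something like `kept, flagged = split_by_flag(smiles, make_rdkit_pains_flagger())` on a dataset's SMILES column would give the two subsets. Keeping the flagged list instead of discarding it lets us report how many compounds each dataset would lose before we decide whether filtering makes sense.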

CC @mufeili @PatWalters @lilleswing @peastman