
deepchem / moleculenet

Moleculenet.ai Datasets And Splits

MIT License

88 stars, 19 forks

What Can We Do Better? #1

Open lilleswing opened 4 years ago

lilleswing commented 4 years ago

The MoleculeNet publication has done a lot to establish standardized problems for supervised learning over chemical structures. However, over the past couple of years we have seen some barriers to entry in using the datasets. How can we make them easier to use?

This issue can be a brainstorming page for how to make the MoleculeNet datasets more accessible to machine learning practitioners.

rbharath commented 4 years ago

Here are a couple of my observations so far:

There's a lot of scope to extend MoleculeNet into materials science applications. There's a lot of interest in fields such as electrolyte design for batteries, where better benchmarks could help.
Many new protein-ligand binding datasets are available now that cryo-EM data is more widely accessible. We should expand our collection of protein-ligand datasets.
Perhaps new crystal structure datasets?

rbharath commented 4 years ago

We should make sure that there's a stable mechanism for splitting datasets so that benchmarking is easy to reproduce. Split stability was an issue in the original MoleculeNet, and this repo has some code that improves it:

https://github.com/shenwanxiang/ChemBench
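One way to get splits that stay stable across runs, machines, and dataset orderings is to derive each molecule's split from a hash of its identifier instead of from a shuffled index. The sketch below is a hypothetical illustration of that idea, not the approach ChemBench or DeepChem actually uses. The function name `stable_split`, the `salt` parameter, and the default fractions are assumptions, and it assumes every molecule has a unique, canonical string ID (for example a canonical SMILES).

```python
import hashlib
from typing import Dict, List, Sequence


def stable_split(
    ids: Sequence[str],
    frac_train: float = 0.8,
    frac_valid: float = 0.1,
    salt: str = "moleculenet-v1",  # hypothetical: change it to get a different, still reproducible split
) -> Dict[str, List[str]]:
    """Assign each ID to train/valid/test from a hash of the ID alone.

    Hypothetical sketch: the assignment depends only on (salt, id), so it is
    independent of input order, random seeds, and platform.
    """
    if frac_train < 0 or frac_valid < 0 or frac_train + frac_valid > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")

    splits: Dict[str, List[str]] = {"train": [], "valid": [], "test": []}
    for mol_id in ids:
        digest = hashlib.sha256(f"{salt}:{mol_id}".encode("utf-8")).digest()
        # Keep the top 53 bits of the first 8 bytes so the float lies exactly in [0, 1).
        u = (int.from_bytes(digest[:8], "big") >> 11) / float(1 << 53)
        if u < frac_train:
            splits["train"].append(mol_id)
        elif u < frac_train + frac_valid:
            splits["valid"].append(mol_id)
        else:
            splits["test"].append(mol_id)
    return splits


if __name__ == "__main__":
    smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN", "O=C=O"]
    result = stable_split(smiles)
    print({name: len(members) for name, members in result.items()})
```

Because a molecule's assignment depends only on its ID and the salt, adding molecules to a dataset leaves existing assignments untouched, and the split fractions are only approximate for small datasets. This is still a random split: it does not keep structurally similar molecules (e.g. those sharing a scaffold) on the same side, so it would complement rather than replace scaffold splitting.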