OpenBioML / chemnlp

ChemNLP project
MIT License

New task: Add ESOL dataset #33

Open MicPie opened 1 year ago

MicPie commented 1 year ago

The “small” training data set is available in the supporting information: https://pubs.acs.org/doi/10.1021/ci034243x

I created an issue in the ESOL repo to ask for the full data: https://github.com/hossainlab/ESOL/issues/1

In the worst case, we can only use the small subset.

kjappelbaum commented 1 year ago

I have many of the tabular datasets here: https://www.dropbox.com/sh/oqisd84vyt97z1i/AADxgPu_ESJBKlYpqLmtKwjya?dl=0

Also the "solubility test set" proposed by Pat Walters in his blog post: http://practicalcheminformatics.blogspot.com/2018/09/predicting-aqueous-solubility-its.html

kjappelbaum commented 1 year ago

For a large solubility dataset, I'd look into AqSolDB.

phalem commented 1 year ago

I found ready-to-use data here: https://www.kaggle.com/code/mmelahi/physical-chemistry-lipophilicity/data. There is also data here: https://ecbd.eu/

apoorvasrinivasan26 commented 1 year ago

Is anyone working on this? If not, I'd be happy to!