Hi. I enjoyed this paper, but I have some concerns about the source data. For ESOL, it appears that you used the same data as the Grover GitHub repo. If you compare it to the original Delaney paper on aqueous solubility, you can see that the Grover authors mistakenly used the values that Delaney predicted from his own QSAR model as the labels, rather than the experimental data (https://pubs.acs.org/doi/10.1021/ci034243x - see supporting information). Delaney's predictions are not the intended target for this task; the labels should be the measured values instead.
Edit: This probably also explains why Grover + RDKit descriptors does so well on this task in particular: some of the RDKit descriptors are the same as those used to make the predictions in the original paper.
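
In case it helps, here is a minimal sketch of how one could check this. It assumes the labels from the repo and the measured and predicted columns from Delaney's supporting information have already been loaded and aligned by molecule. The function name, the tolerance, and the toy numbers are my own placeholders, not values from either dataset.

```python
from statistics import mean


def closer_column(labels, measured, predicted, tol=1e-6):
    """Report which reference column the labels agree with.

    All three inputs are equal-length sequences of log-solubility values,
    aligned so that index i refers to the same molecule in each.
    """
    mae_measured = mean(abs(a - b) for a, b in zip(labels, measured))
    mae_predicted = mean(abs(a - b) for a, b in zip(labels, predicted))

    # The labels "match" a column only if they agree within tol
    # and do not also agree with the other column.
    if mae_predicted <= tol < mae_measured:
        verdict = "predicted"
    elif mae_measured <= tol < mae_predicted:
        verdict = "measured"
    else:
        verdict = "unclear"
    return verdict, mae_measured, mae_predicted


# Toy numbers for illustration only (hypothetical, not real ESOL entries).
measured = [-2.18, -2.00, -1.74]
predicted = [-2.79, -2.23, -2.55]
labels = list(predicted)  # simulates labels copied from the model output

print(closer_column(labels, measured, predicted))
```

If the verdict comes back as "predicted" on the real files, the labels are the QSAR outputs rather than the experimental values.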