choderalab / espaloma

Extensible Surrogate Potential of Ab initio Learned and Optimized by Message-passing Algorithm 🍹 https://arxiv.org/abs/2010.01196
https://docs.espaloma.org/en/latest/
MIT License

Availability of benchmarking datasets from espaloma 0.3.0 #180

Open LeifSeute opened 11 months ago

LeifSeute commented 11 months ago

Hello there!

I would like to reproduce the results from Table 1 of the espaloma 0.3.0 paper.

Is there a way to obtain the datasets used to create this table, including the bonded and nonbonded energies stored for the respective classical force fields, or directly as espaloma graphs? If one loads the data from SPICE or QCArchive, it cannot be parametrized with Amber ff14SB because the residue information is missing. For the other force fields, one has to recalculate the partial charges in this case.

mikemhenry commented 11 months ago

@LeifSeute Thank you for raising this issue! I will defer to @yuanqing-wang and @kntkb to answer this one

kntkb commented 11 months ago

@LeifSeute Thank you for your interest. A pre-filtered dataset ready for training and more information can be found here.

LeifSeute commented 10 months ago

Thank you for your answer. Unfortunately, I can only find scripts that download data without the nonbonded contributions to the energies and gradients calculated from gaff-2.11 and openff-2.0.0. These contributions are needed because they have to be added to the bonded contributions predicted by espaloma. For part of the dataset I recalculated them myself, but this is relatively computationally expensive, and it seems wasteful since these calculations must already have been done to obtain the table referenced above.
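To make the missing piece concrete, here is a minimal sketch of the combination step in plain Python. The function names, the numbers, and the mean-centering before the RMSE are all illustrative assumptions, not the espaloma API or the exact metric behind the table.

```python
import math


def combine_energies(bonded, nonbonded):
    """Add per-conformer nonbonded energies to the bonded energies.

    Both arguments are hypothetical lists with one energy per conformer,
    in the same units and the same conformer order.
    """
    if len(bonded) != len(nonbonded):
        raise ValueError("need one bonded and one nonbonded energy per conformer")
    return [b + n for b, n in zip(bonded, nonbonded)]


def centered_rmse(predicted, reference):
    """RMSE after subtracting each series' own mean.

    Centering is an assumption made here so that a constant energy offset
    between the two series does not count as error.
    """
    if len(predicted) != len(reference):
        raise ValueError("need one reference energy per predicted energy")
    mean_p = sum(predicted) / len(predicted)
    mean_r = sum(reference) / len(reference)
    squared = [
        ((p - mean_p) - (r - mean_r)) ** 2
        for p, r in zip(predicted, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Made-up example values (e.g. kcal/mol) for three conformers of one molecule.
bonded = [1.0, 2.0, 4.0]        # would come from the espaloma prediction
nonbonded = [-0.5, -1.0, -1.5]  # would come from gaff-2.11 or openff-2.0.0
reference = [10.4, 11.0, 12.6]  # would come from the quantum chemical data

total = combine_energies(bonded, nonbonded)
rmse = centered_rmse(total, reference)
print(total, rmse)
```

The `nonbonded` list is exactly the part that currently has to be recomputed for every conformer.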

Could you provide the full dataset (containing the nonbonded energies from these classical force fields) for download, e.g. as an HDF5 file, as is the case for the SPICE dataset (https://zenodo.org/record/7258940)?