choderalab / espaloma

Extensible Surrogate Potential of Ab initio Learned and Optimized by Message-passing Algorithm 🍹 https://arxiv.org/abs/2010.01196
https://docs.espaloma.org/en/latest/
MIT License

[Question] Datasets for reproduction of the QM results in the paper #136

Closed by LeifSeute 1 year ago

LeifSeute commented 1 year ago

Edit: This has been resolved; it was my mistake (see my comment below).

Hello,

I am trying to reproduce the QM-fitting results from the table in Figure 4a of the paper (https://arxiv.org/abs/2010.01196). The table also gives information on the datasets, e.g. that the PepConf set that was used has 736 molecules and 22154 snapshots in total. The paper states that the datasets can be obtained from QCArchive by filtering out snapshots with energies more than 0.1 Hartree higher than the minima.
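For concreteness, here is a minimal sketch of that filtering step as I understand it. It is plain Python and does not use the QCArchive or espaloma APIs. The function names, the dict layout (molecule name mapped to a list of snapshot energies in Hartree), and the toy energies are all hypothetical. It also assumes that "the minima" means the lowest-energy snapshot of each molecule, which may not be exactly what the paper does.

```python
# Hypothetical sketch: drop snapshots more than 0.1 Hartree above the
# per-molecule minimum energy, then count molecules and snapshots.
CUTOFF_HARTREE = 0.1  # threshold quoted from the paper


def filter_snapshots(energies_by_molecule, cutoff=CUTOFF_HARTREE):
    """Keep only snapshots within `cutoff` of each molecule's minimum energy.

    `energies_by_molecule` maps a molecule name to a list of snapshot
    energies in Hartree (an assumed layout, not a QCArchive data structure).
    """
    filtered = {}
    for name, energies in energies_by_molecule.items():
        if not energies:
            continue  # skip molecules without any snapshots
        e_min = min(energies)
        filtered[name] = [e for e in energies if e - e_min <= cutoff]
    return filtered


def count_molecules_and_snapshots(energies_by_molecule):
    """Return (number of molecules, total number of snapshots)."""
    n_molecules = len(energies_by_molecule)
    n_snapshots = sum(len(e) for e in energies_by_molecule.values())
    return n_molecules, n_snapshots


# Toy data for illustration only; these are not real PepConf energies.
toy = {
    "mol_a": [-100.00, -99.95, -99.80],  # last snapshot is 0.2 Hartree above the minimum
    "mol_b": [-50.00],
}
filtered = filter_snapshots(toy)
print(count_molecules_and_snapshots(toy))
print(count_molecules_and_snapshots(filtered))
```

Comparing the two counts against the numbers in the table is how I checked whether a downloaded dataset matches the one used in the paper.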

However, the OptimizationDataset "OpenFF PEPCONF OptimizationDataset v1.0" that can be downloaded from QCArchive has 937 molecules with 50559 snapshots in total (50555 after filtering). The PepConf dataset provided in the QM fitting tutorial (https://espaloma.wangyq.net/experiments/qm_fitting.html) has 631 molecules and 89073 snapshots in total. Neither matches the numbers in the table, so I suspect that a different dataset was used to obtain the results in Figure 4a.

How can the exact datasets that were used for QM-fitting in the paper be obtained?

Thanks!

[Attached image: espaloma_fig4a]

LeifSeute commented 1 year ago

I am sorry, this has been resolved and can be deleted. I had downloaded the wrong dataset from QCArchive.

Thanks anyways!