choderalab / modelforge

Infrastructure to implement and train NNPs
https://modelforge.readthedocs.io/en/latest/
MIT License

Adding model loaders for SPICE, ANI1x and ANI2x #65

Closed: chrisiacovella closed this 2 months ago

chrisiacovella commented 5 months ago

Curation routines for SPICE, QM9, and ANI1x are already in place; we still need to have a curation routine for ANI2x.

A dataset class that inherits from HDF5Dataset has so far been set up only for QM9; corresponding classes need to be defined for the other datasets.

A few notes regarding SPICE:

Currently, we have curation schemes set up for both SPICE 1.1.4 (i.e., the data associated with the paper) and what I've just called "openff" spice. Openff spice is effectively the same data as in 1.1.4, but calculated at the openff level of theory and retrieved from QCArchive.

Since openff spice is being retrieved from QCArchive, we can easily associate the "source" with each entry (e.g., "SPICE PubChem Set 1 Single Points Dataset v1.2"). This may be very useful for future testing, since it would let us easily filter out subsets of the data.
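As a rough sketch of what filtering on "source" could look like: the record layout, the molecule names, and the `filter_by_source` helper below are all hypothetical and only illustrate the idea; they are not the modelforge curation API.

```python
# Hypothetical curated records: each entry carries a "source" string
# naming the QCArchive dataset it was retrieved from.
records = [
    {"name": "mol_a", "source": "SPICE PubChem Set 1 Single Points Dataset v1.2"},
    {"name": "mol_b", "source": "hypothetical other dataset"},
    {"name": "mol_c", "source": "SPICE PubChem Set 1 Single Points Dataset v1.2"},
]


def filter_by_source(records, source):
    """Return only the records whose "source" matches the given string exactly."""
    return [r for r in records if r["source"] == source]


pubchem_1 = filter_by_source(
    records, "SPICE PubChem Set 1 Single Points Dataset v1.2"
)
print([r["name"] for r in pubchem_1])
```

The same test could be inverted to exclude a subset instead of selecting it.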

The HDF5 file for SPICE 1.1.4 already has configurations with very high forces filtered out. It might be good to also identify those molecules in openff_spice; we might not want to exclude them completely, but rather provide an attribute in the HDF5 file that allows us to remove them if desired.
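One way to flag rather than drop such configurations might look like the sketch below. The threshold value, the nested-list data layout, and both function names are assumptions made for illustration; the criterion actually used for the 1.1.4 file should be checked before adopting any particular cutoff.

```python
def max_force_component(forces):
    """Largest absolute Cartesian force component in one configuration.

    forces: list of per-atom [fx, fy, fz] lists (hypothetical layout).
    """
    return max(abs(c) for atom in forces for c in atom)


def flag_high_force(configurations, threshold):
    """Return one boolean per configuration: True if any component exceeds threshold."""
    return [max_force_component(f) > threshold for f in configurations]


# Two made-up configurations of a two-atom system.
configurations = [
    [[0.01, -0.02, 0.00], [0.03, 0.00, -0.01]],
    [[0.00, 2.50, 0.00], [0.00, -2.50, 0.00]],
]

# threshold=1.0 is a placeholder, not a value taken from the SPICE file.
flags = flag_high_force(configurations, threshold=1.0)
print(flags)
```

Storing the flags as a per-configuration attribute keeps all the data in the file while still letting the filter be applied at load time.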

We will also add a quantity "DFT_total_force" to the openff spice curation, rather than the gradient (the force is the negative of the gradient), so we don't have to change the sign at training time and so the gradient is not accidentally used in place of the force.
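The sign flip itself is trivial, which is exactly why it is easy to forget. A minimal sketch follows, assuming the gradient arrives as per-atom [x, y, z] lists; the "dft_total_gradient" key and the `gradient_to_force` helper are hypothetical names, and only "DFT_total_force" comes from the plan above.

```python
def gradient_to_force(gradient):
    """Convert an energy gradient to a force: F = -dE/dx, component by component."""
    return [[-c for c in atom] for atom in gradient]


# Hypothetical record as retrieved, holding only the gradient.
record = {"dft_total_gradient": [[0.10, -0.20, 0.05], [-0.10, 0.20, -0.05]]}

# Store the force once at curation time so training code never touches the gradient.
record["DFT_total_force"] = gradient_to_force(record.pop("dft_total_gradient"))
print(record["DFT_total_force"])
```

Removing the gradient key after conversion is one way to make sure it cannot be picked up by mistake downstream.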