choderalab / modelforge

Infrastructure to implement and train NNPs
https://modelforge.readthedocs.io/en/latest/
MIT License

Extending and updating curation sets. #74

Closed by chrisiacovella 3 months ago

chrisiacovella commented 4 months ago

Description

This PR adds new datasets to modelforge, including the full SPICE 1.1.4, a preliminary SPICE 2, ANI-2x, and the test dataset.

Notes:

Todos

Notable points that this PR has either accomplished or will accomplish.

Status

codecov-commenter commented 4 months ago

Codecov Report

Merging #74 (41f6653) into main (342c5ed) will increase coverage by 8.90%. The diff coverage is 94.14%.

chrisiacovella commented 4 months ago

The wiki has been updated with many examples and a discussion of the HDF5 file format and the underlying "data" data structure that is written to the HDF5 file.

https://github.com/choderalab/modelforge/wiki/Dataset-and-curation

chrisiacovella commented 3 months ago

I still need to implement the test dataset. I had to rerun some calculations.

chrisiacovella commented 3 months ago

The naming scheme appears to have changed again in one of the datasets (Processing SPICE DES370K Single Points Dataset). I need to add a regex search to identify this different naming convention and, when it is found, skip all of the sorting by conformer IDs.
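
A minimal sketch of how such a check might look, not the code in this PR. It assumes a hypothetical convention in which record names end in `-<conformer index>`; the pattern, the function name `sort_by_conformer_id`, and the example names are all illustrative, and the real SPICE record names may differ. If any name fails to match, the sort is skipped and the original order is returned.

```python
import re

# Hypothetical convention: "<molecule name>-<conformer index>", e.g. "mol_a-3".
# The actual record names in the SPICE datasets may look different.
CONFORMER_NAME = re.compile(r"^(?P<molecule>.+)-(?P<conformer>\d+)$")


def sort_by_conformer_id(record_names):
    """Sort names by (molecule, conformer index) if every name matches.

    If any name does not follow the assumed convention, return the names
    in their original order, i.e. skip the conformer sort entirely.
    """
    matches = [CONFORMER_NAME.match(name) for name in record_names]
    if not all(matches):
        return list(record_names)
    keyed = [
        (m.group("molecule"), int(m.group("conformer")), name)
        for m, name in zip(matches, record_names)
    ]
    # Conformer indices are compared as integers, so "-10" sorts after "-2".
    return [name for _, _, name in sorted(keyed)]
```

Calling `sort_by_conformer_id(["mol_a-10", "mol_a-2", "mol_b-0"])` orders the conformers numerically within each molecule, while a list containing a name without the numeric suffix comes back unchanged.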