choderalab / modelforge

Infrastructure to implement and train NNPs
https://modelforge.readthedocs.io/en/latest/
MIT License

Add different variations of PhAlkEthOH datasets #216

Open chrisiacovella opened 1 month ago

chrisiacovella commented 1 month ago

Currently, the PhAlkEthOH dataset comes from the OpenFF optimization dataset and contains the entire optimization trajectory for each unique molecule.

It would be good to have a few additional variations of this dataset for exploring various aspects of the different NNPs and how data generation strategies impact efficacy. A few additional "versions" to add for the existing dataset:

Relatedly, I will work on getting additional calculations going using the trajectories generated with GAFF.

chrisiacovella commented 1 month ago

This was mostly addressed in PR #245. That PR removes configurations with high forces (about 1 hartree/bohr, as is done for SPICE). It also generates a test/full dataset that contains only the final, energy-minimized configuration of each molecule, which makes something very similar to QM9 (but with forces). This can serve as a baseline for assessing the importance of including a few steps of optimization, MD-generated configurations, etc.
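For illustration, here is a minimal sketch of the two variations described above: dropping high-force configurations and keeping only the last frame of an optimization trajectory. This is not the code from PR #245. The function names, the dictionary layout of a frame, and the choice to compare the largest per-atom force magnitude against the cutoff are all assumptions made for this example.

```python
import math

# Assumed cutoff of about 1 hartree/bohr, taken from the comment above.
FORCE_THRESHOLD = 1.0


def max_force_magnitude(forces):
    """Return the largest per-atom force magnitude in one configuration.

    `forces` is a list of (fx, fy, fz) tuples in hartree/bohr.
    """
    return max(math.sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in forces)


def filter_high_force(trajectory, threshold=FORCE_THRESHOLD):
    """Keep only configurations whose largest atomic force is within the threshold.

    Hypothetical helper: the exact criterion used in the PR may differ.
    """
    return [
        frame for frame in trajectory
        if max_force_magnitude(frame["forces"]) <= threshold
    ]


def final_configuration(trajectory):
    """Return the last frame of an optimization trajectory (the minimized structure)."""
    return trajectory[-1]


# Toy optimization trajectory for one two-atom molecule (made-up numbers).
trajectory = [
    {"energy": -1.00, "forces": [(1.5, 0.0, 0.0), (-1.5, 0.0, 0.0)]},
    {"energy": -1.20, "forces": [(0.2, 0.0, 0.0), (-0.2, 0.0, 0.0)]},
    {"energy": -1.25, "forces": [(0.01, 0.0, 0.0), (-0.01, 0.0, 0.0)]},
]

filtered = filter_high_force(trajectory)       # drops the first frame
minimized = final_configuration(trajectory)    # keeps only the last frame
```

In this toy example the first frame is discarded because its atomic force magnitude (1.5 hartree/bohr) exceeds the assumed cutoff, while the minimized-only variant keeps just the final frame.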