choderalab / modelforge

Infrastructure to implement and train NNPs
https://modelforge.readthedocs.io/en/latest/
MIT License

Generating the precomputed pairlist in parallel #153

Closed wiederm closed 2 weeks ago

wiederm commented 3 weeks ago

Description

The pairlist is precomputed, which takes about 40 minutes for the ANI2x dataset (the largest dataset in the collection). We can use the DataLoader logic to compute the pairlists in parallel on multiple CPUs.

This is unfortunately not as simple as I initially anticipated, since we are also removing self energies in the same pass. I will expand this PR with more information as soon as I have a better understanding of the DataLoader logic.
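
A minimal sketch of the DataLoader idea, assuming PyTorch and a per-molecule list of coordinate tensors. `PairlistDataset`, `compute_pairlists`, and the `cutoff` parameter are hypothetical names for illustration and are not part of modelforge. The sketch covers only the pairlist step, not the self-energy removal mentioned above.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class PairlistDataset(Dataset):
    """Hypothetical dataset that computes one molecule's pairlist on access."""

    def __init__(self, positions, cutoff):
        self.positions = positions  # list of (n_atoms, 3) float tensors
        self.cutoff = cutoff        # distance cutoff (assumed parameter)

    def __len__(self):
        return len(self.positions)

    def __getitem__(self, idx):
        pos = self.positions[idx]
        n_atoms = pos.shape[0]
        # All unique index pairs (i < j) within this molecule.
        i, j = torch.triu_indices(n_atoms, n_atoms, offset=1)
        distances = (pos[i] - pos[j]).norm(dim=1)
        within_cutoff = distances < self.cutoff
        # Shape (2, n_pairs): first row holds i, second row holds j.
        return torch.stack((i[within_cutoff], j[within_cutoff]))


def compute_pairlists(positions, cutoff, num_workers=0):
    """Return one pairlist per molecule, in the same order as `positions`."""
    loader = DataLoader(
        PairlistDataset(positions, cutoff),
        batch_size=None,          # no batching: molecules differ in size
        shuffle=False,            # keep the dataset order
        num_workers=num_workers,  # > 0 spreads __getitem__ calls over worker processes
    )
    return [pairs for pairs in loader]
```

Calling `compute_pairlists(positions, cutoff, num_workers=4)` would distribute the per-molecule work across four worker processes. On platforms where worker processes are started with `spawn`, that call needs to sit under an `if __name__ == "__main__":` guard.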

chrisiacovella commented 3 weeks ago

As mentioned offline, I think we can use the data loader class to loop through the pairlist calculation, rather than using multiprocessing.

wiederm commented 2 weeks ago

This PR has been superseded by PR #161