choderalab / modelforge

Infrastructure to implement and train NNPs
https://modelforge.readthedocs.io/en/latest/
MIT License

Speed up preprocessing for dataset #122

Closed wiederm closed 4 months ago

wiederm commented 4 months ago

Description

There are two possible bottlenecks in the preprocessing pipeline:

In the following, I will outline some of the operations that I noticed are responsible for these bottlenecks. Not all of them can be easily mitigated, but there is room for further improvement.

Todos

Status

wiederm commented 4 months ago

Closing this since @chrisiacovella has already prepared a PR with the same scope in #123.