ACEsuit / mace

MACE - Fast and accurate machine learning interatomic potentials with higher order equivariant message passing.

Support large dataset preprocessing #452

Open chiang-yuan opened 3 weeks ago

chiang-yuan commented 3 weeks ago

This PR tries to resolve OOM errors and improve performance when loading very large datasets such as MPTrj (1.58M) or even bigger ones. The new file requires mpi4py.

An additional file, preprocessing_data_mpi.py, is added to preserve backward compatibility and keep refactoring to a minimum. Ideally, preprocessing_data.py could be replaced with the new file, provided we accept the import dependency on mpi4py.
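For readers unfamiliar with the approach, here is a minimal sketch of how a dataset can be split across MPI ranks so that no single process holds every configuration in memory. It is not the code in this PR: `shard_bounds` and `get_rank_and_size` are hypothetical helper names, and the fallback to a single process when mpi4py is missing is an assumption made only to keep the example runnable anywhere.

```python
def shard_bounds(n_items, rank, size):
    """Return the half-open range [start, stop) of items owned by `rank`.

    Items are split as evenly as possible: the first `n_items % size`
    ranks each receive one extra item.
    """
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def get_rank_and_size():
    """Return (rank, size) from MPI, or (0, 1) if mpi4py is not installed.

    The single-process fallback is an assumption of this sketch, not
    behaviour described in the PR.
    """
    try:
        from mpi4py import MPI
    except ImportError:
        return 0, 1
    comm = MPI.COMM_WORLD
    return comm.Get_rank(), comm.Get_size()
```

Under a launcher such as `mpirun`, each process would call `get_rank_and_size()`, compute its own range with `shard_bounds`, and read and preprocess only that range. With ASE, the range could for instance be passed as a slice string like `"0:100"` to the `index` argument of `ase.io.read`, though whether that avoids scanning the whole file depends on the file format.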

ilyes319 commented 1 week ago

Hey @chiang-yuan, thank you. Is this ready to be merged?

chiang-yuan commented 1 week ago

It still needs some refactoring. It seems that modifying only the ase read part is not enough. I will refactor all of the hdf5 file writing part as well, but it might take some time...