Open chiang-yuan opened 3 weeks ago
Hey @chiang-yuan thank you. Is that ready to be merged?
It still needs some refactoring. It seems that modifying only the ASE read part is not enough. I will refactor the whole HDF5 file-writing part as well, but that might take some time...
This PR tries to resolve OOM errors and improve performance when loading very large datasets such as MPTrj (1.58M) or even bigger ones. To use the new file, `mpi4py` is needed. An additional file, `preprocessing_data_mpi.py`, is added to preserve backward compatibility and keep the refactoring to a minimum, but ideally `preprocessing_data.py` could be replaced with the new file, as long as we take the import dependency on `mpi4py` into account.
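For readers unfamiliar with the pattern, here is a minimal sketch of one common way MPI-based preprocessing keeps memory bounded: each rank processes only its own contiguous slice of the dataset. This is not the code in this PR. The names `shard_bounds` and `main` are hypothetical, and the read/write step is left as a comment because the actual I/O in `preprocessing_data_mpi.py` is not shown here.

```python
from typing import Tuple


def shard_bounds(n_items: int, rank: int, size: int) -> Tuple[int, int]:
    """Return the half-open range [start, stop) of items owned by `rank`.

    Items are split as evenly as possible; the first `n_items % size`
    ranks each receive one extra item.
    """
    if size <= 0 or not 0 <= rank < size:
        raise ValueError("rank must satisfy 0 <= rank < size")
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def main(n_items: int = 10) -> None:
    # mpi4py is imported lazily so shard_bounds stays usable without MPI.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    start, stop = shard_bounds(n_items, rank, size)
    # Assumption: each rank would read and write only items[start:stop] here,
    # so no single process has to hold the full dataset in memory.
    print(f"rank {rank}/{size} handles items [{start}, {stop})")
```

To try it, call `main()` at the end of a script and launch it under an MPI launcher, for example `mpirun -n 4 python sketch.py` (the file name is a placeholder).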