SSAGESproject / SSAGES

Software Suite for Advanced General Ensemble Simulations
GNU General Public License v3.0

MPI issues #25

Open ghost opened 3 years ago

ghost commented 3 years ago

Hi! Let me start by thanking you for your code!

We have implemented a new CV that calculates its value in parallel. This was necessary because the algorithm considers the contribution of every atom in the system. We have a local version of SSAGES that contains the new code, and once everything works we would like to share it with you. The CV value is calculated correctly across the processors.

Unfortunately, we noticed that when running FFS with LAMMPS, SSAGES writes the dump files serially, so each processor overwrites the output of the others. For example, the file "l0-n0.dat" contains information only about the atoms handled by one core, and we receive the following error: "error, could not locate atomID 1 from dumpfile". We would like to use SSAGES for large MD calculations, so parallelisation is important to us. Have we missed anything?

We would really appreciate your help.

mquevill commented 3 years ago

You are correct that multiple MPI processes are trying to read/write at the same time, which corrupts the dumpfiles. Because of this and a few other quirks, the current implementation of FFS is limited to just 1 MPI process per walker. We hope to fix the full parallelization capabilities of FFS soon, but we do not have an explicit timeline for when that will be. See also https://github.com/SSAGESproject/SSAGES/issues/10#issuecomment-498678419.
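To make the failure mode concrete, here is a toy model in plain Python. It is not SSAGES or LAMMPS code: the helper names, the two-column file layout, and the atom data are all made up for illustration, and the "ranks" are simulated by a sequential loop (real MPI processes would write concurrently, so the result could be even less predictable). It only shows why a file that every rank truncates and rewrites ends up missing atom 1, and how gathering the atoms first and writing once avoids that.

```python
# Toy model of the overwrite problem. All names here are hypothetical;
# nothing below is taken from the SSAGES or LAMMPS sources.
import os
import tempfile


def write_dump(path, atoms):
    """Write one 'id x' line per atom, truncating any existing file."""
    with open(path, "w") as f:
        for atom_id, x in atoms:
            f.write(f"{atom_id} {x}\n")


def read_ids(path):
    """Return the set of atom IDs found in a dump file."""
    with open(path) as f:
        return {int(line.split()[0]) for line in f if line.strip()}


# Domain decomposition: each simulated rank owns a subset of the atoms.
local_atoms = {
    0: [(1, 0.1), (2, 0.2)],
    1: [(3, 0.3), (4, 0.4)],
}

tmpdir = tempfile.mkdtemp()
naive_path = os.path.join(tmpdir, "l0-n0.dat")
gathered_path = os.path.join(tmpdir, "l0-n0-gathered.dat")

# Naive: every rank writes its local atoms to the same file, so the last
# writer wins and the atoms owned by the other ranks are lost.
for rank in sorted(local_atoms):
    write_dump(naive_path, local_atoms[rank])

# Gather first (the role a gather to one rank would play in MPI), then
# write the complete atom list exactly once.
all_atoms = sorted(atom for rank in local_atoms for atom in local_atoms[rank])
write_dump(gathered_path, all_atoms)

naive_ids = read_ids(naive_path)
gathered_ids = read_ids(gathered_path)
print(1 in naive_ids, 1 in gathered_ids)  # prints: False True
```

In the naive file only the atoms of the last rank survive, which is consistent with a reader failing to locate atom ID 1.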

As a workaround to accelerate simulations for now, you could try using OpenMP threads with the USER-OMP package. Only the number of MPI processes per walker is limited.

ghost commented 3 years ago

Thank you for your response.

mquevill commented 3 years ago

SSAGES v0.9.3 will now throw an error if the user tries to use more than 1 MPI process per walker. While we would like to get full MPI support for FFS, this should prevent the generation of corrupted data files.
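For readers wondering what such a guard amounts to, here is a minimal sketch in Python. It is an assumption-laden illustration, not the actual check in SSAGES v0.9.3 (which is C++ and may differ): the function name, its arguments, and the error messages are hypothetical. It assumes the MPI processes are split evenly among the walkers.

```python
# Hypothetical guard: refuse to run FFS with more than one MPI process
# per walker. Names and messages are illustrative, not from SSAGES.
def check_ffs_layout(world_size, n_walkers):
    """Return the number of processes per walker, or raise if it is not 1."""
    if n_walkers < 1 or world_size < 1:
        raise ValueError("world_size and n_walkers must be positive")
    if world_size % n_walkers != 0:
        raise ValueError("processes cannot be split evenly among walkers")
    per_walker = world_size // n_walkers
    if per_walker > 1:
        raise RuntimeError(
            f"FFS supports 1 MPI process per walker, got {per_walker}"
        )
    return per_walker


# 4 processes for 4 walkers is accepted; 8 processes for 4 walkers is not.
ok = check_ffs_layout(world_size=4, n_walkers=4)
try:
    check_ffs_layout(world_size=8, n_walkers=4)
    rejected = False
except RuntimeError:
    rejected = True
```

Failing early like this trades convenience for safety: the run stops before any dump file can be overwritten.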