POSYDON-code / POSYDON

POSYDON is a next-generation single and binary-star population synthesis code incorporating full stellar structure and evolution modeling with the use of MESA.
BSD 3-Clause "New" or "Revised" License

Remove MPI for running grids #150

Open mkruckow opened 1 year ago

mkruckow commented 1 year ago

Today, I had a long meeting with the people from the HPC group. They strongly encourage us not to use the Python MPI module when we don't need it. They said that in the current version we do not initialize this module correctly, and that merely importing it without correct initialization causes errors. The error originates from Slurm having at some point started to use MPI for srun, so two MPIs run at the same time, which causes conflicts if they are not correctly initialized.

For this PR https://github.com/POSYDON-code/POSYDON/pull/143, I'll need to disable the automatic import of mpi4py. @ka-rocha, I guess you were the person who put it in for the dynamic grid creation. Could you have a look at how to correctly initialize MPI and/or think of a way to avoid MPI for the dynamic grid creation? E.g. I'd think about generating a new Slurm file for each new job and, instead of running it directly, submitting the new job with sbatch. If any data needs to be transported from one instance to another, we may need to create temporary files to store that data between two runs.
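The sbatch idea above could be sketched roughly as follows. This is only an illustration, not existing POSYDON code: the function names, the JSON hand-off format, the `#SBATCH` settings, and the `run_step.py` command in the usage note are all assumptions made for the example.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def write_handoff(state, directory):
    """Store the data the next job needs in a JSON file and return its path.

    Hypothetical helper: the hand-off format is an assumption.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    with tempfile.NamedTemporaryFile(
        "w", suffix=".json", dir=directory, delete=False
    ) as fh:
        json.dump(state, fh)
    return Path(fh.name)


def build_slurm_script(job_name, command, handoff_file):
    """Return the text of a Slurm batch script for the next job.

    The resource settings are placeholders and would need adapting
    to the actual cluster.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --output={job_name}.out",
        "#SBATCH --ntasks=1",
        # The next job receives the hand-off file as its last argument.
        f"{command} {handoff_file}",
    ]
    return "\n".join(lines) + "\n"


def submit(script_path):
    """Queue the script with sbatch instead of running it directly.

    Returns what `sbatch --parsable` prints (the job ID).
    """
    result = subprocess.run(
        ["sbatch", "--parsable", str(script_path)],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

A running job would call `write_handoff(...)`, write the text from `build_slurm_script("grid_step_2", "python run_step.py", handoff)` to a file, and pass that file to `submit(...)`. No MPI is involved at any point; the only communication between jobs is the temporary file.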

astroJeff commented 11 months ago

As we discussed during the developers' meeting today, the goal is to move away from MPI and toward using job arrays for running populations.
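A minimal sketch of how a job array could replace MPI ranks, assuming each array task handles a contiguous share of the work. The helper names are hypothetical; `SLURM_ARRAY_TASK_ID` is the environment variable Slurm sets for each task of an array job.

```python
import os


def current_task_id(default=0):
    """Read this task's index from the environment Slurm provides.

    Falls back to `default` when not running inside an array job.
    """
    return int(os.environ.get("SLURM_ARRAY_TASK_ID", default))


def task_slice(n_items, n_tasks, task_id):
    """Return the range of item indices handled by one array task.

    Items are split as evenly as possible; the first
    `n_items % n_tasks` tasks get one extra item.
    """
    base, extra = divmod(n_items, n_tasks)
    start = task_id * base + min(task_id, extra)
    stop = start + base + (1 if task_id < extra else 0)
    return range(start, stop)
```

With `#SBATCH --array=0-2` in the batch script, each of the three tasks would call `task_slice(n_items, 3, current_task_id())` and process only its own indices, so the tasks never need to talk to each other.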