Closed ndb0 closed 1 year ago
This looks like a cluster/MPI problem, and those can sometimes be tricky to resolve. From the error message it looks like your MPI processes might not be distributed correctly across the nodes. Do you have IT staff you could contact for assistance? They might have a better chance of understanding what's going on than we do.
When I had a similar problem on a cluster using SLURM, the solution was to specify very deliberately how many MPI processes to run per node and which type of MPI to use. That doesn't translate directly to a different system, but you can look through the help files to see whether there are more flags you can specify to be abundantly clear about how you want your run distributed across the nodes and cores. E.g. I found (googling PBS and aprun, as well as aprun and MPI):
https://pubs.cray.com/bundle/XC_Series_Programming_Environment_User_Guide_1705_S-2529/page/Using_aprun_with_PBS.html
https://pubs.cray.com/bundle/XC_Series_User_Application_Placement_Guide_CLE60UP01_S-2496/page/Run_Applications_Using_the_aprun_Command.html
The latter has a reference to a -S flag that gives the number of PEs per NUMA node; that might be a direction to explore. But like I said, if you have IT staff who can help, that's probably your best shot outside of reading a lot of documentation yourself to understand how to correctly launch the job on this system.
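Purely as an illustration (untested, and the right numbers depend on your system), an aprun line that spells out the placement explicitly with the flags from those Cray pages might look like:

setenv OMP_NUM_THREADS 23
aprun -j 1 -n 10 -N 1 -S 1 -d 23 -cc depth python montepython/MontePython.py run -p input/your_input.param -o chains/your_output

i.e. 10 MPI processes in total, 1 per node, at most 1 per NUMA node (-S), and 23 threads per process, bound with -cc depth. The parameter file and output directory here are placeholders.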
Best, Thejs
Dear Thejs, Thanks for the quick response. I have already contacted the IT experts who maintain the Cray machine at my institution. They are also not sure what the issue is.
The problem only appears if I use an external code to supply the primordial power spectrum and run it in MPI chains. If I use the standard power-law primordial scalar power spectrum, the code runs fine in MPI chains on the Cray, and if I use the external code but run MontePython as a single process without MPI, it also runs fine.
So I am confused whether the problem lies in the distribution of tasks across the Cray nodes, or whether it arises because the MPI processes move on before the external_pk code has finished executing. In the second case, could I put in a barrier and make each MPI process wait until the external_pk code has finished in all MPI processes?
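Concretely, what I have in mind is something like the following hypothetical sketch (not MontePython's actual code), assuming mpi4py is available and that external_Pk is called as a standalone executable:

import subprocess
from mpi4py import MPI

comm = MPI.COMM_WORLD

# each MPI rank runs the external primordial power spectrum code
# (hypothetical command line; replace with the real external_Pk invocation)
subprocess.check_call(["./external_Pk", "params.ini"])

# no rank continues past this point until every rank has finished the call above
comm.Barrier()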
Regards, Nilanjan
Hi Nilanjan,
Just to check, your external_pk code doesn't use MPI itself, right? Something like OMP should be fine.
Sometimes MPI screws up when launching in a directory that doesn't exist already. Can you try to first create the directory and the log.param, e.g. by running with -f 0?
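For example, something like this (a sketch, reusing the parameter file and output directory from your job script), run once as a single process without aprun/MPI, should create the output directory and its log.param before you launch the MPI job:

python montepython/MontePython.py run -p input/baseGNMDC.param -o chains/lcdm07 -f 0 -N 1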
I'm not really sure how to help with your specific problem, but you should know that MontePython doesn't actually benefit from running with MPI: the run time should be similar when running independent chains in parallel without it. So if that works for you, maybe that's a good solution?
Best, Thejs
Hi Thejs, My external_pk code only uses OpenMP. While running MontePython with the standard power-law primordial spectrum, I have also experienced the problem with a nonexistent directory; the problem disappears for an existing output directory containing a log.param file.
About the MPI run: I thought that, even though mpirun takes a similar run time, it would give a better result in the analysis. But, as you suggested, running without MPI is probably the only solution in my case at this point.
Regards, Nilanjan
The analyze part is only (partially) parallelized through OMP, not through MPI, but it's also a minor part of the job. If you run independent parallel chains, each chain essentially takes the leading role in computing covmats for update and/or jumping factors for superupdate (as opposed to only one chain doing so when using MPI). I found that any loss in efficiency from having more chains spend time computing covmats for the update algorithm is more or less offset by more efficient iteration of it, leading to a better covmat more quickly.
There should be no difference in the end result; the chains are doing the same thing with and without MPI, aside from the behavior I noted above.
Best, Thejs
I am trying to run MontePython with an external primordial power spectrum code (external_Pk) on a Cray XC40 machine. This external_Pk code is a Fortran OpenMP code. Without MPI the code runs smoothly, but it cannot use more than one node. Whenever I try to run it with MPI across more than one node, the code crashes with errors. The Cray machine has 24 cores per node. I was using 10 nodes, with 1 MPI process per node and 23 OpenMP threads for the Fortran code called inside each of these MPI processes. The Fortran code is compiled in the Intel environment. The Python code starts running and calls the Fortran code; the Fortran code writes output for the first 10 MPI processes, but then the Python code reports errors and the run stops.
My jobscript looks like:
setenv KMP_AFFINITY disabled
setenv OMP_NUM_THREADS 23
setenv MPICH_MAX_THREAD_SAFETY multiple
aprun -j 1 -n 10 -N 1 -d 23 -cc depth python montepython/MontePython.py run -p input/baseGNMDC.param -o chains/lcdm07 -N 10 >> /mnt/lustre/phy3/phynilan/MP/mp/montepython_public/pbss/gnmdc.out
The errors look like:
Fri Jun 4 02:25:43 2021: [PE_0]:inet_listen_socket_setup:inet_setup_listen_socket: bind failed port 1371 listen_sock = 32 Address already in use
Fri Jun 4 02:25:43 2021: [PE_0]:_pmi_inet_listen_socket_setup:socket setup failed
Fri Jun 4 02:25:43 2021: [PE_0]:_pmi_init:_pmi_inet_listen_socket_setup (full) returned -1
Fri Jun 4 02:25:44 2021: [PE_1]:inet_listen_socket_setup:inet_setup_listen_socket: bind failed port 1371 listen_sock = 16 Address already in use
Fri Jun 4 02:25:44 2021: [PE_1]:_pmi_inet_listen_socket_setup:socket setup failed
Fri Jun 4 02:25:44 2021: [PE_1]:_pmi_init:_pmi_inet_listen_socket_setup (full) returned -1
I am attaching the input parameter file, the error file, the job submission script, and the list of loaded modules. It would be very helpful if you could suggest a resolution to this issue.
base_GNMDC.param.txt base_tt_only.e146400.txt modules_loaded.txt clcdm.pbs.txt