AllenInstitute / bmtk

Brain Modeling Toolkit
https://alleninstitute.github.io/bmtk/
BSD 3-Clause "New" or "Revised" License

Problem importing bmtk.analyzer.compartment #327

Open moravveji opened 12 months ago

moravveji commented 12 months ago

I have pip-installed BMTK version 1.0.8 on our HPC cluster, which runs Rocky 8 on Intel Icelake CPUs. When I start an interactive job with 16 tasks, importing the bmtk.analyzer.compartment module fails:

$ nproc
16
$ module use /apps/leuven/rocky8/icelake/2022b/modules/all
$ module load BMTK/1.0.8-foss-2022b
$ python
Python 3.10.8 (main, Jul 13 2023, 22:10:28) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bmtk
>>> import bmtk.analyzer.compartment
[m28c27n1:3237025] OPAL ERROR: Unreachable in file ext3x_client.c at line 112
--------------------------------------------------------------------------
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:

  version 16.05 or later: you can use SLURM's PMIx support. This
  requires that you configure and build SLURM --with-pmix.

  Versions earlier than 16.05: you must use either SLURM's PMI-1 or
  PMI-2 support. SLURM builds PMI-1 by default, or you can manually
  install PMI-2. You must then build Open MPI using --with-pmi pointing
  to the SLURM PMI library location.

Please configure as appropriate and try again.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[m28c27n1:3237025] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

I built BMTK/1.0.8-foss-2022b (and all of its dependencies) against the OpenMPI/4.1.4-GCC-12.2.0 module. However, this particular OpenMPI module is not built with Slurm support, which is why parallel applications launched with srun emit the OPAL error message above.

Is there an environment variable that controls how the tasks are launched, so that I can use mpirun directly instead of srun?

kaeldai commented 11 months ago

Hi @moravveji, BMTK itself does not directly call srun or mpirun. It uses the standard mpi4py library, which relies on your locally installed version of OpenMPI. We've run large bmtk simulations using both Moab/Torque and Slurm, although how to actually execute them differs from cluster to cluster.
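
For context, a minimal sketch of why the failure shows up at import time: importing mpi4py is what calls MPI_Init, so the PMI lookup happens before any bmtk code runs. The commands below assume the same interactive job as in the original report, and the MPI4PY_RC_INITIALIZE variable is an mpi4py (>= 3.1) feature that I have not verified against this particular setup:

$ # importing mpi4py by itself reproduces the OPAL error -- no bmtk involved
$ python -c "from mpi4py import MPI; print(MPI.Is_initialized())"
$ # with mpi4py >= 3.1, the import-time MPI_Init can reportedly be skipped:
$ MPI4PY_RC_INITIALIZE=false python -c "from mpi4py import MPI"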

One thing to try is to put your workflow in a Python script and run it directly from the prompt using mpirun (or mpiexec), e.g.

$ mpirun -np 16 python my_bmtk_script.py

Unfortunately, whatever you do will no longer be interactive, and I don't think you can start up a shell under mpirun (or at least I've never seen it done). If you're using Moab, I think you can use the qsub -I option to get an interactive shell, but I haven't tried it myself.
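
If you do go the batch route, a minimal Slurm script sketch could look like the following; the time limit and the script name my_bmtk_script.py are placeholders, and the module lines are copied from the session above:

#!/bin/bash
#SBATCH --ntasks=16
#SBATCH --time=01:00:00

module use /apps/leuven/rocky8/icelake/2022b/modules/all
module load BMTK/1.0.8-foss-2022b

# use Open MPI's own launcher instead of srun, since this OpenMPI
# build lacks Slurm PMI support
mpirun -np 16 python my_bmtk_script.py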

Another option to try is using/compiling a different version of OpenMPI. If you have access to Anaconda, it might be worth creating a test environment and installing OpenMPI/MPICH2. I believe that when it installs, it will try to find the appropriate workload-manager options on the system, and if there is a Slurm manager on your HPC, it will install with PMI support. Although in my experience that doesn't always work, especially if Slurm is installed in a non-standard way.
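
A rough sketch of such a test environment, assuming conda-forge packages and an illustrative environment name bmtk-test:

$ conda create -n bmtk-test python=3.10
$ conda activate bmtk-test
$ # conda-forge provides OpenMPI builds; whether they pick up Slurm/PMIx
$ # support depends on the build variant available for your system
$ conda install -c conda-forge openmpi mpi4py
$ pip install bmtk
$ python -c "import bmtk.analyzer.compartment"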

moravveji commented 11 months ago

Thanks @kaeldai for your comments. I can already share a few thoughts based on our recent trial-and-error tests:

So, the take-home message is to avoid using bmtk in an interactive session (when OpenMPI is not compiled with PMI-2 or PMIx support).
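
As a rough way to check up front whether a given Open MPI module was built with PMI/Slurm support (a sketch; the exact components and configure flags reported vary between Open MPI versions):

$ module load BMTK/1.0.8-foss-2022b
$ # look for PMI/Slurm-related components and configure options in this
$ # Open MPI build; no relevant entries suggests srun-launched jobs will
$ # hit the OPAL error above
$ ompi_info | grep -i -e pmi -e slurm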