adrn / schwimmbad

A common interface to processing pools.
MIT License

How to use MPIPool in slurm? #34

Closed IncubatorShokuhou closed 2 months ago

IncubatorShokuhou commented 3 years ago

I tried to use MPIPool under SLURM, but an error occurred:

Traceback (most recent call last):
  File "mpi_test.py", line 12, in <module>
    with MPIPool() as p:
  File "/home/nfs/admin0/miniconda3/lib/python3.8/site-packages/schwimmbad/mpi.py", line 89, in __init__
    raise ValueError("Tried to create an MPI pool, but there "
ValueError: Tried to create an MPI pool, but there was only one MPI process available. Need at least two.

Here is my python script:

import sys
from schwimmbad import MPIPool
import time
import random

def worker(task):
    time.sleep(1)
    return random.random()

if __name__ == "__main__":
    with MPIPool() as p:
        result = p.map(worker, range(100))
    print(result)

and here is my slurm script:

#!/bin/bash   
#SBATCH -J mpi_test      
#SBATCH -n 20         
#SBATCH -p cpu40   
#SBATCH -o mpi_test.out        # Output file name   

mpiexec -n 20 python mpi_test.py

Could anyone give me some suggestions? Is there something wrong with my Python or SLURM script?

IncubatorShokuhou commented 3 years ago

BTW, nothing went wrong when I ran the script directly from bash.

AlecThomson commented 3 years ago

Hey @IncubatorShokuhou, I've had similar problems on a couple of different SC systems. I've found that exactly which MPI launcher you call, and how mpi4py is installed, make a big difference.

For example, on one of my systems I need to:

# Load the system mpi4py module
module load mpi4py
# Use srun instead of mpirun or mpiexec
srun -n 20 python mpi_test.py
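
Folded into your batch script, that would look roughly like the sketch below (whether an mpi4py module exists, and what it is called, depends entirely on your cluster, so treat this as a template rather than a drop-in fix):

#!/bin/bash
#SBATCH -J mpi_test
#SBATCH -n 20
#SBATCH -p cpu40
#SBATCH -o mpi_test.out        # Output file name

# Load the cluster's mpi4py/MPI modules (module name varies per system)
module load mpi4py

# Launch through srun so SLURM's MPI integration sets up the ranks
srun -n 20 python mpi_test.py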

Hopefully this is helpful!

Edit:

Also, the mpi4py hello world is really useful for diagnosing these kinds of issues; a minimal version is sketched below.
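
Saved as, say, hello_mpi.py (hypothetical name) and launched with srun -n 20 python hello_mpi.py, you should see 20 lines with ranks 0-19. If every line instead reports rank 0 of a size-1 world, the launcher and the MPI library mpi4py was built against don't match, which is exactly the situation that triggers schwimmbad's "only one MPI process" error.

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()            # this process's rank
size = comm.Get_size()            # total number of MPI processes
name = MPI.Get_processor_name()   # node running this rank

print(f"Hello from rank {rank} of {size} on {name}")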

IncubatorShokuhou commented 3 years ago

@AlecThomson Thank you for your suggestions, but it seems that the error still occurs. BTW, the mpi4py hello world works fine.

adrn commented 3 years ago

Hi @IncubatorShokuhou - Unfortunately it looks like it may be an issue with your MPI or mpi4py installation. I tried running this script on my laptop (Mac, openmpi installed via homebrew, mpi4py installed via pip), calling mpiexec directly from the terminal, and it runs fine. I also tried running it on our cluster, which is a Linux cluster with slurm 20.02.5, openmpi 2.1.6, and mpi4py 3.7.3, and it also runs as expected.
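
If it's useful, a quick way to check which MPI library your mpi4py is actually using (just a diagnostic sketch, nothing schwimmbad-specific):

import mpi4py
from mpi4py import MPI

# Build-time configuration: which mpicc/MPI library mpi4py was compiled against
print(mpi4py.get_config())

# Runtime MPI library version string, and the communicator size;
# the size should match the -n you pass to srun/mpiexec
print(MPI.Get_library_version())
print(MPI.COMM_WORLD.Get_size())

If the library reported there isn't the same MPI that your srun/mpiexec belongs to, each Python process ends up in its own size-1 MPI_COMM_WORLD, which is what the schwimmbad error message is complaining about.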