UCL-RITS / rcps-buildscripts

Scripts to automate package builds on RC Platforms
MIT License

Install request: mpi4py v3 #191

Closed: nicolasgold closed this issue 5 years ago

nicolasgold commented 6 years ago

Version 3 of mpi4py has at least one very useful feature (present in versions 2.1+). Invoking its mpi4py.run functionality, i.e. python -m mpi4py, means that should one of the processes in the communicator abort, the failure is caught and causes the job to fail, rather than deadlocking while waiting for communication and relying only on the time limit to kill it from the queue. This would presumably make better use of the cluster, as failing jobs will finish earlier.

Info here: https://mpi4py.readthedocs.io/en/stable/mpi4py.run.html
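
For illustration, a minimal sketch of what this buys you (the script name example.py and the rank logic are just illustrative, not part of any particular workload):

# launch with: mpirun -n 2 python -m mpi4py example.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 1:
    # under python -m mpi4py, this unhandled exception triggers MPI_Abort
    # on COMM_WORLD, so the whole job stops promptly instead of hanging
    raise RuntimeError("simulated failure on rank 1")

if rank == 0:
    # without -m mpi4py, this recv would wait forever for a message that
    # rank 1 will never send, and only the queue time limit ends the job
    msg = comm.recv(source=1, tag=0)
    print("rank 0 received", msg)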

heatherkellyucl commented 6 years ago

(Related to IN:03093371 - note: build with a new OpenMPI).

heatherkellyucl commented 6 years ago

To run successfully on Grace, we have to turn off matched probes:

import mpi4py
mpi4py.rc.recv_mprobe = False  # disable matched probes; must be set before the next import
from mpi4py import MPI

I think fundamentally it's related to the underlying fabric and how it handles matched probes for messages. mpi4py defaults to matched mode when it detects an MPI 3 standards-compliant library, so it would try to do this. From what I've read, some fabrics don't get on well with that mode [...] Turning off the mode does the trick and switches mpi4py back to standard send/recv.

Code was getting an MPI Internal Error on a recv on a worker process.
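
For context, a minimal sketch of how a script on Grace would combine the workaround with an ordinary point-to-point exchange (the payload and tag values are just illustrative):

import mpi4py
mpi4py.rc.recv_mprobe = False  # must be set before mpi4py.MPI is imported
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    comm.send({"payload": 42}, dest=1, tag=7)
elif rank == 1:
    # with recv_mprobe = False, recv() falls back to regular probe/recv
    # instead of MPI_Mprobe/MPI_Mrecv, avoiding the internal error we saw
    data = comm.recv(source=0, tag=7)
    print("rank 1 received", data)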

Intel says (https://software.intel.com/en-us/articles/python-mpi4py-on-intel-true-scale-and-omni-path-clusters):

Python users of the mpi4py package, leveraging capabilities for distributed computing on supercomputers with the Intel® True Scale or Intel® Omni-Path interconnects, might run into issues with the default configuration of mpi4py.

The mpi4py package uses matching probes (MPI_Mprobe) for the receiving function recv() instead of regular MPI_Recv operations by default. These matching probes from the MPI 3.0 standard, however, are not supported by all fabrics, which may lead to a hang in the receiving function.

Therefore, users are recommended to leverage the OFI fabric instead of TMI for Omni-Path systems. For the Intel® MPI Library, the configuration could look like the following environment variable setting:

I_MPI_FABRICS=ofi

Users utilizing True Scale or Omni-Path systems via the TMI fabric might alternatively switch off the usage of matching probe operations within the mpi4py recv() function.

This can be established via mpi4py.rc.recv_mprobe = False right after importing the mpi4py package.
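
As a rough sketch, the two options from the Intel advice look like this from the Python side (this assumes I_MPI_FABRICS is read per process at MPI_Init time, which here happens when mpi4py.MPI is imported; exporting it in the job script before launch is the more usual route):

import os
# option 1 (Intel MPI on Omni-Path): ask for the OFI fabric instead of TMI
os.environ.setdefault("I_MPI_FABRICS", "ofi")

# option 2: fall back to regular probe/recv instead of matched probes
import mpi4py
mpi4py.rc.recv_mprobe = False

from mpi4py import MPI  # MPI_Init happens on this import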

owainkenwayucl commented 6 years ago

We should look at modifying the mpi4py install so that it does this automatically, either by setting the environment variable or changing the behavior of the package so that the default value of mpi4py.rc.recv_mprobe is False on our systems.

nicolasgold commented 6 years ago

It's a one-line change in the source, so if you are building from source rather than installing with pip, the simplest thing might be to change the file. It's on line 95 of this file: https://bitbucket.org/mpi4py/mpi4py/src/656444316cbb3e198eac5bf0a80a2bd56c27a63a/src/mpi4py/__init__.py?at=master&fileviewer=file-view-default
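
For reference, the patched default amounts to something along these lines (paraphrased, not a verbatim copy of the upstream file, which carries more options than shown here):

class Rc:
    # other options (initialize, threads, thread_level, ...) omitted
    recv_mprobe = False  # upstream default is True

rc = Rc()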

nicolasgold commented 6 years ago

I'm currently investigating a related build issue to do with MPI profiling with mpi4py. Thus far, I haven't got it working, and research suggests that in some cases the default build flags prevent mpi4py from picking up certain libraries that are needed. This is only a profiling issue, but I will try to track it down shortly so that if a different flag is needed at build time, it can be incorporated into this build rather than into a separate request.

Please disregard this secondary issue: it looks like mpi4py fixed it in a March 2018 update.

owainkenwayucl commented 6 years ago

OK, I'll take a look at this.

owainkenwayucl commented 6 years ago

I've installed this on all our clusters.

The install script patches src/mpi4py/__init__.py so that rc.recv_mprobe = False, unless you set $PATCH_RECV_MPROBE to anything other than "TRUE". So it's installed patched on Legion, Grace and Thomas, which have Intel interconnects, and unpatched on Myriad, where we have Mellanox.
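
A quick way to check which behaviour a given install has (just a one-off sanity check, not part of the install script):

import mpi4py
# True: matched probes still on (unpatched, e.g. Myriad)
# False: the patched default is in effect (Legion, Grace, Thomas)
print("recv_mprobe default:", mpi4py.rc.recv_mprobe)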

I've only installed it for Python 3, because everyone should have moved to that, and I've done the install for the most up-to-date version of OpenMPI (3.1.1) on our systems.

owainkenwayucl commented 6 years ago

(I've run some test jobs and it seems to be working).