JuliaParallel / MPI.jl

MPI wrappers for Julia
https://juliaparallel.org/MPI.jl/
The Unlicense
376 stars · 121 forks

Minimum MPI Version Supported #39

Open lcw opened 9 years ago

lcw commented 9 years ago

cc: @andreasnoack , @ViralBShah , @eschnett , @amitmurthy

A pull request from @steven-varga with an MPI-3 function brought up this issue. What should be the minimum supported MPI version for this wrapper?

Right now we use some MPI-2 features for the types, so that is probably as low as we should go; but should we require MPI-3? From this document, it looks like most major MPI implementations either have MPI-3 support or are planning it. The major exception seems to be Microsoft MPI.

We can determine the MPI version at compile time with the constants MPI_VERSION and MPI_SUBVERSION (see the MPI-1.3 standard, section 7.1.1, for more information), so we could bring in MPI-3 functions only if the library supports them, but that makes the wrapper a little more complicated.

Does anyone have any thoughts or concerns on this front?

eschnett commented 9 years ago

Many real-world large-scale HPC systems provide quite conservative versions of software libraries such as MPI. I'm sure many of these don't provide MPI-3 yet. This means that requiring MPI-3 is out of the question.

At the same time, I really appreciate some of the new features, such as split-phase global operations. Combined, this means that we need to autodetect the available MPI version.

lcw commented 9 years ago

True; personally, I try to stick to MPI-1 for my large-scale codes. You never know which MPI function is going to break on you at scale (we once had allgather break on us).

@eschnett, are you okay with requiring MPI-2, as we do now? Or should we make that optional as well?

eschnett commented 9 years ago

MPI-2 is from 1997; I hope that all relevant vendors support this standard. (Yes, I'm okay with requiring it.)

psanan commented 9 years ago

@lcw, very true. One issue is that there can be a vast difference between satisfying the MPI-3 standard and providing an efficient implementation. An example is MPI_Iallreduce, which I've been interested in using but which is quite involved to implement properly. As far as I remember, the standard does not actually require any asynchronous progress to be made, so the fact that an implementation conforms to the MPI-3 standard does not imply that it will behave as desired at scale.

I am commenting here because this might also be relevant to @amitmurthy's proposed work re #38: if the custom transport for a ClusterManager is to be implemented purely with MPI, some MPI-2 and MPI-3 features might be useful and/or necessary.

ViralBShah commented 9 years ago

I think the right thing to do is to have a minimum requirement of MPI-2, while using MPI-3 if available. I don't know if any of the major open source MPI libraries support MPI-3 yet.

psanan commented 9 years ago

MPICH and Open MPI (which I would presume account for the vast majority of open-source MPI installations) both adhere to the MPI-3 standard.

ViralBShah commented 9 years ago

Yes, those are the major ones. Good to know. Thanks.

eschnett commented 9 years ago

I find that, in particular, the MPI-3 function MPI_Ibarrier is very handy for detecting whether the application should terminate. Doing this without MPI_Ibarrier is quite complex: one either has to do it serially or implement a tree reduction manually.

JaredCrean2 commented 5 years ago

It seems we now require MPI 3, and had a build failure in the wild as a result: https://discourse.julialang.org/t/error-building-mpi-already-tried-setting-cc-fc/17817. In particular, that user reported the Ubuntu 16.04 package for OpenMPI installed MPI 2, not MPI 3.

barche commented 5 years ago

Maybe we should just ship openmpi binaries using BinaryBuilder, that way it will work out of the box for everyone. I'm not sure how well this would cross-compile for Windows, but if it does it might even be a good way to get rid of the Windows-specific code?

We should of course keep config options so an already installed MPI can be used if desired.

JaredCrean2 commented 5 years ago

I think the better solution is to add version checks to the files in /deps and the Julia source code that uses the generated constants. We have to do that anyways if we want to support older system MPI installations.

eschnett commented 5 years ago

@barche Sometimes a system's MPI library is built against hardware-specific libraries to use particular drivers and hardware. It is also often necessary to configure MPI so that it knows which network devices to use; otherwise it won't work. Hence "out of the box" is difficult to achieve.

barche commented 5 years ago

Yes, of course it would not cover all use cases, but it would make it easier for other packages to depend on MPI.jl and "just work" within the confines of the BinaryBuilder dependency chain. The Ubuntu package is also a one-size-fits-all MPI binary, so it shouldn't matter there if our binary is used instead.

Clusters also typically offer different versions of MPI, so it shouldn't be too hard to pick one that works? Is MPI3 really that unsupported? If so, adding version checks seems to be indeed a necessity.

JaredCrean2 commented 5 years ago

Is MPI3 really that unsupported

Yes. The Blue Gene at my school still doesn't support all of MPI-2. HPC systems with custom MPI implementations may never update their MPI implementation after the system is first designed (and such systems have a service life of ~10 years).

JaredCrean2 commented 5 years ago

I created a proof of concept for finding in what version of MPI a symbol first appears here.

simonbyrne commented 5 years ago

It would be useful to have a BinaryBuilder-supplied MPI that could be used for local testing, along with a way to override to use a specific MPI when available.

PhilipVinc commented 5 years ago

I don't agree with having a BinaryBuilder-supplied MPI being the default.

What would happen if a package pulls in MPI.jl as a dependency? I'm thinking of the use case of users installing some MPI-accelerated package on a computational cluster, where 99.9% of the time the libraries are already installed. Package authors would then have to tell their users to rebuild MPI using the system-provided MPI installation for optimal performance.

barche commented 5 years ago

Yes, it's debatable what should be the default. Maybe we could also detect if an MPI is already installed and loaded? I also think a cluster user would be more inclined to double-check the correct MPI is used.

PhilipVinc commented 5 years ago

This would make more sense. A BinaryBuilder-MPI is something only Julia will use through the MPIManager. So if no external MPI is found it might make sense to assume the user won't be using mpirun.

On a related note, I personally think a much more urgent task is to make MPI.jl build correctly with any version of MPI and throw an error if an unsupported function is called.

I had started experimenting with splitting the build phase into five parts (versions 1.0, 1.1, 2.0, 3.0, 3.1). I don't have the time now, but if we agree on how the feature should be implemented, I could hopefully get back to it after the summer.

barche commented 5 years ago

Yes, the version compatibility is a must-have, I had planned to look at it too, but I haven't even gotten round to looking at @JaredCrean2 's work.

PhilipVinc commented 5 years ago

I had experimented with having a .jl file with a list of all MPI methods for each version. Then I used CMake to generate a gen_functions_version.c for each version.

I would then compile each one of them, and at the first failure I would stop and consider the previous version as the maximum supported MPI version.

Ugly, but it worked. Though I'm sure there must be an easier way to do it.

eschnett commented 5 years ago

On a cluster, it is typically a non-trivial decision to find out which MPI library to use, to make it available (by loading modules or setting paths), and how to run with that version. It's quite easy to get confused and to build, link, and run with inconsistent MPI versions, leading to run-time errors (at best).

I think a BinaryBuilder provided version should be the default, combined with a simple way to output the MPI configuration that is used to aid debugging.

In particular if a package pulls in MPI.jl as a dependency, the user might not know or not care how to use MPI properly, and then having to deal with setting things up properly is a hindrance. The default should be "it just works" for a novice.

I wish the default could be "it just works on a cluster", but that isn't possible: configuring things properly on a cluster (queuing system, etc.) automatically just isn't feasible.

barche commented 5 years ago

On a cluster, it is typically a non-trivial decision to find out which MPI library to use, to make it available (by loading modules or setting paths), and how to run with that version. It's quite easy to get confused and to build, link, and run with inconsistent MPI versions, leading to run-time errors (at best).

Yes, maybe it's too simplistic, but I assumed that on a cluster the user would load the required module before starting Julia, and we could just use whatever MPI is in the path, or the BinaryBuilder one if none is found.

barche commented 5 years ago

I had experimented with having a .jl file with a list of all MPI methods for each version. Then I used CMake to generate a gen_functions_version.c for each version.

Did you also look at the MPI_VERSION constant?

PhilipVinc commented 5 years ago

Yes, I had, and it does not change much. I would still generate a bunch of gen_functions_version.c files and then merge the output into a single .jl file.

The reason for my try-catch approach is that I work on a cluster with a custom MPI distribution that does not support MPI_NO_OP, even though, according to its version, it should. So I was trying to have a workaround. I agree it's a hack, and I don't expect MPI.jl to support it.

Besides, once we detect the right MPI_VERSION, we need to agree on a way to raise an error before ccalling an unsupported method.

simonbyrne commented 5 years ago

What would happen if a package pulls in MPI.jl as a dependency? I'm thinking of the use case of users installing some MPI-accelerated package on a computational cluster, where 99.9% of the time the libraries are already installed. Package authors would then have to tell their users to rebuild MPI using the system-provided MPI installation for optimal performance.

The easiest solution would be to use environment variables (say JULIA_MPI_PATH). If this is set, it would use the supplied MPI; otherwise it would use the BinaryBuilder version. Then on a cluster you could just set that environment variable globally.

eschnett commented 5 years ago

Typically, one cannot just call mpirun on a cluster; instead, one has to submit a job through a queuing system.

simonbyrne commented 5 years ago

In #271 I've exposed an MPI_VERSION constant, and versioned the generated constants on this (assuming a minimum version of MPI-2 for the time being). It should be possible to version the functions as well.

It seems like some libraries (e.g. MS-MPI) do partially add functionality from newer versions, so we may want a way to enable functionality beyond what is officially supported by the reported version.

It would be good to test the selective versioning: I guess one option is to test against an older version of MPICH/OpenMPI?

simonbyrne commented 4 years ago

Apparently now MPICH is supposed to be ABI compatible with many of its derivatives (though, interestingly, not Microsoft MPI): https://www.mpich.org/abi/ (they've apparently fixed the "customisable struct" issue that caused @andreasnoack headaches in #169).