JuliaParallel / Elemental.jl

Julia interface to the Elemental linear algebra library.

Support for Elemental built against system MPI #37

Open bluehope opened 8 years ago

bluehope commented 8 years ago

Is there any plan to support MVAPICH2 & intel MPI?

Intel MPI (which has binary compatibility with MVAPICH2) is also one of the most widely used MPI implementations. It would be nice for Elemental to support MVAPICH2!

poulson commented 8 years ago

Elemental absolutely supports every modern MPI implementation. I assume that you mean Elemental.jl?

bluehope commented 8 years ago

@poulson Oh, Yes. I meant "Elemental.jl". Thank you for the correction.

andreasnoack commented 8 years ago

For some reason, I'd unwatched my package here, so I've only seen this issue now. Soon, we'll change MPI.jl to use the C API and also hard-code various MPI implementations. When that PR is merged, I'll probably delete the MPI functions here and add MPI.jl as a dependency. Adding support for MVAPICH would then be a matter of adding support in MPI.jl, which I guess is basically copying the definitions from MPICH. Feel free to open a PR to speed up the process.

ViralBShah commented 4 years ago

We no longer build the sources as part of the package installation like we used to.

So the only options are to use what ships with BinaryBuilder, or to provide your own custom build. Note that while MPI.jl allows a system MPI to be used, Elemental.jl needs an update to allow a system build (one that lets it opt out of the BB-provided binaries).
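For reference, the opt-out on the MPI.jl side looked roughly like this at the time (a sketch based on MPI.jl's documented JULIA_MPI_BINARY mechanism; the MPI path below is just a placeholder for whatever your site provides). Elemental.jl currently has no analogous switch, which is what this issue is about.

    # Point MPI.jl at the system MPI instead of the BinaryBuilder-provided MPICH_jll,
    # then rebuild so the new settings take effect (pre-0.20 MPI.jl mechanism).
    ENV["JULIA_MPI_BINARY"] = "system"
    ENV["JULIA_MPI_PATH"] = "/opt/cray/pe/mpich/default"  # placeholder system MPI prefix

    using Pkg
    Pkg.build("MPI"; verbose=true)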

JBlaschke commented 3 years ago

Hi, is there a way we can speed up this update? I volunteer my time. At NERSC we need to build against the system MPI, so I am happy to help out if this means we can deploy Elemental.jl on our systems sooner.

I am still a bit new to BB, so can someone give me some guidance on how I can "opt out of" the BB-provided binaries?

JBlaschke commented 3 years ago

Btw @ViralBShah in Julia 1.6.0 we get

Warning: Error requiring `MPICH_jll` from `MPI`
  exception =
   MPICH_jll cannot be loaded: MPI.jl is configured to use the system MPI library

As a workaround I tried:

] dev --local Elemental
] dev --local Elemental_jll
] dev --local MPICH_jll

but that doesn't fix it. TBH, I can only see one place where MPICH_jll is needed -- here: https://github.com/JuliaParallel/Elemental.jl/blob/83089155659739fea1aae476c6fd492b1ee20850/test/runtests.jl

Then again, I don't know much about BinaryBuilder, but it looks like Requires goes through the package specs and throws this warning. What is not clear to me is how I can tell Requires that using the system MPI is the only option. What do you think?

JBlaschke commented 3 years ago

Follow-up: is there a way to drop the MPICH_jll requirement in https://github.com/JuliaBinaryWrappers/Elemental_jll.jl ?

andreasnoack commented 3 years ago

MPI.jl has a mechanism that allows for using a system MPI instead of the BB provided MPI. However, it's really not clear to me how that can work here unless we reintroduce the code for building Elemental as part of this package. We could try to mimic the MPI.jl code and link against a system provided libelemental but, historically, it was important to keep a tight connection between the version of the wrappers here and the version of libelemental since the API was evolving.

If you already have a build of libelemental that links against your MPI then you can try to remove Elemental_jll and just point the libEl variable to that libEl.so (or what it's called, I don't recall).
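Concretely, that manual override might look like the following in a dev'ed copy of the package (a sketch only; the path is a placeholder, and the line to replace is the Elemental_jll import in the package source):

    # Instead of
    #     using Elemental_jll: libEl
    # hard-code the path to a locally built libEl that was linked against the
    # system MPI (placeholder path shown):
    const libEl = "/global/common/software/myproject/elemental/lib/libEl.so"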

JBlaschke commented 3 years ago

@andreasnoack if you can share with me the BB code that was used to generate Elemental_jll (the build_tarballs.jl? As I said, I'm new to this), then I could take a stab at a locally built one. I think we can't get around an Elemental_jll because Elemental.jl references it in its source: using Elemental_jll: libEl. My strategy at NERSC would then be to provide our own Elemental_jll in our admin repo.
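A site-local stand-in for Elemental_jll could, in the simplest case, be a tiny package that only provides the libEl binding Elemental.jl imports (everything below is hypothetical; real JLL packages also handle artifact directories, LIBPATH, and dlopen ordering, which may matter for resolving the MPI library):

    # Hypothetical NERSC-specific Elemental_jll replacement living in a site depot.
    module Elemental_jll

    # Path to a libEl built against the system MPI (placeholder).
    const libEl = "/global/common/software/nersc/elemental/lib/libEl.so"

    export libEl

    end # module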

andreasnoack commented 3 years ago

It's here, https://github.com/JuliaPackaging/Yggdrasil/blob/master/E/Elemental/build_tarballs.jl, but I don't see why it would be easier to modify Elemental_jll instead of just Elemental.jl.

JBlaschke commented 3 years ago

Thanks @andreasnoack, I'll take a stab at this and let you know. This is easier because I don't need Elemental myself, but several of our users do. I could either explain to each of those users how to modify Elemental.jl, or I could build our own NERSC-specific Elemental_jll which would live in our global admin depot (alongside our MPI.jl). If I do things right, this should be picked up automatically whenever a user installs Elemental.jl, without them needing to fiddle with the source code (or build their own libEl).

JBlaschke commented 3 years ago

Another question @andreasnoack -- why do you use the deprecated Elemental repo instead of https://github.com/LLNL/Elemental? The LLNL version comes with CUDA support.

ViralBShah commented 3 years ago

I think the last time I checked, I could not get the build to work in the new repo. We are not yet ready to enable CUDA support in BinaryBuilder, because we don't have infrastructure to distribute CUDA-built binaries. @maleadt may be able to say when we can expect that.

In the meantime, we can certainly switch to the new upstream repo for building Elemental_jll. Would you be able to submit a PR?

andreasnoack commented 3 years ago

> Or I could build our own NERSC-specific Elemental_jll which would live in our global admin depot (alongside our MPI.jl)

What I don't understand is why it would be easier to put Elemental_jll there instead of just putting a modified version of Elemental.jl in your global admin repo.

JBlaschke commented 3 years ago

@andreasnoack a user might want a specific version or want to make their own changes. This seems more maintainable: unless the libEl build instructions change -- if I understand this correctly -- I can just re-build libEl locally by re-running the same build script. On the other hand, if I maintain a patched Elemental.jl, then I need to re-apply the patch (and possibly re-build libEl anyway) every time I update Elemental.jl.

Also: we don't have an Elemental module, so I would have to write a build script anyway.

ViralBShah commented 3 years ago

What we should really do is update Elemental_jll to build from the new repo, then update Elemental.jl to use those new binaries, and put whatever features are needed to use a system Elemental behind an environment variable. We are happy to update the upstream package to allow whatever local configuration is necessary.
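A minimal sketch of what such an environment-variable gate could look like inside Elemental.jl (the variable name JULIA_ELEMENTAL_LIBRARY is hypothetical, not an existing option):

    # Hypothetical opt-out: if the user points us at a system libEl, skip loading
    # Elemental_jll (and therefore its MPICH_jll dependency) entirely; otherwise
    # fall back to the BinaryBuilder-provided binary.
    if haskey(ENV, "JULIA_ELEMENTAL_LIBRARY")
        const libEl = ENV["JULIA_ELEMENTAL_LIBRARY"]
    else
        import Elemental_jll: libEl
    end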

ViralBShah commented 3 years ago

Here are some of the problems when trying to build the Elemental from LLNL:

https://dev.azure.com/JuliaPackaging/Yggdrasil/_build/results?buildId=11741&view=logs&j=bdc19914-4824-529b-e606-c39779d9c0ef&t=ca989bc1-9e4f-55b3-32df-8eaed39b717f&l=2625

Sideboard commented 2 years ago

Are there any new developments concerning this issue?

I wanted to test leastsquares() on distributed systems but ran into MPI vs. MPICH errors. At least one cluster I'm working on uses Intel MPI, so having to use MPICH for Elemental.jl seems like a serious constraint.

bernstei commented 2 years ago

To make the BinaryBuilder process more flexible: are there few enough ABIs (e.g. MPICH, which Intel MPI apparently also uses, and OpenMPI) that shipping a couple (or a few, but still not too many) versions of libEl, one for each ABI, would be enough? The actual runtime selection of libEl could be handled by an argument or env var, and the underlying MPI library by LD_LIBRARY_PATH.

With #64 extended to a few more strings, that might be sufficient for a reasonably wide range of uses.
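To sketch the idea (all names below are hypothetical, chosen only to illustrate the selection mechanism, not actual products of the current Elemental_jll recipe):

    # Hypothetical runtime selection among per-ABI builds of libEl shipped side by
    # side; the matching MPI library itself would be picked up via LD_LIBRARY_PATH.
    const mpi_abi = get(ENV, "JULIA_ELEMENTAL_MPI_ABI", "MPICH")

    const libEl = Dict(
        "MPICH"   => "libEl_mpich.so",    # ABI also used by Intel MPI and MVAPICH2
        "OpenMPI" => "libEl_openmpi.so",
    )[mpi_abi]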

vchuravy commented 2 years ago

If someone wants to take on https://github.com/JuliaPackaging/Yggdrasil/pull/4776, that would be great. Then we can provide binaries for OpenMPI and MPICH as well as our portability layer MPItrampoline.

wcwitt commented 1 year ago

I see there has been some progress here (https://github.com/JuliaPackaging/Yggdrasil/pull/5130), but I can't tell what, if anything, still needs to happen. Any advice?

vchuravy commented 1 year ago

@wcwitt if you want to take a look, you could start with #80. I won't have time to finish this any time soon, but that's my belief about how we could start rewriting Elemental.jl to make use of the multi-MPI support we now have in MPI.jl 0.20.
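For anyone picking this up: the MPI.jl 0.20 mechanism referred to above is MPIPreferences, so selecting the system MPI for a project is now a one-liner (Elemental_jll would still need to ship, or be rebuilt against, a matching ABI for this to carry over to libEl):

    # Record the system MPI in LocalPreferences.toml for the active project
    # (MPI.jl >= 0.20); packages built against a different ABI must be swapped too.
    using MPIPreferences
    MPIPreferences.use_system_binary()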