eschnett / ADIOS2.jl

A Julia interface to ADIOS2
MIT License

Using ADIOS2 with MPI and system-provided libraries #9

Closed rkube closed 1 year ago

rkube commented 2 years ago

Hi, I'm trying to use ADIOS2 with MPI using the vendor-provided MPI library cray-mpich. While MPI.jl works fine, throwing ADIOS2 into the mix results in LoadErrors:

rkube@nid005838:~/source/julia_perlmutter> srun -n 1 ~/software/julia-1.8.0/bin/julia --project=. src/test_mpi_adios.jl 
ERROR: LoadError: InitError: MPI library has changed, please re-run `Pkg.build("MPI")`.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] __init__deps()
    @ MPI ~/.julia/packages/MPI/08SPr/deps/deps.jl:12
  [3] __init__()
    @ MPI ~/.julia/packages/MPI/08SPr/src/MPI.jl:69
  [4] _include_from_serialized(pkg::Base.PkgId, path::String, depmods::Vector{Any})
    @ Base ./loading.jl:831
  [5] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String, build_id::UInt64)
    @ Base ./loading.jl:1039
  [6] _require(pkg::Base.PkgId)
    @ Base ./loading.jl:1315
  [7] _require_prelocked(uuidkey::Base.PkgId)
    @ Base ./loading.jl:1200
  [8] macro expansion
    @ ./loading.jl:1180 [inlined]
  [9] macro expansion
    @ ./lock.jl:223 [inlined]
 [10] require(into::Module, mod::Symbol)
    @ Base ./loading.jl:1144
 [11] include
    @ ./Base.jl:419 [inlined]
 [12] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::String)
    @ Base ./loading.jl:1554
 [13] top-level scope
    @ stdin:1
during initialization of module MPI
in expression starting at /global/homes/r/rkube/.julia/packages/ADIOS2/oDjS9/src/ADIOS2.jl:1
in expression starting at stdin:1
ERROR: LoadError: Failed to precompile ADIOS2 [e0ce9d3b-0dbd-416f-8264-ccca772f60ec] to /global/homes/r/rkube/.julia/compiled/v1.8/ADIOS2/jl_6MjXRQ.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, keep_loaded_modules::Bool)
   @ Base ./loading.jl:1705
 [3] compilecache
   @ ./loading.jl:1649 [inlined]
 [4] _require(pkg::Base.PkgId)
   @ Base ./loading.jl:1337
 [5] _require_prelocked(uuidkey::Base.PkgId)
   @ Base ./loading.jl:1200
 [6] macro expansion
   @ ./loading.jl:1180 [inlined]
 [7] macro expansion
   @ ./lock.jl:223 [inlined]
 [8] require(into::Module, mod::Symbol)
   @ Base ./loading.jl:1144
in expression starting at /global/u2/r/rkube/source/julia_perlmutter/src/test_mpi_adios.jl:6
srun: error: nid005838: task 0: Exited with exit code 1
srun: launch/slurm: _step_signal: Terminating StepId=3062902.4
rkube@nid005838:~/source/julia_perlmutter> cat src/test_mpi_adios.jl 
using MPI
MPI.Init()

using Random

using ADIOS2

comm = MPI.COMM_WORLD
rkube@nid005838:~/source/julia_perlmutter> 

This is the Project.toml:

[deps]
ADIOS2 = "e0ce9d3b-0dbd-416f-8264-ccca772f60ec"
MPI = "da04e1cc-30fd-572f-bb4f-1f8673147195"

To install MPI.jl with the system-provided libraries I followed the instructions provided here: https://juliaparallel.org/MPI.jl/stable/configuration/#Using-a-system-provided-MPI

Is there an easy way to make this setup work?

eschnett commented 2 years ago

@rkube The problem is that ADIOS2.jl calls into the actual ADIOS2 library libadios2.so, which is written in C++, and which is directly linked against MPI. This library is precompiled and downloaded by the Julia package manager, and is not compatible with any system-provided MPI.

The solution that I prefer is to use MPItrampoline. This is an MPI wrapper around any MPI implementation that provides a system-independent ABI. It's designed to handle exactly this case.

To use MPItrampoline, you would:

Under the hood, this still uses cray-mpich.
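
As a rough sketch of what such a setup can look like, assuming a recent MPI.jl (0.20 or later) where the MPI binary is selected through the MPIPreferences package instead of Pkg.build: the traceback above appears to come from an older MPI.jl, so this is not necessarily the procedure the comment had in mind. The wrapper path is hypothetical, and the MPIwrapper library has to be built separately against cray-mpich.

```julia
# Sketch only: assumes MPI.jl >= 0.20 and that the MPIPreferences package
# is installed in the active project.
using MPIPreferences

# Select the MPItrampoline-based MPI binary. The choice is recorded in
# LocalPreferences.toml next to the active Project.toml.
MPIPreferences.use_jll_binary("MPItrampoline_jll")

# MPItrampoline forwards MPI calls to a wrapper library that was built
# against the system MPI. Its location is passed through the
# MPITRAMPOLINE_LIB environment variable, set in the shell or job script
# before Julia starts. The path below is a placeholder:
#
#   export MPITRAMPOLINE_LIB=/path/to/mpiwrapper/lib/libmpiwrapper.so
```

Julia needs to be restarted after changing the preference so that MPI.jl and its dependents are precompiled against the new binary.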

I've been using this approach for a while, and am now asking others to do so as well because I think this is the simplest way to address this problem. Please let me know if something doesn't work, or if there's another problem.

The "standard" way of addressing this would be to build the ADIOS2 C++ library yourself, against cray-mpich, and then make ADIOS2.jl use that library.

rkube commented 1 year ago

Thanks for the detailed instructions, this approach works fine.

eschnett commented 1 year ago

Glad to hear!

jychoi-hpc commented 1 year ago

I am trying to build ADIOS2.jl to use my own adios2 library. What command should I use? I am very new to Julia. Thanks.

eschnett commented 1 year ago

@jychoi-hpc Using your own ADIOS2 library is a difficult task. The problem is that you need to ensure binary compatibility between the library you built and the glue code that ADIOS2.jl is using. I do not have ready-made instructions for this. Julia can automate many complex steps when using its own ADIOS2 library. You would have to change the part of ADIOS2.jl that loads the external ADIOS2 library. Since Unix library linking does not check types, any error you make would at best lead to segfaults.

If possible, I would recommend using the ADIOS library that comes with Julia. Why are you using your own library?

jychoi-hpc commented 1 year ago

I see. I understand the complexity and difficulty.

Basically, I am trying to use it on Perlmutter, Cori (NERSC), or Summit (ORNL). On those machines, I found that using the system library, or something I compiled specifically on that machine, worked better than using a pre-built binary. It looks like HDF5's Julia package supports using a system binary: https://juliaio.github.io/HDF5.jl/stable/#Using-custom-or-system-provided-HDF5-binaries

Another reason is that I found the pre-built binary insufficient. ADIOS has many features and options to turn on or off, and I want to use my own specific ADIOS build with Julia. Since I am new to Julia, I might be missing something. Please correct me or share any information.

eschnett commented 1 year ago

I myself have a C/Fortran/C++ background. When coming to Julia, I wanted to have the freedom to link against any package that I built myself, as this is common in these languages, especially on HPC systems. However, I found that not doing so has many advantages, and I have not found it necessary to install packages on my own (with the exception of MPI, which is a different story).

Unfortunately, it is currently not easy to use ADIOS2.jl with a custom adios2 library. Finding and loading the library is handled by the Julia package ADIOS2_jll.jl, which ADIOS2.jl depends on. ADIOS2_jll.jl is generated automatically when the adios2 library is built by BinaryBuilder.

I assume the best way to support loading your own library is to remove the dependency on ADIOS2_jll.jl and use Julia's Libdl standard library to load the adios2 library yourself. The Julia documentation describes how to do that. If you have a concrete problem with the default adios2 library, I'd be happy to look into it, either by correcting the problem in ADIOS2_jll.jl or by adding a mechanism to load a self-built library. Otherwise, my strong suggestion is to use ADIOS2.jl as is, even though that practice differs from how packages in C/Fortran/C++ would be managed.
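
To illustrate the manual-loading idea, here is a minimal sketch using the Libdl standard library. The function name find_adios2_symbol and the library path are hypothetical, and adios2_init_serial is assumed to be an entry point exported by the adios2 C bindings. It only checks that a library can be opened and a symbol resolved; it does not replace the wrappers in ADIOS2.jl.

```julia
using Libdl

# Try to open a self-built adios2 C library and look up one entry point.
# Returns the function pointer, or `nothing` if the library cannot be
# opened or the symbol is not found.
function find_adios2_symbol(libpath::AbstractString, name::Symbol)
    handle = Libdl.dlopen(libpath; throw_error=false)
    handle === nothing && return nothing
    return Libdl.dlsym(handle, name; throw_error=false)
end

# Hypothetical install location; adjust to your own build.
ptr = find_adios2_symbol("/path/to/adios2/lib/libadios2_c.so", :adios2_init_serial)
if ptr === nothing
    println("adios2 library or symbol not found")
else
    println("found adios2_init_serial at ", ptr)
end
```

Resolving a symbol says nothing about binary compatibility: the argument types used in the ccalls still have to match the library that was actually loaded.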

williamfgc commented 1 year ago

@eschnett thanks for the pointers. A question, as I don't know much about Julia's packaging: can both approaches coexist in your current implementation? That is, using an environment variable such as JULIA_ADIOS2_PATH pointing at a native ADIOS2 installation and then running pkg> build, similar to HDF5. If so, would this be done at the ADIOS2_jll level or the ADIOS2.jl level? Thanks for the effort of making this available.

eschnett commented 1 year ago

The only library-loading code in ADIOS2.jl is the line "using ADIOS2_jll". I assume this line could in principle be inside an if statement that checks an environment variable, and otherwise does something else (load an external library via Libdl and set up a few things). Of course, this would break precompiling. I guess the respective logic could be moved into an __init__ function.

Ultimately, ADIOS2_jll.jl sets a global constant libadios2_c that is used by the ccalls in ADIOS2.jl.
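
The load-time selection described here might look roughly like the following sketch. The module CustomADIOS2Loader and the function choose_library are hypothetical and not part of ADIOS2.jl. The variable name JULIA_ADIOS2_PATH is taken from the question above, and it is assumed here to hold the full path of the library file rather than an installation prefix.

```julia
module CustomADIOS2Loader  # hypothetical module, not part of ADIOS2.jl

# Filled in when the module is loaded. Keeping the path in a Ref means it
# is chosen at load time and is not baked into the precompile cache.
const libadios2_c_path = Ref{String}("")

# Pick the library to load: an environment override wins, otherwise the
# default name is used. `env` is a parameter so the logic can be tested.
function choose_library(default::AbstractString; env=ENV)
    return get(env, "JULIA_ADIOS2_PATH", default)
end

# __init__ runs every time the module is loaded, including when it is
# loaded from a precompile cache.
function __init__()
    libadios2_c_path[] = choose_library("libadios2_c")
    # A real implementation would open this library here (for example
    # with Libdl.dlopen) and make it available to the ccalls.
end

end # module

println("library to load: ", CustomADIOS2Loader.libadios2_c_path[])
```

Doing the lookup in __init__ rather than at the top level of the module is what keeps precompilation working, since top-level code runs only once, when the cache is built.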

I don't recall how to move this logic to build time, but I'm sure this can be done as well.

Is there a particular problem you're trying to address? Is the Julia-provided adios2 library insufficient or configured wrong?

williamfgc commented 1 year ago

Thanks @eschnett, it's mostly what @jychoi-hpc mentioned. ADIOS2 is really a framework with multiple compile-time options. The end goal would be linking to a Spack-installed version or module, or to builds with more experimental/development features, on HPC clusters with their own vendor MPI. Nothing is wrong with the current ADIOS2.jl; what you did is actually the sane choice.

eschnett commented 1 year ago

If using a vendor MPI is the issue, then I recommend MPItrampoline as a remedy. It's straightforward to configure Julia's MPI.jl to use it.

omlins commented 1 year ago

@eschnett, maybe it would be easy enough to implement something inspired by our workaround using Overrides.toml:

We created a file Overrides.toml in the artifacts folder of the Julia depot in use, containing the following (the key is the hash of the ADIOS2 artifact):

6b9210b346f45445a1acf311f166f5ed0655a0b0 = "path-to-our-adios2-module-installation"

See: https://pkgdocs.julialang.org/v1/artifacts/#Overriding-artifact-locations-1

In addition, we had to move the contents of lib64 to lib in our module installation to match the directory structure of ADIOS2_jll.
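
For illustration, the override file could also be written from Julia. This is a sketch under assumptions: the helper add_artifact_override is hypothetical, the hash is the one quoted above and will differ for other ADIOS2_jll versions, and the demo writes into a temporary directory rather than a real depot.

```julia
using TOML

# Add or update one entry in <depot>/artifacts/Overrides.toml, mapping an
# artifact hash to a local installation directory. Existing entries in
# the file are preserved.
function add_artifact_override(depot::AbstractString, hash::AbstractString,
                               path::AbstractString)
    file = joinpath(depot, "artifacts", "Overrides.toml")
    mkpath(dirname(file))
    overrides = isfile(file) ? TOML.parsefile(file) : Dict{String,Any}()
    overrides[hash] = path
    open(io -> TOML.print(io, overrides), file, "w")
    return file
end

# Demo against a temporary directory. For a real setup, pass the depot in
# use (for example first(DEPOT_PATH)) and the actual installation path.
demo_depot = mktempdir()
demo_file = add_artifact_override(demo_depot,
                                  "6b9210b346f45445a1acf311f166f5ed0655a0b0",
                                  "/path/to/adios2/installation")
print(read(demo_file, String))
```

The installation directory still has to match the layout ADIOS2_jll expects, which is why the lib64 contents had to be moved to lib.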

@jychoi-hpc: if none of the other solutions works for you, the approach described here should work. It worked for us on Piz Daint with Cray-MPICH.