JuliaIO / HDF5.jl

Save and load data in the HDF5 file format from Julia
https://juliaio.github.io/HDF5.jl
MIT License
383 stars 139 forks source link

MPI support via package extensions (weak dependency) #1062

Closed ranocha closed 1 year ago

ranocha commented 1 year ago

This setup keeps using Requires.jl for Julia v1.8 and older but switches to package extensions for Julia v1.9.

With Julia v1.8.5, I get (for both this PR and the current release of HDF5.jl)

julia> @time_imports using MPI, HDF5
      0.2 ms  Requires
      2.1 ms  DocStringExtensions 56.69% compilation time
     19.0 ms  Preferences
      0.2 ms  MPIPreferences
      0.2 ms  JLLWrappers
      0.3 ms  CompilerSupportLibraries_jll
      2.9 ms  MPICH_jll 66.09% compilation time
    208.3 ms  MPI 69.33% compilation time (<1% recompilation)
      0.6 ms  Compat
      0.1 ms  Zlib_jll
      0.5 ms  OpenSSL_jll
      0.6 ms  HDF5_jll
    631.2 ms  HDF5 88.36% compilation time (63% recompilation)

With Julia v1.9.0-rc1 and the current release of HDF5.jl, I get

julia> @time_imports using MPI, HDF5
      0.3 ms  Requires
      1.1 ms  DocStringExtensions
      9.4 ms  Preferences
      0.3 ms  MPIPreferences
      0.2 ms  JLLWrappers
      3.4 ms  MPICH_jll 74.32% compilation time
    167.6 ms  MPI 88.40% compilation time
      0.1 ms  Compat
      0.1 ms  Compat → CompatLinearAlgebraExt
      0.1 ms  Zlib_jll
      0.5 ms  OpenSSL_jll
      0.7 ms  HDF5_jll
    149.7 ms  HDF5 62.28% compilation time

With Julia v1.9.0-rc1 and this PR, I get

julia> @time_imports using MPI, HDF5
      0.5 ms  Requires
      1.6 ms  DocStringExtensions
     14.0 ms  Preferences
      0.3 ms  MPIPreferences
      0.2 ms  JLLWrappers
      3.8 ms  MPICH_jll 71.64% compilation time
    160.6 ms  MPI 87.36% compilation time
      0.1 ms  Compat
      0.1 ms  Compat → CompatLinearAlgebraExt
      0.1 ms  Zlib_jll
      0.5 ms  OpenSSL_jll
      0.8 ms  HDF5_jll
     46.6 ms  HDF5
      0.4 ms  HDF5 → MPIExt

This avoids some compilation time whenever MPI and HDF5 are loaded together.

ranocha commented 1 year ago

There is a problem with the current approach to define HDF5.Drivers.MPIO depending on whether MPI.jl has been loaded. Citing Kristoffer Carlsson from discourse:

Extensions are a run time thing, they might be loaded or they might not be so you have to check. Therefore, you cannot make a decision about them during precompile time (or macro expansion time). That’s why you need to run some code (like a function) to interact with them and you cannot really just export a type from it.

Just exporting a type from it is basically the current approach HDF5.jl takes with MPIO. Any preferences how to fix this?

ranocha commented 1 year ago

I implemented a workaround for the issue in 30cf866. I am not completely happy with it, but it is the best I can think of right now to keep compatibility with Julia v1.6 - v1.8.

ranocha commented 1 year ago

Package extensions are only supported in Julia v1.6 and newer. Since this compat bump is included in https://github.com/JuliaIO/HDF5.jl/pull/1061, we can wait until the other PR is merged before continuing with this PR.

mkitti commented 1 year ago

This should coordinate with https://github.com/JuliaPackaging/Yggdrasil/pull/6551

eschnett commented 1 year ago

@mkitti What exactly need to be coordinated?

mkitti commented 1 year ago

@eschnett As far as I know, we have never loaded a MPI-enabled HDF5_jll. For this package extension to work properly, we should examine if the loaded JLL package is actually MPI capable.

ranocha commented 1 year ago

As far as I understand, the current approach of HDF5.jl is as follows

That's still the same approach when rewriting it as a package extension. Thus, I am not sure I understand. Do you suggest to wait until https://github.com/JuliaPackaging/Yggdrasil/pull/6551 and #1061 are merged before merging this PR?

mkitti commented 1 year ago

The first issue is awareness that both issues are coming down the pipeline.

We should now be able to test versus the new HDF5_jlls.

The problem here is that we are changing multiple components at once here, so we need to make sure to do integrated testing to ensure that changing either component alone or changing both does not create unexpected issues. Otherwise, it will get complicated to diagnose future problems.

ranocha commented 1 year ago

Ok. I will leave this PR open. Please let me know when you think it's a good time to revisit it

mkitti commented 1 year ago

I'm mainly just triaging here. It's @simonbyrne that is really needed to review this carefully.

simonbyrne commented 1 year ago

Should we just ditch Julia 1.3 support at this point? MPI.jl has a minimum Julia version of 1.6 (the current LTS)

ranocha commented 1 year ago

Should we just ditch Julia 1.3 support at this point? MPI.jl has a minimum Julia version of 1.6 (the current LTS)

Yes, I think this would be reasonable. This compat bump is included in #1061. Thus, I think we can wait until the other PR is merged before continuing with this PR.

KristofferC commented 1 year ago

Bump, HDF5 has the largest init time out of a large set of packages in something I profile:

gnome-shell-screenshot-9hbvpn

ranocha commented 1 year ago

We're just waiting for #1061 to be merged since it updates the minimum Julia version and the CI tests accordingly.

mkitti commented 1 year ago

Bump, HDF5 has the largest init time out of a large set of packages in something I profile:

Interesting. Could you see if this is true for HDF5 v0.16.14 and/or HDF5_jll.

I do suspect this is MPI related.

KristofferC commented 1 year ago

Interesting. Could you see if this is true for HDF5 v0.16.14 and/or HDF5_jll.

Not sure what this means exactly but the screenshot is for __init__ in the HDF5.Drivers submodule.

ranocha commented 1 year ago

Bump, HDF5 has the largest init time out of a large set of packages in something I profile:

Interesting. Could you see if this is true for HDF5 v0.16.14 and/or HDF5_jll.

I do suspect this is MPI related.

Yes, it's caused by the way how MPI is handled at the moment (I assume it's OmniPackage.jl depending on Trixi.jl depending on both MPI.jl and HDF5.jl)

mkitti commented 1 year ago

I suspect that the increase in driver load time is a recent regression. The new HDF5_jll builds make MPI always available, so we may loading the driver every time.

mkitti commented 1 year ago

I'm having some trouble reproducing, but perhaps I am missing something.

julia> @time_imports using MPI, HDF5
      4.0 ms  DocStringExtensions
     31.6 ms  Preferences
      0.5 ms  MPIPreferences
      0.5 ms  JLLWrappers
      6.8 ms  MPICH_jll 70.80% compilation time
      0.3 ms  PrecompileTools
    128.3 ms  MPI 58.11% compilation time
      0.3 ms  Requires
      0.1 ms  Compat
      0.2 ms  Compat → CompatLinearAlgebraExt
      0.9 ms  OpenSSL_jll
      0.2 ms  Zlib_jll
      0.4 ms  libaec_jll
      4.0 ms  HDF5_jll
    268.3 ms  HDF5 66.09% compilation time

(jl_XytuBr) pkg> st
Status `/tmp/jl_XytuBr/Project.toml`
  [f67ccb44] HDF5 v0.16.15
  [da04e1cc] MPI v0.20.9

(jl_XytuBr) pkg> st -m
Status `/tmp/jl_XytuBr/Manifest.toml`
  [34da2185] Compat v4.6.1
  [ffbed154] DocStringExtensions v0.9.3
  [f67ccb44] HDF5 v0.16.15
  [692b3bcd] JLLWrappers v1.4.1
  [da04e1cc] MPI v0.20.9
  [3da0fdf6] MPIPreferences v0.1.7
  [aea7be01] PrecompileTools v1.1.1
  [21216c6a] Preferences v1.4.0
  [ae029012] Requires v1.3.0
  [0234f1f7] HDF5_jll v1.14.0+0
  [1d63c593] LLVMOpenMP_jll v15.0.4+0
  [7cb0a576] MPICH_jll v4.1.1+1
  [f1f71cc9] MPItrampoline_jll v5.3.0+0
  [9237b28f] MicrosoftMPI_jll v10.1.3+4
  [fe0851c0] OpenMPI_jll v4.1.5+0
  [458c3c95] OpenSSL_jll v3.0.8+0
  [477f73a3] libaec_jll v1.0.6+1
  [0dad84c5] ArgTools v1.1.1
  [56f22d72] Artifacts
  [2a0f44e3] Base64
  [ade2ca70] Dates
  [8ba89e20] Distributed
  [f43a241f] Downloads v1.6.0
  [7b1f6079] FileWatching
  [b77e0a4c] InteractiveUtils
  [4af54fe1] LazyArtifacts
  [b27032c2] LibCURL v0.6.3
  [76f85450] LibGit2
  [8f399da3] Libdl
  [56ddb016] Logging
  [d6f4376e] Markdown
  [a63ad114] Mmap
  [ca575930] NetworkOptions v1.2.0
  [44cfe95a] Pkg v1.9.0
  [de0858da] Printf
  [3fa0cd96] REPL
  [9a3f8284] Random
  [ea8e919c] SHA v0.7.0
  [9e88b42a] Serialization
  [6462fe0b] Sockets
  [fa267f1f] TOML v1.0.3
  [a4e569a6] Tar v1.10.0
  [cf7118a7] UUIDs
  [4ec0a83e] Unicode
  [e66e0078] CompilerSupportLibraries_jll v1.0.2+0
  [deac9b47] LibCURL_jll v7.84.0+0
  [29816b5a] LibSSH2_jll v1.10.2+0
  [c8ffd9c3] MbedTLS_jll v2.28.2+0
  [14a3606d] MozillaCACerts_jll v2022.10.11
  [83775a58] Zlib_jll v1.2.13+0
  [8e850ede] nghttp2_jll v1.48.0+0
  [3f19e933] p7zip_jll v17.4.0+0
mkitti commented 1 year ago

Could we get CI to pass?

ranocha commented 1 year ago

CI fails on Julia v1.3 since Preferencesjl requires Julia v1.6. Everything else passes here. I didn't want to update the lower bound on Julia here since that's already done in #1061

KristofferC commented 1 year ago

I'm having some trouble reproducing, but perhaps I am missing something.

It's on https://github.com/JuliaComputing/OmniPackage.jl/pull/67.

ranocha commented 1 year ago

@mkitti I think this is ready for a review

mkitti commented 1 year ago

What is going on with the nightly CI failure? I'll take a closer look this weekend.

mkitti commented 1 year ago

@simonbyrne , would you like another look?

ranocha commented 1 year ago

Great, thanks!