JuliaIO / HDF5.jl

Save and load data in the HDF5 file format from Julia
https://juliaio.github.io/HDF5.jl
MIT License
383 stars 139 forks source link

Use Preferences for using custom hdf5 library #1061

Closed JoshuaLampert closed 1 year ago

JoshuaLampert commented 1 year ago

This switches to Preferences.jl for using a system-provided hdf5 library and replaces the additional build step. Additionally, I allowed additional names of the library binaries as in my case, e.g., the binaries are located in /usr/lib/x86_64-linux-gnu/libhdf5_openmpi.so and /usr/lib/x86_64-linux-gnu/libhdf5_openmpi_hl.so.

closes #1037

cc @ranocha, @sloede

JoshuaLampert commented 1 year ago

Honestly, I don't understand why the tests complain about Package HDF5 does not have HDF5_JLL in its dependencies. Isn't HDF5_jll listed as a dependency here or am I missing something stupid?

mkitti commented 1 year ago

Please run the formatter. See the contrib folder for a utility.

Also this is breaking so I think we will want to version bump to 0.17 unless we implement a fallback. I'm also wondering if we could include a migration utility.

sloede commented 1 year ago

Also this is breaking so I think we will want to version bump to 0.17 unless we implement a fallback. I'm also wondering if we could include a migration utility.

MPI.jl had performed a similar migration last year, from configuration via environment variables to using Preferences. Back then they solved this by adding a migration guide to the docs and included notes for HPC administrators on how to configure this on clusters with managed MPI/HDF5 installations. Maybe this can serve as a blueprint for this PR here as well?

ranocha commented 1 year ago

Also this is breaking so I think we will want to version bump to 0.17 unless we implement a fallback. I'm also wondering if we could include a migration utility.

This should probably be mentioned in https://github.com/JuliaIO/HDF5.jl/blob/master/HISTORY.md, too, I guess?

mkitti commented 1 year ago

Yes, please document this as much as possible. If we do provide migration instructions, that should also be tested.

JoshuaLampert commented 1 year ago

I provided some migration instructions.

sloede commented 1 year ago

I provided some migration instructions.

This reads already very well! One suggestion from my side: Maybe you can give an example for admins of clusters what the Preferences file could look like and maybe explicitly mention that for backward compatibility it is recommended to both set the environment variables and use the Preferences file for the time being. But these are just suggestions, they are not critical IMHO. Other than that, this PR LGTM!

JoshuaLampert commented 1 year ago

I've also added an example of how the LocalPreferences.toml could look like and added a recommendation to also set the environment variable for backward compatibility.

sloede commented 1 year ago

I've also added an example of how the LocalPreferences.toml could look like and added a recommendation to also set the environment variable for backward compatibility.

Great, thanks!

JoshuaLampert commented 1 year ago

From my point of view this is good to go. What do you think, @mkitti?

ranocha commented 1 year ago

I think this PR is really nice! You could consider adding a CI job for a system libhdf5 with MPI enabled also on Julia v1 to test this setup on the latest stable release.

JoshuaLampert commented 1 year ago

Ok, I'll try. I've also noticed that the current system libhdf5 + mpich CI job still uses environment variables instead of MPIPreferences to use the system MPI. I'll also adapt that, right?

JoshuaLampert commented 1 year ago

I'm sorry, but I don't know why this error comes up. I tried to use a similar setup than we have in P4est.jl, but somehow it doesn't recognize the configure_packages.jl file. I would be thankful for a hint!

mkitti commented 1 year ago

I think we might try to get in another release before we introduce a breaking change. I'm going to try to hurry this process up.

ranocha commented 1 year ago

Ok, I'll try. I've also noticed that the current system libhdf5 + mpich CI job still uses environment variables instead of MPIPreferences to use the system MPI. I'll also adapt that, right?

That would be great, yes :+1:

JoshuaLampert commented 1 year ago

I think the failing test is not related to this PR. Locally, it's also failing on master with parallel HDF5 enabled.

JoshuaLampert commented 1 year ago

As with the new HDF5_jll release MPI is supported, I updated the docs accordingly. So, this is ready to go from my side. What do you think, @mkitti? CI failures seem unrelated to this PR (?)

ranocha commented 1 year ago

Thanks a lot! The CI failures on julia-nightly are caused by: UndefVarError:_memcpy!not defined. That's not related to this PR as far as I can tell :+1:

The error

virtual dataset: Error During Test at /home/runner/work/HDF5.jl/HDF5.jl/test/virtual_dataset.jl:31
  Test threw exception
  Expression: size(d) == (3, 2)
  HDF5.API.H5Error: Error getting dataspace
  libhdf5 Stacktrace:
    [1] H5FO_dest: Object cache/Unable to release object
        objects still in open object info set
     ⋮

in https://github.com/JuliaIO/HDF5.jl/actions/runs/5112753923/jobs/9191167908?pr=1061 seems to be real, though? Can you reproduce it locally? Has anyone seen it before?

JoshuaLampert commented 1 year ago

Can you reproduce it locally?

Yes, I can reproduce this error locally, when I use a system provided HDF5 library (v1.10.6), where I get even another error message. What confuses me is that julia v1.6 seems to work, but not v1.9.

JoshuaLampert commented 1 year ago

Ah, I see. On julia v1.9 libhdf5 v1.10.4 is used, while on julia v.1.6 libhdf5 v.1.14.0. Probably that's the difference that makes v1.9 fail and v1.6 pass.

ranocha commented 1 year ago

Can you use the newer libhdf5 also with Julia v1.9?

JoshuaLampert commented 1 year ago

Why do they actually use different libhdf5 versions, in the first place? Both run on ubuntu 20.04.

ranocha commented 1 year ago

That's weird - the Julia v1.6 run reports the expected version 1.10.4 when installing the system packages, e.g., https://github.com/JuliaIO/HDF5.jl/actions/runs/5112753923/jobs/9191167775#step:2:120 However, it shows 1.14.0 when running the tests in https://github.com/JuliaIO/HDF5.jl/actions/runs/5112753923/jobs/9191167775#step:7:171 That's exactly the version from HDF5_jll.jl. Does it not pick the correct system library?

JoshuaLampert commented 1 year ago

Yes, that's strange. Locally, with julia v1.6.7 it picks up the correct local libhdf5 library for me.

ranocha commented 1 year ago

If the problem is an old libhdf5 library, would moving to a newer version of the CI image help?

JoshuaLampert commented 1 year ago

You mean using ubuntu-latest instead of ubuntu-20.04?

ranocha commented 1 year ago

Something like that

ranocha commented 1 year ago

I vaguely remember that there were some issues with Pkg.test not picking up preferences correctly, e.g., https://github.com/JuliaLang/Pkg.jl/issues/2500 and https://github.com/JuliaLang/Pkg.jl/issues/3389. One comment in https://github.com/JuliaLang/Pkg.jl/issues/2500 seems to suggest that the issue may be fixed in Julia v1.7 - but not in Julia v1.6? If so, this would explain why the preferences are not picked up correctly?

JoshuaLampert commented 1 year ago

That's exactly what I tried in 338791c. Didn't work, see https://github.com/JuliaIO/HDF5.jl/actions/runs/5114015916/jobs/9193838265. However, there it was another error...

JoshuaLampert commented 1 year ago

I vaguely remember that there were some issues with Pkg.test not picking up preferences correctly, e.g., https://github.com/JuliaLang/Pkg.jl/issues/2500 and https://github.com/JuliaLang/Pkg.jl/issues/3389. One comment in https://github.com/JuliaLang/Pkg.jl/issues/2500 seems to suggest that the issue may be fixed in Julia v1.7 - but not in Julia v1.6? If so, this would explain why the preferences are not picked up correctly?

Yes, you're right. Local tests confirm that Pkg.test picks the right preferences with julia v1.8.0 (was fixed in https://github.com/JuliaLang/Pkg.jl/pull/2920, which was part of v1.8), but not with v1.6.7 and also not for julia v1.7.3 (in my previous comment I didn't look whether the preferences are picked up correctly with Pkg.test, but only if the right libhdf5 library is picked up in HDF5.API.h5_get_libversion()). Should I change the julia 1.6 version for the system libhdf5 + mpich test to v1.8 or should we just drop the version and only keep v1.9?

ranocha commented 1 year ago

Could you please try a workaround similar to https://github.com/maleadt/LLVM.jl/commit/ebb44ee1acbde749f3cfb96e51e880186cd810de for Julia v1.6 here?

JoshuaLampert commented 1 year ago

Could you please try a workaround similar to https://github.com/maleadt/LLVM.jl/commit/ebb44ee1acbde749f3cfb96e51e880186cd810de for Julia v1.6 here?

I did that. Now also julia v1.6 recognizes the local libhdf5 installation correctly and there is only the problem with the older libhdf5 library.

ranocha commented 1 year ago

Great, thanks! I think it would be good to get feedback/help from the HDF5.jl maintainers at this point to fix the CI problem.

mkitti commented 1 year ago

I'm a bit confused why the one test is failing now. Could someone summarize the current understanding of why it fails?

JoshuaLampert commented 1 year ago

For me, it's also failing on master with the same error using a local libhdf5 v1.10.6 on

julia> versioninfo()
Julia Version 1.9.0
Commit 8e630552924 (2023-05-07 11:25 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores
Environment:
  JULIA_HDF5_PATH = /usr/lib/x86_64-linux-gnu/hdf5/openmpi

I'm rather confused why it's not failing in the other CI tests from other PRs.

simonbyrne commented 1 year ago

I think the problem is that h5d_get_space is broken for virtual datasets for libhdf5 1.10. At the moment, we only test custom hdf5 build on Julia 1.3, and where we disable the virtual dataset tests.

We previously thought it was a buggy JLL, but it could just be all v1.10. I was able to replicate the error with a local build using libhdf5 1.10.7.

I suggest we just disable the virtual dataset tests if using libhdf5 <1.12.

JoshuaLampert commented 1 year ago

That totally makes sense. Thanks, @simonbyrne! I disabled the virtual datasets tests for libhdf5 < 1.12.

JoshuaLampert commented 1 year ago

Loading MPI.jl before HDF5.jl is not required anymore with Preferences.jl, right? I changed the docs accordingly.

ranocha commented 1 year ago

Status of this PR is that HDF5.jl still uses Requires.jl to load the MPI code, doesn't it? If so, it's still necessary to first load MPI.jl as far as I know

mkitti commented 1 year ago

Could we issue a warning if we detect that the environment variable JULIA_HDF5_PATH is set?

JoshuaLampert commented 1 year ago

Could we issue a warning if we detect that the environment variable JULIA_HDF5_PATH is set?

Done in 8e98087.

mkitti commented 1 year ago

I think we should also document configuring HDF5_jll as well in case some other package tries to load that.

See https://docs.binarybuilder.org/stable/jll/#Overriding-specific-products

mkitti commented 1 year ago

If you would prefer, I would be fine merging this as is and implementing the above later.

JoshuaLampert commented 1 year ago

I addressed most of the suggestions. The remaining parts can be done in a separate PR.

sloede commented 1 year ago

🥳