JuliaIO / HDF5.jl

Save and load data in the HDF5 file format from Julia
https://juliaio.github.io/HDF5.jl
MIT License

StackOverflowError during initialization of module Drivers #960

Closed david-macmahon closed 2 years ago

david-macmahon commented 2 years ago

When HDF5.jl v0.16.7 (or later) is configured to use system HDF5 libraries, I get a StackOverflowError during initialization of the Drivers module. It doesn't happen with v0.16.6 (or earlier), and it doesn't happen if I use the bundled HDF5 libraries.

This is the error:

julia> using HDF5
ERROR: InitError: StackOverflowError:
Stacktrace:
 [1] h5fd_core_init()
   @ HDF5.API ~/.julia/packages/HDF5/I9NLZ/src/api/functions.jl:4209
 [2] __init__()
   @ HDF5.Drivers ~/.julia/packages/HDF5/I9NLZ/src/drivers/drivers.jl:88
 [3] _include_from_serialized(path::String, depmods::Vector{Any})
   @ Base ./loading.jl:696
 [4] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String)
   @ Base ./loading.jl:782
 [5] _require(pkg::Base.PkgId)
   @ Base ./loading.jl:1020
 [6] require(uuidkey::Base.PkgId)
   @ Base ./loading.jl:936
 [7] require(into::Module, mod::Symbol)
   @ Base ./loading.jl:923
during initialization of module Drivers

mkitti commented 2 years ago

Which system?

david-macmahon commented 2 years ago

This is on Linux (Ubuntu 16.04) if that’s what you mean. It happens with both Julia 1.6.4 and 1.7.1.

david-macmahon commented 2 years ago

But I’m using HDF5 1.13 (I think) libraries that I built from source.

david-macmahon commented 2 years ago

My system libraries work fine with HDF5.jl v0.16.6 or earlier.

mkitti commented 2 years ago

HDF5 1.13.x is a development version. We cannot support that at this time.

I would file a bug with the HDF Group at https://github.com/HDFGroup/hdf5/issues indicating that calling H5FD_core_init causes a stack overflow.

Since 0.16.7 (https://github.com/JuliaIO/HDF5.jl/pull/928), we have initialized the core driver in __init__.

musm commented 2 years ago

Closing, as this is an upstream issue.

mkitti commented 2 years ago

I did some work and identified that driver initialization in HDF5 1.13 has changed. See https://github.com/HDFGroup/hdf5/issues/1809

mkitti commented 2 years ago

Technically we are using a non-public API because the public API is the H5FD_CORE macro.

My proposed fix is that we only initialize the drivers eagerly for HDF5 1.12 or earlier. We'll need to implement the new form using H5FDperform_init when 1.14 is released.
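
A minimal sketch of such a version gate, assuming the library version is available as a VersionNumber. The helper name eager_driver_init_ok is hypothetical and not part of HDF5.jl:

```julia
# Hypothetical helper: is it safe to call the driver init functions
# directly at load time for this libhdf5 version?
# Assumption: the initialization change applies to the whole 1.13 series,
# so the bound v"1.13.0-" also excludes 1.13 pre-releases.
eager_driver_init_ok(libver::VersionNumber) = libver < v"1.13.0-"
```

With this, eager_driver_init_ok(v"1.12.2") holds, while any 1.13.x or 1.14.x version would skip the eager call.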

mkitti commented 2 years ago

https://github.com/HDFGroup/hdf5/issues/1809#issuecomment-1154397046

Quoting upstream:

Maybe there is a way to use documented/supported APIs? For example, what if you create a file-access property list (FAPL), set the "core" driver on the FAPL using H5Pset_fapl_core, and then ask for the driver's hid_t using H5Pget_driver? Release the FAPL. Repeat for each VFD of interest. If I'm not mistaken, H5Pset_fapl_core, H5Pset_fapl_sec2, and so on, all belong to the public API.

@simonbyrne, perhaps we should replace the driver initialization with a procedure like the above?

simonbyrne commented 2 years ago

Eesh, that's complicated.

We might also be able to initialize it lazily? E.g., when calling set_driver!, add it to the DRIVERS dict if it isn't already in there.

The only challenge would be SEC2, since it is the default, but you could do that at __init__?
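
A toy model of that lazy registration, to make the idea concrete. Nothing here calls libhdf5: FakeFapl, FAKE_IDS, and init_default_driver! are hypothetical stand-ins, the driver ids are made up, and only the names DRIVERS and set_driver! come from the discussion above.

```julia
# Registry of drivers seen so far: driver id => driver name.
const DRIVERS = Dict{Int64,Symbol}()

# Stand-in for a file-access property list; a real one is a library handle.
mutable struct FakeFapl
    driver_id::Int64
end

# Made-up ids; in the real library they come from the driver itself.
const FAKE_IDS = Dict(:sec2 => Int64(1), :core => Int64(2))

function set_driver!(fapl::FakeFapl, name::Symbol)
    fapl.driver_id = FAKE_IDS[name]       # stands in for setting the driver on the list
    get!(DRIVERS, fapl.driver_id, name)   # register only if the id is not known yet
    return fapl
end

# Only the default driver is registered eagerly, e.g. from __init__.
init_default_driver!() = set_driver!(FakeFapl(0), :sec2)
```

In this model the core driver never appears in DRIVERS unless something actually calls set_driver! with it, so loading the package would not touch the core driver at all.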

mkitti commented 2 years ago

Lazy would be good. It's not clear to me whether @david-macmahon actually wanted to use the core driver in this instance.

david-macmahon commented 2 years ago

Yikes, I didn't realize what a hornet's nest I was kicking! 😅

AFAIK, I have only used the default driver. Does the "core" driver refer to in-memory "files" (as in "core memory") or is it some sort of fundamental driver component (as in the "core" of an apple)?

FWIW, I was thinking of searching for H5FDperform_init via dlsym and calling it if available, otherwise calling the original function.
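
A rough sketch of that dlsym idea using Julia's Libdl. The function name init_core_driver is hypothetical, and the code assumes that the C symbol behind h5fd_core_init is H5FD_core_init and that H5FDperform_init takes a pointer to that function and returns the driver id; neither assumption is confirmed in this thread.

```julia
using Libdl

# Sketch only, not HDF5.jl code. Returns the driver id, or `nothing`
# when the library or the init symbol cannot be found.
function init_core_driver(libname::AbstractString)
    handle = Libdl.dlopen(libname; throw_error=false)
    handle === nothing && return nothing      # library not found
    init = Libdl.dlsym(handle, :H5FD_core_init; throw_error=false)
    init === nothing && return nothing        # init symbol not exported
    perform = Libdl.dlsym(handle, :H5FDperform_init; throw_error=false)
    if perform === nothing
        # Older library: call the original init function directly.
        return ccall(init, Int64, ())
    else
        # Newer library: hand the init function to H5FDperform_init.
        return ccall(perform, Int64, (Ptr{Cvoid},), init)
    end
end
```

Because both lookups pass throw_error=false, a missing library or symbol yields nothing instead of an exception, so the caller can fall back to another strategy.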

david-macmahon commented 2 years ago

I re-read the upstream comment and have implemented the proposed solution in my dev'd HDF5.jl repo. It seems to fix the problem. I am running the test suite now with both the "bundled" HDF5 version and my local HDF5 version. Assuming it passes in both scenarios I will create a PR.

simonbyrne commented 2 years ago

Can this be closed?