JuliaIO / HDF5.jl

Save and load data in the HDF5 file format from Julia
https://juliaio.github.io/HDF5.jl
MIT License
387 stars 140 forks source link

Finalizers cause exception_access_violation while using HDF5 from a thread. #990

Closed mkitti closed 1 year ago

mkitti commented 2 years ago

When running long jobs, I'm occasionally running into this traces like this. What I think is happening is that the finalizer is running to close out a property while another thread is trying to access the HDF5 library. This can be be mitigated by running GC.gc() when between jobs where threads are used.

Note I'm doing this while using HDF5.jl on a single spawned thread.

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7ffb692e3cac -- H5FL_reg_calloc at C:\Users\simview\.julia\artifacts\a285947f59f2fa90c9ac36239548c5af147d0da5\bin\libhdf5-0.dll (unknown line)
in expression starting at REPL[1]:1
H5FL_reg_calloc at C:\Users\simview\.julia\artifacts\a285947f59f2fa90c9ac36239548c5af147d0da5\bin\libhdf5-0.dll (unknown line)
H5CX_push at C:\Users\simview\.julia\artifacts\a285947f59f2fa90c9ac36239548c5af147d0da5\bin\libhdf5-0.dll (unknown line)
H5Iis_valid at C:\Users\simview\.julia\artifacts\a285947f59f2fa90c9ac36239548c5af147d0da5\bin\libhdf5-0.dll (unknown line)
h5i_is_valid at C:\Users\simview\.julia\packages\HDF5\7zvRl\src\api\functions.jl:1354
isvalid at C:\Users\simview\.julia\packages\HDF5\7zvRl\src\properties.jl:19 [inlined]
close at C:\Users\simview\.julia\packages\HDF5\7zvRl\src\properties.jl:11
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1788 [inlined]
run_finalizer at /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:278
jl_gc_run_finalizers_in_list at /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:365
run_finalizers at /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:394 [inlined]
run_finalizers at /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:372
jl_gc_collect at /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:3267
maybe_collect at /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:882 [inlined]
jl_gc_pool_alloc at /cygdrive/c/buildbot/worker/package_win64/build/src\gc.c:1209
jl_gc_alloc_ at /cygdrive/c/buildbot/worker/package_win64/build/src\julia_internal.h:339 [inlined]
_new_array_ at /cygdrive/c/buildbot/worker/package_win64/build/src\array.c:137 [inlined]
_new_array at /cygdrive/c/buildbot/worker/package_win64/build/src\array.c:193 [inlined]
jl_alloc_array_3d at /cygdrive/c/buildbot/worker/package_win64/build/src\array.c:455
Array at .\boot.jl:461 [inlined]
Array at .\boot.jl:468 [inlined]
Array at .\boot.jl:474
unknown function (ip: 0000000063a9c088)
macro expansion at E:\Mirror\FND red stitching test\chunked\downsampling.jl:178 [inlined]
macro expansion at .\timing.jl:220 [inlined]
macro expansion at E:\Mirror\FND red stitching test\chunked\downsampling.jl:177 [inlined]
macro expansion at .\task.jl:399 [inlined]
downsample_to_memory_multithreaded_channel at E:\Mirror\FND red stitching test\chunked\downsampling.jl:136
simonbyrne commented 2 years ago

I guess the solution here is to close all properties objects when they are done?

simonbyrne commented 1 year ago

Following up on https://github.com/JuliaIO/HDF5.jl/pull/1013#discussion_r1003866863, how about we define a function

function with(fn, args...)
    try
        fn(args...)
    finally
        for arg in args
            finalize(arg)
        end
    end
end

so e.g. https://github.com/JuliaIO/HDF5.jl/blob/6022be0e1e44073152a84b5063baf5bed6eaf19b/src/groups.jl#L24-L35 could be written as

function create_group(parent::Union{File,Group}, path::AbstractString; pv...)
    with(_link_properties(path), GroupCreateProperties()) do lcpl, gcpl
        pv = setproperties!(lcpl, gcpl; pv...)
        isempty(pv) || error("invalid keyword options $pv")
        return create_group(parent, path, lcpl, gcpl)
    end
end

or perhaps a macro to do the same, e.g.:


function create_group(parent::Union{File,Group}, path::AbstractString; pv...)
    @with lcpl=_link_properties(path) gcpl=GroupCreateProperties() begin
        pv = setproperties!(lcpl, gcpl; pv...)
        isempty(pv) || error("invalid keyword options $pv")
        return create_group(parent, path, lcpl, gcpl)
    end
end
mkitti commented 1 year ago

I was thinking something similar, but calling close rather than finalize.

musm commented 1 year ago

@simonbyrne that sounds a lot better than the current setup and remind me of something similar in the D language for finalizers. The function form seems simpler than going with a macro, unless there is an intrinsic benefit of the macro.

mkitti commented 1 year ago

Does calling finalize run the finalizers on the current task/thread or does it schedule it on another task/thread?

simonbyrne commented 1 year ago

I'm pretty sure it runs on the current task (i.e. it calls it immediately). finalize also has the advantages of working on any object (even immutable ones), and being cleared once called (i.e. the finalizers from the object aren't later called again by the GC), e.g.:

julia> mutable struct Foo
       end

julia> close(f::Foo) = println("done")
close (generic function with 1 method)

julia> f = Foo()
Foo()

julia> finalizer(close, f)
Foo()

julia> close(f)
done

julia> f = nothing; GC.gc() # close called again
done

julia> f = Foo()
Foo()

julia> finalizer(close, f)
Foo()

julia> finalize(f)
done

julia> f = nothing; GC.gc() # close not called again
simonbyrne commented 1 year ago

I'm pretty sure it runs on the current task (i.e. it calls it immediately)

https://github.com/JuliaLang/julia/blob/8512dd2a570c1f181df96529d73512e8c766e33c/base/gcutils.jl#L72-L73

mkitti commented 1 year ago

This seems more general than HDF5.jl. While I'm happy to experiment with it here, perhaps we should consider putting it in a small package at some point? Maybe we should incubate it as a subdirectory package or at least put it within it's own module.

The @with macro looks quite like a let statement. Perhaps the macro should act on a let statement rather than a begin block?

simonbyrne commented 1 year ago

Closed by #1021