JuliaIO / JLD2.jl

HDF5-compatible file format in pure Julia

jldsave error #389

Closed gideonsimpson closed 2 years ago

gideonsimpson commented 2 years ago

In code that has otherwise been working, a call to jldsave recently produced an error I hadn't seen before. By "otherwise working" I mean that when I run the same problem with slightly different numerical parameters, I don't get these errors. It also only showed up in a job that took 25 hours on a cluster, so it's tough to provide a simple example that consistently reproduces it.

error in running finalizer: UndefRefError()
getproperty at ./Base.jl:42 [inlined]
isempty at /home/simpson/.julia/packages/JLD2/k9Gt0/src/groups.jl:191 [inlined]
close at /home/simpson/.julia/packages/JLD2/k9Gt0/src/JLD2.jl:428
jld_finalizer at /home/simpson/.julia/packages/JLD2/k9Gt0/src/JLD2.jl:469
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
run_finalizer at /buildworker/worker/package_linux64/build/src/gc.c:278
jl_gc_run_finalizers_in_list at /buildworker/worker/package_linux64/build/src/gc.c:365
run_finalizers at /buildworker/worker/package_linux64/build/src/gc.c:394
jl_gc_run_pending_finalizers at /buildworker/worker/package_linux64/build/src/gc.c:405
jl_mutex_unlock at /buildworker/worker/package_linux64/build/src/julia_locks.h:131 [inlined]
jl_generate_fptr at /buildworker/worker/package_linux64/build/src/jitlayers.cpp:359
jl_compile_method_internal at /buildworker/worker/package_linux64/build/src/gf.c:1980
jl_compile_method_internal at /buildworker/worker/package_linux64/build/src/gf.c:2246 [inlined]
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2239 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
show_typeparams at ./show.jl:640
show_datatype at ./show.jl:1011
show_datatype at ./show.jl:989 [inlined]
_show_type at ./show.jl:889
jfptr__show_type_21105.clone_1 at /ifs/opt/julia/1.7.1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
show at ./show.jl:881
jfptr_show_35734.clone_1 at /ifs/opt/julia/1.7.1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
#sprint#426 at ./strings/io.jl:112
unknown function (ip: 0x15553e5e5490)
sprint##kw at ./strings/io.jl:108 [inlined]
#print_type_stacktrace#485 at ./show.jl:2399
print_type_stacktrace at ./show.jl:2399
unknown function (ip: 0x15553e5e51c1)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
#show_tuple_as_call#484 at ./show.jl:2380
show_tuple_as_call##kw at ./show.jl:2353
unknown function (ip: 0x15553e5e8094)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
show_spec_linfo at ./stacktraces.jl:244
print_stackframe at ./errorshow.jl:709
print_stackframe at ./errorshow.jl:685
#show_full_backtrace#834 at ./errorshow.jl:574
show_full_backtrace##kw at ./errorshow.jl:565 [inlined]
show_backtrace at ./errorshow.jl:769
#showerror#813 at ./errorshow.jl:90
showerror##kw at ./errorshow.jl:87
unknown function (ip: 0x15553e5e07b4)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
show_exception_stack at ./errorshow.jl:866
unknown function (ip: 0x15553e5d9371)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
display_error at ./client.jl:104
unknown function (ip: 0x15553e5d8851)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
display_error at ./client.jl:107
unknown function (ip: 0x15553e5d830d)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
jl_f__call_latest at /buildworker/worker/package_linux64/build/src/builtins.c:757
#invokelatest#2 at ./essentials.jl:716 [inlined]
invokelatest at ./essentials.jl:714 [inlined]
exec_options at ./client.jl:294
_start at ./client.jl:495
jfptr__start_40531.clone_1 at /ifs/opt/julia/1.7.1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
true_main at /buildworker/worker/package_linux64/build/src/jlapi.c:559
jl_repl_entrypoint at /buildworker/worker/package_linux64/build/src/jlapi.c:701
main at julia (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x400808)
error in running finalizer: Base.SystemError(prefix="close", errnum=116, extrainfo=nothing)
#systemerror#69 at ./error.jl:174
systemerror##kw at ./error.jl:174
systemerror##kw at ./error.jl:174
#systemerror#68 at ./error.jl:173 [inlined]
systemerror at ./error.jl:173 [inlined]
close at ./iostream.jl:63
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
run_finalizer at /buildworker/worker/package_linux64/build/src/gc.c:278
jl_gc_run_finalizers_in_list at /buildworker/worker/package_linux64/build/src/gc.c:367
run_finalizers at /buildworker/worker/package_linux64/build/src/gc.c:394
jl_gc_run_pending_finalizers at /buildworker/worker/package_linux64/build/src/gc.c:405
jl_mutex_unlock at /buildworker/worker/package_linux64/build/src/julia_locks.h:131 [inlined]
jl_generate_fptr at /buildworker/worker/package_linux64/build/src/jitlayers.cpp:359
jl_compile_method_internal at /buildworker/worker/package_linux64/build/src/gf.c:1980
jl_compile_method_internal at /buildworker/worker/package_linux64/build/src/gf.c:2246 [inlined]
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2239 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
show_typeparams at ./show.jl:640
show_datatype at ./show.jl:1011
show_datatype at ./show.jl:989 [inlined]
_show_type at ./show.jl:889
jfptr__show_type_21105.clone_1 at /ifs/opt/julia/1.7.1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
show at ./show.jl:881
jfptr_show_35734.clone_1 at /ifs/opt/julia/1.7.1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
#sprint#426 at ./strings/io.jl:112
unknown function (ip: 0x15553e5e5490)
sprint##kw at ./strings/io.jl:108 [inlined]
#print_type_stacktrace#485 at ./show.jl:2399
print_type_stacktrace at ./show.jl:2399
unknown function (ip: 0x15553e5e51c1)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
#show_tuple_as_call#484 at ./show.jl:2380
show_tuple_as_call##kw at ./show.jl:2353
unknown function (ip: 0x15553e5e8094)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
show_spec_linfo at ./stacktraces.jl:244
print_stackframe at ./errorshow.jl:709
print_stackframe at ./errorshow.jl:685
#show_full_backtrace#834 at ./errorshow.jl:574
show_full_backtrace##kw at ./errorshow.jl:565 [inlined]
show_backtrace at ./errorshow.jl:769
#showerror#813 at ./errorshow.jl:90
showerror##kw at ./errorshow.jl:87
unknown function (ip: 0x15553e5e07b4)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
show_exception_stack at ./errorshow.jl:866
unknown function (ip: 0x15553e5d9371)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
display_error at ./client.jl:104
unknown function (ip: 0x15553e5d8851)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
display_error at ./client.jl:107
unknown function (ip: 0x15553e5d830d)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
jl_f__call_latest at /buildworker/worker/package_linux64/build/src/builtins.c:757
#invokelatest#2 at ./essentials.jl:716 [inlined]
invokelatest at ./essentials.jl:714 [inlined]
exec_options at ./client.jl:294
_start at ./client.jl:495
jfptr__start_40531.clone_1 at /ifs/opt/julia/1.7.1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
true_main at /buildworker/worker/package_linux64/build/src/jlapi.c:559
jl_repl_entrypoint at /buildworker/worker/package_linux64/build/src/jlapi.c:701
main at julia (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x400808)
ERROR: LoadError: SystemError: ftruncate: Stale file handle
Stacktrace:
  [1] systemerror(p::String, errno::Int32; extrainfo::Nothing)
    @ Base ./error.jl:174
  [2] #systemerror#68
    @ ./error.jl:173 [inlined]
  [3] systemerror
    @ ./error.jl:173 [inlined]
  [4] grow
    @ ~/.julia/packages/JLD2/k9Gt0/src/mmapio.jl:131 [inlined]
  [5] resize!(io::JLD2.MmapIO, newend::Ptr{Nothing})
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/mmapio.jl:147
  [6] seek
    @ ~/.julia/packages/JLD2/k9Gt0/src/mmapio.jl:246 [inlined]
  [7] save_group(g::JLD2.Group{JLD2.JLDFile{JLD2.MmapIO}})
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/groups.jl:466
  [8] save_group(g::JLD2.Group{JLD2.JLDFile{JLD2.MmapIO}})
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/groups.jl:448
  [9] close(f::JLD2.JLDFile{JLD2.MmapIO})
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/JLD2.jl:431
 [10] jldopen(::Function, ::String, ::Vararg{String}; kws::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol}, NamedTuple{(:compress, :iotype), Tuple{Bool, DataType}}})
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/loadsave.jl:6
 [11] #jldsave#57
    @ ~/.julia/packages/JLD2/k9Gt0/src/loadsave.jl:243 [inlined]
 [12] top-level scope
    @ ~/mycode/myfile1.jl:143
in expression starting at /home/simpson/mycode/myfile1.jl:143

caused by: SystemError: msync: Stale file handle
Stacktrace:
  [1] systemerror(p::String, errno::Int32; extrainfo::Nothing)
    @ Base ./error.jl:174
  [2] #systemerror#68
    @ ./error.jl:173 [inlined]
  [3] systemerror
    @ ./error.jl:173 [inlined]
  [4] msync
    @ ~/.julia/packages/JLD2/k9Gt0/src/mmapio.jl:57 [inlined]
  [5] msync
    @ ~/.julia/packages/JLD2/k9Gt0/src/mmapio.jl:54 [inlined]
  [6] raw_write(io::JLD2.MmapIO, ptr::Ptr{UInt8}, nb::Int64)
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/dataio.jl:107
  [7] write_data
    @ ~/.julia/packages/JLD2/k9Gt0/src/dataio.jl:137 [inlined]
  [8] write_dataset(f::JLD2.JLDFile{JLD2.MmapIO}, dataspace::JLD2.WriteDataspace{1, Tuple{}}, datatype::JLD2.FixedPointDatatype, odr::Type{Int64}, data::Vector{Int64}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}}, compress::Bool)
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/datasets.jl:411
  [9] write_dataset(f::JLD2.JLDFile{JLD2.MmapIO}, x::Vector{Int64}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}}, compress::Bool)
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/inlineunion.jl:44
 [10] write_dataset(f::JLD2.JLDFile{JLD2.MmapIO}, x::Vector{Int64}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}})
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/inlineunion.jl:36
 [11] write_ref_mutable
    @ ~/.julia/packages/JLD2/k9Gt0/src/datasets.jl:525 [inlined]
 [12] write_ref
    @ ~/.julia/packages/JLD2/k9Gt0/src/datasets.jl:533 [inlined]
 [13] h5convert!
    @ ~/.julia/packages/JLD2/k9Gt0/src/data/writing_datatypes.jl:288 [inlined]
 [14] macro expansion
    @ ~/.julia/packages/JLD2/k9Gt0/src/data/writing_datatypes.jl:227 [inlined]
 [15] h5convert!(out::JLD2.IndirectPointer, #unused#::JLD2.OnDiskRepresentation{(0, 8, 16, 24, 32), Tuple{Int64, Int64, Vector{Int64}, Vector{Int64}, Vector{Float64}}, Tuple{Int64, Int64, JLD2.RelOffset, JLD2.RelOffset, JLD2.RelOffset}}, file::JLD2.JLDFile{JLD2.MmapIO}, x::SparseArrays.SparseMatrixCSC{Float64, Int64}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}})
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/data/writing_datatypes.jl:227
 [16] write_data(io::JLD2.MmapIO, f::JLD2.JLDFile{JLD2.MmapIO}, data::SparseArrays.SparseMatrixCSC{Float64, Int64}, odr::JLD2.OnDiskRepresentation{(0, 8, 16, 24, 32), Tuple{Int64, Int64, Vector{Int64}, Vector{Int64}, Vector{Float64}}, Tuple{Int64, Int64, JLD2.RelOffset, JLD2.RelOffset, JLD2.RelOffset}}, #unused#::JLD2.HasReferences, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}})
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/dataio.jl:96
 [17] write_dataset(f::JLD2.JLDFile{JLD2.MmapIO}, dataspace::JLD2.WriteDataspace{0, Tuple{}}, datatype::JLD2.CommittedDatatype, odr::JLD2.OnDiskRepresentation{(0, 8, 16, 24, 32), Tuple{Int64, Int64, Vector{Int64}, Vector{Int64}, Vector{Float64}}, Tuple{Int64, Int64, JLD2.RelOffset, JLD2.RelOffset, JLD2.RelOffset}}, data::SparseArrays.SparseMatrixCSC{Float64, Int64}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}})
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/datasets.jl:452
 [18] write_dataset(f::JLD2.JLDFile{JLD2.MmapIO}, x::SparseArrays.SparseMatrixCSC{Float64, Int64}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}})
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/datasets.jl:520
 [19] write(g::JLD2.Group{JLD2.JLDFile{JLD2.MmapIO}}, name::String, obj::SparseArrays.SparseMatrixCSC{Float64, Int64}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}}; compress::Nothing)
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/compression.jl:87
 [20] #write#87
    @ ~/.julia/packages/JLD2/k9Gt0/src/compression.jl:71 [inlined]
 [21] write(f::JLD2.JLDFile{JLD2.MmapIO}, name::String, obj::SparseArrays.SparseMatrixCSC{Float64, Int64}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}})
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/compression.jl:71
 [22] (::JLD2.var"#58#59"{Base.Pairs{Symbol, Any, NTuple{24, Symbol}, NamedTuple{(:x0, :xA, :T, :β, :n_we_steps, :ΔT_recycle, :Δt, :ΔT_coarse, :r_voronoi, :nΔt_coarse, :nΔt_recycle, :nΔt, :X0_micro_vals, :K̃, :F_target, :h_vec, :v²_vec, :n_samples_per_micro_bin, :n_particles, :n_samples, :we_mean_vals, :f_we_mean, :f2_we_mean, :f_we_var), Tuple{Vector{Float64}, Vector{Float64}, Int64, Float64, Int64, Float64, Float64, Float64, Vector{Vector{Float64}}, Int64, Int64, Int64, Vector{Vector{Float64}}, SparseArrays.SparseMatrixCSC{Float64, Int64}, Vector{Float64}, Vector{Float64}, Vector{Float64}, Int64, Int64, Int64, Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Float64}}}}})(f::JLD2.JLDFile{JLD2.MmapIO})
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/loadsave.jl:246
 [23] jldopen(::Function, ::String, ::Vararg{String}; kws::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol}, NamedTuple{(:compress, :iotype), Tuple{Bool, DataType}}})
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/loadsave.jl:4
 [24] #jldsave#57
    @ ~/.julia/packages/JLD2/k9Gt0/src/loadsave.jl:243 [inlined]
 [25] top-level scope
    @ ~/mycode/myfile1.jl:143
JonasIsensee commented 2 years ago

Hi @gideonsimpson ,

Looking through your stack trace, I see that the main error is a stale file handle: ERROR: LoadError: SystemError: ftruncate: Stale file handle

I suspect that this has nothing to do with JLD2 but rather with the file itself. I've seen similar issues when 1) the NFS file server had a hiccup, causing the file handle to go stale, or 2) I accidentally moved, overwrote, or deleted the file or its containing folders while it was open.
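
If the cause really is a network filesystem issue (the errnum=116 in the finalizer output is ESTALE on Linux, matching the "Stale file handle" message), one possible mitigation is to write the file to node-local scratch space first and only move the finished file to the shared location afterwards. The sketch below is a minimal illustration of that idea, not something JLD2 provides: `save_via_local_tmp` is a hypothetical helper, and it assumes that `tempdir()` points to a local disk on the compute node, which may not hold on every cluster. It shortens the time a file handle is open on the network filesystem but cannot rule out a failure during the final move.

```julia
# Hypothetical helper (not part of JLD2): run `save_fn` against a path in a
# temporary directory, then move the finished file to `dest`.
# Assumption: tempdir() is on a node-local filesystem.
function save_via_local_tmp(save_fn, dest::AbstractString)
    mktempdir() do dir
        tmp = joinpath(dir, basename(dest))
        save_fn(tmp)               # all writes go to the temporary location
        mv(tmp, dest; force=true)  # copies and deletes if dest is on another filesystem
    end
    return dest
end

# Stand-in for the real save call, using only Base so the sketch runs as-is.
# With JLD2 loaded, the body could instead be something like
#     jldsave(path; x0=x0, xA=xA)
out = save_via_local_tmp(joinpath(mktempdir(), "result.txt")) do path
    write(path, "hello")
end
```

The temporary directory is removed when the `do` block exits, so a failed save leaves no partial file at the destination.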

gideonsimpson commented 2 years ago

That could be it. I ran the job again and it worked without a problem. I'll close out this issue.