JuliaIO / JLD2.jl

HDF5-compatible file format in pure Julia

Error saving DataFrame in 1.8.3 #441

Closed grahamas closed 1 year ago

grahamas commented 1 year ago

I'll play around with making an MWE. Let me know if the errors suggest a best direction in which to simplify.

Is this related to #427?

Julia 1.8.3; JLD2 0.4.28

The same code works fine in Julia 1.7.3.

julia> sensitivity_df
8×6 DataFrame
 Row │ target   standardization  signal   auc       tpr57     fphr80   
     │ String   String           String   Float64   Float64   Float64  
─────┼─────────────────────────────────────────────────────────────────
   1 │ patient  across           aEEG     0.939457  0.820513   5.51892
   2 │ patient  across           tricorr  0.92449   0.794872   6.32169
   3 │ patient  within           aEEG     0.959862  0.923077   2.88872
   4 │ patient  within           tricorr  0.978086  1.0        2.08481
   5 │ seizure  across           aEEG     0.755895  0.484848  26.9989
   6 │ seizure  across           tricorr  0.841416  0.583333  15.9948
   7 │ seizure  within           aEEG     0.834016  0.598485  15.3765
   8 │ seizure  within           tricorr  0.808684  0.522727  20.7921

julia> save("~/test.jld2", Dict("sensitivity_df" => sensitivity_df))
Error encountered while save FileIO.File{FileIO.DataFormat{:JLD2}, String}("~/test.jld2").

Fatal error:
ERROR: StackOverflowError:
Stacktrace:
 [1] h5fieldtype(f::JLD2.JLDFile{JLD2.MmapIO}, writeas::Type{Int64}, readas::Type, init::Type{Val{true}})
   @ JLD2 ~/.julia/packages/JLD2/HnW0g/src/data/number_types.jl:102
 [2] h5fieldtype(f::JLD2.JLDFile{JLD2.MmapIO}, writeas::Type{Int64}, readas::Type, init::Type{Val{true}}) (repeats 99 times)
   @ JLD2 ~/.julia/packages/JLD2/HnW0g/src/data/number_types.jl:104
Stacktrace:
 [1] handle_error(e::StackOverflowError, q::Base.PkgId, bt::Vector{Union{Ptr{Nothing}, Base.InterpreterIP}})
   @ FileIO ~/.julia/packages/FileIO/aP78L/src/error_handling.jl:61
 [2] handle_exceptions(exceptions::Vector{Tuple{Any, Union{Base.PkgId, Module}, Vector}}, action::String)
   @ FileIO ~/.julia/packages/FileIO/aP78L/src/error_handling.jl:56
 [3] action(call::Symbol, libraries::Vector{Union{Base.PkgId, Module}}, file::FileIO.Formatted, args::Dict{String, DataFrame}; options::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ FileIO ~/.julia/packages/FileIO/aP78L/src/loadsave.jl:228
 [4] action
   @ ~/.julia/packages/FileIO/aP78L/src/loadsave.jl:196 [inlined]
 [5] action(call::Symbol, libraries::Vector{Union{Base.PkgId, Module}}, sym::Symbol, file::String, args::Dict{String, DataFrame}; options::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ FileIO ~/.julia/packages/FileIO/aP78L/src/loadsave.jl:185
 [6] action
   @ ~/.julia/packages/FileIO/aP78L/src/loadsave.jl:185 [inlined]
 [7] #save#20
   @ ~/.julia/packages/FileIO/aP78L/src/loadsave.jl:129 [inlined]
 [8] save(file::String, args::Dict{String, DataFrame})
   @ FileIO ~/.julia/packages/FileIO/aP78L/src/loadsave.jl:125
 [9] top-level scope
   @ REPL[19]:1

grahamas commented 1 year ago

I'm somewhat at a loss as to how to make an MWE. Recreating a DataFrame with the same structure works just fine (i.e., no error):

julia> df = DataFrame(target=["patient" for _ in 1:8], standardization=["across" for _ in 1:8], signal=["aEEG" for _ in 1:8], auc=randn(8), tpr57=rand(8), fphr80=rand(8))
8×6 DataFrame
 Row │ target   standardization  signal  auc         tpr57      fphr80    
     │ String   String           String  Float64     Float64    Float64   
─────┼────────────────────────────────────────────────────────────────────
   1 │ patient  across           aEEG     0.0593826  0.32419    0.213125
   2 │ patient  across           aEEG     0.356194   0.0443194  0.274227
   3 │ patient  across           aEEG    -0.340227   0.0296562  0.0572772
   4 │ patient  across           aEEG     0.221184   0.147901   0.910407
   5 │ patient  across           aEEG    -1.9993     0.0715698  0.294623
   6 │ patient  across           aEEG     1.82078    0.448308   0.223144
   7 │ patient  across           aEEG     0.0852268  0.57349    0.811184
   8 │ patient  across           aEEG     0.791005   0.22799    0.518066

julia> save("~/test.jld2", Dict("df" => df))
# works fine

JonasIsensee commented 1 year ago

Hi, this is indeed very strange, and I have no idea how this could happen.

Without a way to reproduce the issue, there's nothing I can do.

grahamas commented 1 year ago

Today I can't even reproduce the error with the exact same script.