JuliaIO / JLD2.jl

HDF5-compatible file format in pure Julia

Error when trying to save large data set with compression #396

Closed Patbott closed 1 year ago

Patbott commented 2 years ago

I'm trying to save a large data set using jldsave with the compression argument set to true, but it gives me an error. For example, when running the following code:

using JLD2
X = randn(150, 7*10^6)
jldsave("test.jld2", true; X = X)

it throws the error

ERROR: InexactError: trunc(UInt32, 8400000000)
Stacktrace:
  [1] throw_inexacterror(f::Symbol, #unused#::Type{UInt32}, val::UInt64)
    @ Core ./boot.jl:612
  [2] checked_trunc_uint
    @ ./boot.jl:642 [inlined]
  [3] toUInt32
    @ ./boot.jl:731 [inlined]
  [4] UInt32
    @ ./boot.jl:766 [inlined]
  [5] convert
    @ ./number.jl:7 [inlined]
  [6] setproperty!
    @ ./Base.jl:43 [inlined]
  [7] process(codec::CodecZlib.ZlibCompressor, input::TranscodingStreams.Memory, output::TranscodingStreams.Memory, error::TranscodingStreams.Error)
    @ CodecZlib ~/.julia/packages/CodecZlib/ruMLE/src/compression.jl:172
  [8] transcode(codec::CodecZlib.ZlibCompressor, data::Vector{UInt8})
    @ TranscodingStreams ~/.julia/packages/TranscodingStreams/IVlnc/src/transcode.jl:90
  [9] deflate_data(f::JLD2.JLDFile{JLD2.MmapIO}, data::Matrix{Float64}, odr::Type{Float64}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}}, compressor::CodecZlib.ZlibCompressor)
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/compression.jl:146
 [10] write_compressed_data(cio::JLD2.MmapIO, f::JLD2.JLDFile{JLD2.MmapIO}, data::Matrix{Float64}, odr::Type, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}}, filter_id::UInt16, compressor::CodecZlib.ZlibCompressor)
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/compression.jl:182
 [11] write_dataset(f::JLD2.JLDFile{JLD2.MmapIO}, dataspace::JLD2.WriteDataspace{2, Tuple{}}, datatype::JLD2.FloatingPointDatatype, odr::Type{Float64}, data::Matrix{Float64}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}}, compress::Bool)
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/datasets.jl:404
 [12] write_dataset(f::JLD2.JLDFile{JLD2.MmapIO}, x::Matrix{Float64}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}}, compress::Bool)
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/inlineunion.jl:44
 [13] write_dataset(f::JLD2.JLDFile{JLD2.MmapIO}, x::Matrix{Float64}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}})
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/inlineunion.jl:36
 [14] write(g::JLD2.Group{JLD2.JLDFile{JLD2.MmapIO}}, name::String, obj::Matrix{Float64}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}}; compress::Nothing)
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/compression.jl:87
 [15] #write#87
    @ ~/.julia/packages/JLD2/k9Gt0/src/compression.jl:71 [inlined]
 [16] write
    @ ~/.julia/packages/JLD2/k9Gt0/src/compression.jl:71 [inlined]
 [17] (::JLD2.var"#58#59"{Base.Pairs{Symbol, Matrix{Float64}, Tuple{Symbol}, NamedTuple{(:X,), Tuple{Matrix{Float64}}}}})(f::JLD2.JLDFile{JLD2.MmapIO})
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/loadsave.jl:246
 [18] jldopen(::Function, ::String, ::Vararg{String}; kws::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol}, NamedTuple{(:compress, :iotype), Tuple{Bool, DataType}}})
    @ JLD2 ~/.julia/packages/JLD2/k9Gt0/src/loadsave.jl:4
 [19] #jldsave#57
    @ ~/.julia/packages/JLD2/k9Gt0/src/loadsave.jl:243 [inlined]
 [20] top-level scope
    @ REPL[2]:1

Saving without compression works fine: jldsave("test.jld2", false; X = X) gives no issues.
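
For reference, the value in the error message matches the raw size of X in bytes, and the stacktrace shows it being converted to UInt32 inside CodecZlib's process. A quick sanity check of the arithmetic (only a sketch with illustrative variable names, not a look into CodecZlib's internals):

```julia
# Raw size in bytes of the matrix from the report: 150 x 7*10^6 Float64 values.
nbytes = 150 * 7 * 10^6 * sizeof(Float64)

# Largest value a UInt32 can hold.
limit = typemax(UInt32)

println("payload bytes:  ", nbytes)
println("UInt32 limit:   ", Int(limit))
println("fits in UInt32? ", nbytes <= limit)

# Converting the size to UInt32 raises the same kind of error as in the report.
err = try
    UInt32(nbytes)
catch e
    e
end
println("conversion failed with InexactError? ", err isa InexactError)
```

So any single buffer above roughly 4 GiB would be expected to hit this conversion, independent of the data itself.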

JonasIsensee commented 2 years ago

Hi @Patbott, thanks for reporting this. To be honest, I'm not sure why this is happening. It's not really a problem with JLD2, since the stacktrace clearly shows that the error is produced by the compression library CodecZlib.

BioTurboNick commented 1 year ago

Duplicate of #399 ?

felixhorger commented 1 year ago

Yes it's the same error I suppose. It should have been solved here. Let me know if it still doesn't work please!

bjarthur commented 1 year ago

@felixhorger it still doesn't work. i can reproduce the OP's error:

julia> using JLD2

julia> X = randn(150, 7*10^6);

julia> jldsave("test.jld2", true; X = X)
[ Info: Attempting to dynamically load CodecZlib

julia> load("test.jld2")
Error encountered while load FileIO.File{FileIO.DataFormat{:JLD2}, String}("test.jld2").

Fatal error:
ERROR: InexactError: trunc(UInt32, 8069638866)
Stacktrace:
  [1] throw_inexacterror(f::Symbol, #unused#::Type{UInt32}, val::UInt64)
    @ Core ./boot.jl:614
  [2] checked_trunc_uint
    @ ./boot.jl:644 [inlined]
  [3] toUInt32
    @ ./boot.jl:733 [inlined]
  [4] UInt32
    @ ./boot.jl:768 [inlined]
  [5] convert
    @ ./number.jl:7 [inlined]
  [6] setproperty!
    @ ./Base.jl:39 [inlined]
  [7] process(codec::CodecZlib.ZlibDecompressor, input::TranscodingStreams.Memory, output::TranscodingStreams.Memory, error::TranscodingStreams.Error)
    @ CodecZlib /groups/scicompsoft/home/arthurb/.julia/packages/CodecZlib/ytMgl/src/decompression.jl:160
  [8] unsafe_transcode!(output::TranscodingStreams.Buffer, codec::CodecZlib.ZlibDecompressor, input::TranscodingStreams.Buffer)
    @ TranscodingStreams /groups/scicompsoft/home/arthurb/.julia/packages/TranscodingStreams/2McN2/src/transcode.jl:152
  [9] transcode!
    @ /groups/scicompsoft/home/arthurb/.julia/packages/TranscodingStreams/2McN2/src/transcode.jl:127 [inlined]
 [10] transcode(codec::CodecZlib.ZlibDecompressor, input::TranscodingStreams.Buffer, output::Nothing)
    @ TranscodingStreams /groups/scicompsoft/home/arthurb/.julia/packages/TranscodingStreams/2McN2/src/transcode.jl:109
 [11] transcode
    @ /groups/scicompsoft/home/arthurb/.julia/packages/TranscodingStreams/2McN2/src/transcode.jl:108 [inlined]
 [12] transcode
    @ /groups/scicompsoft/home/arthurb/.julia/packages/TranscodingStreams/2McN2/src/transcode.jl:189 [inlined]
 [13] decompress!(inptr::Ptr{Nothing}, data_length::Int64, element_size::Int64, n::Int64, decompressor::CodecZlib.ZlibDecompressor)
    @ JLD2 /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/ryhNR/src/compression.jl:254
 [14] read_compressed_array!(v::Matrix{Float64}, f::JLD2.JLDFile{JLD2.MmapIO}, rr::JLD2.ReadRepresentation{Float64, Float64}, data_length::Int64, filters::JLD2.FilterPipeline)
    @ JLD2 /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/ryhNR/src/compression.jl:293
 [15] read_array(f::JLD2.JLDFile{JLD2.MmapIO}, dataspace::JLD2.ReadDataspace, rr::JLD2.ReadRepresentation{Float64, Float64}, layout::JLD2.DataLayout, filters::JLD2.FilterPipeline, header_offset::JLD2.RelOffset, attributes::Vector{JLD2.ReadAttribute})
    @ JLD2 /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/ryhNR/src/datasets.jl:408
 [16] read_data(f::JLD2.JLDFile{JLD2.MmapIO}, rr::Any, read_dataspace::Tuple{JLD2.ReadDataspace, JLD2.RelOffset, JLD2.DataLayout, JLD2.FilterPipeline}, attributes::Vector{JLD2.ReadAttribute})
    @ JLD2 /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/ryhNR/src/datasets.jl:240
 [17] macro expansion
    @ /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/ryhNR/src/datasets.jl:224 [inlined]
 [18] macro expansion
    @ /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/ryhNR/src/datatypes.jl:103 [inlined]
 [19] read_data(f::JLD2.JLDFile{JLD2.MmapIO}, dataspace::JLD2.ReadDataspace, datatype_class::UInt8, datatype_offset::Int64, layout::JLD2.DataLayout, filters::JLD2.FilterPipeline, header_offset::JLD2.RelOffset, attributes::Vector{JLD2.ReadAttribute})
    @ JLD2 /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/ryhNR/src/datasets.jl:211
 [20] load_dataset(f::JLD2.JLDFile{JLD2.MmapIO}, offset::JLD2.RelOffset)
    @ JLD2 /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/ryhNR/src/datasets.jl:125
 [21] getindex(g::JLD2.Group{JLD2.JLDFile{JLD2.MmapIO}}, name::String)
    @ JLD2 /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/ryhNR/src/groups.jl:109
 [22] getindex
    @ /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/ryhNR/src/JLD2.jl:461 [inlined]
 [23] loadtodict!(d::Dict{String, Any}, g::JLD2.JLDFile{JLD2.MmapIO}, prefix::String)
    @ JLD2 /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/ryhNR/src/loadsave.jl:154
 [24] loadtodict!
    @ /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/ryhNR/src/loadsave.jl:153 [inlined]
 [25] (::JLD2.var"#100#101")(file::JLD2.JLDFile{JLD2.MmapIO})
    @ JLD2 /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/ryhNR/src/fileio.jl:39
 [26] jldopen(::Function, ::String, ::Vararg{String}; kws::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ JLD2 /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/ryhNR/src/loadsave.jl:4
 [27] jldopen
    @ /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/ryhNR/src/loadsave.jl:1 [inlined]
 [28] #fileio_load#99
    @ /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/ryhNR/src/fileio.jl:38 [inlined]
 [29] fileio_load(f::FileIO.File{FileIO.DataFormat{:JLD2}, String})
    @ JLD2 /groups/scicompsoft/home/arthurb/.julia/packages/JLD2/ryhNR/src/fileio.jl:37
 [30] #invokelatest#2
    @ ./essentials.jl:729 [inlined]
 [31] invokelatest
    @ ./essentials.jl:726 [inlined]
 [32] action(::Symbol, ::Vector{Union{Base.PkgId, Module}}, ::FileIO.Formatted; options::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ FileIO /groups/scicompsoft/home/arthurb/.julia/packages/FileIO/BE7iZ/src/loadsave.jl:219
 [33] action
    @ /groups/scicompsoft/home/arthurb/.julia/packages/FileIO/BE7iZ/src/loadsave.jl:196 [inlined]
 [34] action(::Symbol, ::Vector{Union{Base.PkgId, Module}}, ::Symbol, ::String; options::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ FileIO /groups/scicompsoft/home/arthurb/.julia/packages/FileIO/BE7iZ/src/loadsave.jl:185
 [35] action
    @ /groups/scicompsoft/home/arthurb/.julia/packages/FileIO/BE7iZ/src/loadsave.jl:185 [inlined]
 [36] load(::String; options::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ FileIO /groups/scicompsoft/home/arthurb/.julia/packages/FileIO/BE7iZ/src/loadsave.jl:113
 [37] load(::String)
    @ FileIO /groups/scicompsoft/home/arthurb/.julia/packages/FileIO/BE7iZ/src/loadsave.jl:109
 [38] top-level scope
    @ REPL[4]:1
Stacktrace:
 [1] handle_error(e::InexactError, q::Base.PkgId, bt::Vector{Union{Ptr{Nothing}, Base.InterpreterIP}})
   @ FileIO /groups/scicompsoft/home/arthurb/.julia/packages/FileIO/BE7iZ/src/error_handling.jl:61
 [2] handle_exceptions(exceptions::Vector{Tuple{Any, Union{Base.PkgId, Module}, Vector}}, action::String)
   @ FileIO /groups/scicompsoft/home/arthurb/.julia/packages/FileIO/BE7iZ/src/error_handling.jl:56
 [3] action(::Symbol, ::Vector{Union{Base.PkgId, Module}}, ::FileIO.Formatted; options::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ FileIO /groups/scicompsoft/home/arthurb/.julia/packages/FileIO/BE7iZ/src/loadsave.jl:228
 [4] action
   @ /groups/scicompsoft/home/arthurb/.julia/packages/FileIO/BE7iZ/src/loadsave.jl:196 [inlined]
 [5] action(::Symbol, ::Vector{Union{Base.PkgId, Module}}, ::Symbol, ::String; options::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ FileIO /groups/scicompsoft/home/arthurb/.julia/packages/FileIO/BE7iZ/src/loadsave.jl:185
 [6] action
   @ /groups/scicompsoft/home/arthurb/.julia/packages/FileIO/BE7iZ/src/loadsave.jl:185 [inlined]
 [7] load(::String; options::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ FileIO /groups/scicompsoft/home/arthurb/.julia/packages/FileIO/BE7iZ/src/loadsave.jl:113
 [8] load(::String)
   @ FileIO /groups/scicompsoft/home/arthurb/.julia/packages/FileIO/BE7iZ/src/loadsave.jl:109
 [9] top-level scope
   @ REPL[4]:1

(shroff) pkg> st
Status `/groups/scicompsoft/home/arthurb/projects/shroff/Project.toml`
  [944b1d66] CodecZlib v0.7.1
  [033835bb] JLD2 v0.4.31
Info Packages marked with ⌃ have new versions available and may be upgradable.

julia> VERSION
v"1.8.5"

bjarthur commented 1 year ago

CodecBzip2 has the same problem:

julia> jldsave("test.jld2", Bzip2Compressor(); X = X)
ERROR: InexactError: trunc(UInt32, 8400000000)
Stacktrace:
  [1] throw_inexacterror(f::Symbol, #unused#::Type{UInt32}, val::UInt64)
    @ Core ./boot.jl:614

julia> Sys.MACHINE
"x86_64-linux-gnu"

bjarthur commented 1 year ago

ahah, wait, to clarify, the error with CodecZlib occurs when trying to read it back in, whereas the error with CodecBzip2 is during the write.

bjarthur commented 1 year ago

should be fixed by https://github.com/JuliaIO/CodecZlib.jl/pull/69
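
In case someone is stuck on older package versions: one possible workaround (an assumption on my side, not something tested in this thread) is to split the matrix into column blocks that each stay below the UInt32 limit and save each block as its own dataset. A minimal sketch of the splitting step; `column_blocks` is a hypothetical helper, not part of JLD2 or CodecZlib:

```julia
# Hypothetical helper (not part of JLD2 or CodecZlib): split the columns of a
# matrix with `nrows` rows and `ncols` columns of element type `T` into ranges
# whose raw size stays at or below `maxbytes`.
function column_blocks(::Type{T}, nrows::Integer, ncols::Integer;
                       maxbytes::Integer = Int(typemax(UInt32))) where {T}
    bytes_per_col = nrows * sizeof(T)
    bytes_per_col > 0 || error("columns must have a nonzero size")
    bytes_per_col <= maxbytes || error("a single column exceeds maxbytes")
    cols_per_block = maxbytes ÷ bytes_per_col
    return [i:min(i + cols_per_block - 1, ncols) for i in 1:cols_per_block:ncols]
end

# Sizes taken from the example in this issue.
blocks = column_blocks(Float64, 150, 7 * 10^6)
println(length(blocks), " blocks")
println(first(blocks), " ... ", last(blocks))
```

Each range r could then be written as X[:, r] under its own name and the pieces joined with hcat after loading. Compressed output can be slightly larger than its input for incompressible data, so passing a smaller maxbytes (for example 2^31) leaves some headroom.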

felixhorger commented 1 year ago

> ahah, wait, to clarify, the error with CodecZlib occurs when trying to read it back in, whereas the error with CodecBzip2 is during the write.

Yeah, I just noticed. I was only concerned with writing, and all the data I tested was below the limit after compression, so it didn't trigger the problem you found. Thanks for fixing!

Side note for the interested: losslessly compressing random data won't shrink it much, so you can save yourself some time by not using compression.
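
A rough way to see this without any compression library is the empirical byte entropy, which gives an idea of how much a simple byte-wise coder could gain (real compressors such as zlib work differently, so treat this only as a proxy). The helper name `byte_entropy` is made up for this sketch:

```julia
# Empirical Shannon entropy of a byte vector, in bits per byte (maximum 8).
function byte_entropy(bytes::AbstractVector{UInt8})
    counts = zeros(Int, 256)
    for b in bytes
        counts[Int(b) + 1] += 1
    end
    n = length(bytes)
    h = 0.0
    for c in counts
        c == 0 && continue      # unused byte values contribute nothing
        p = c / n
        h -= p * log2(p)
    end
    return h
end

# Raw bytes of normally distributed Float64 values versus all-zero data.
random_bytes = collect(reinterpret(UInt8, randn(10^5)))
constant_bytes = collect(reinterpret(UInt8, zeros(10^5)))

println("randn data: ", byte_entropy(random_bytes), " bits per byte")
println("all zeros:  ", byte_entropy(constant_bytes), " bits per byte")
```

The closer the first number is to 8, the less there is to gain; the mantissa bytes of random floats are essentially noise, and only the sign and exponent bytes carry some redundancy.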