JuliaIO / JLD2.jl

HDF5-compatible file format in pure Julia
Other
560 stars 92 forks source link

Exception when reconstructing a type inside a collection with a specific type parameter #374

Closed Octogonapus closed 3 months ago

Octogonapus commented 2 years ago

Description

Using Julia v1.6 and JLD2 v0.4.19, JLD2 fails to load a struct containing a vector of a type which must be reconstructed. In principal, this makes sense, as the reconstructed type isn't convertible to the required type on the vector. This behavior isn't very helpful, though, as the exception occurs inside JLD2 code so I don't have an opportunity to address the problem.

My code can provide a way to convert a reconstructed Foo.Inner into a true Foo.Inner. Would it be possible to pass this function to JLD2 somehow? I'm not sure how my conversion method would get the required type information, though. I can see that JLD2 has this information, since it creates type names like this JLD2.ReconstructedTypes.var"##Main.Foo.Inner#258". That is probably enough to implement the conversion function I have in mind. In our own code, we save the type name as a string next to the data, and then use that to convert any reconstructed types into the true types. We need to do this conversion in the first place because we need to dispatch on these types; dispatch won't work with reconstructed types.

I'd also like I note that I can help work on this change, if it's something that would be accepted.

Workaround

I could change the type of Outer.a to Vector, but this not ideal as I want to require a specific eltype on that vector.

I could also define a method of convert like this Base.convert(::Type{Inner}, other) = Inner(other.x, 0), but I can't think of a way to provide a tight enough type bound on other so that this function won't get accidentally called.

MWE

Step 1: Serialize

using JLD2, FileIO

module Foo

struct Inner
    x::Int
end

struct Outer
    a::Vector{Inner}
end

end

save("test.jld2", Dict("a" => Foo.Outer([Foo.Inner(1)])))

Step 2: Deserialize with an Added Field

using JLD2, FileIO

module Foo

struct Inner
    x::Int
    y::Int
end

struct Outer
    a::Vector{Inner}
end

end

load("test.jld2")

Exception

julia> load("test.jld2")
┌ Warning: saved type Main.Foo.Inner is missing field y in workspace type; reconstructing
└ @ JLD2 ~/.julia/packages/JLD2/L21uR/src/data/reconstructing_datatypes.jl:152
Error encountered while load File{DataFormat{:JLD2}, String}("test.jld2").

Fatal error:
ERROR: MethodError: Cannot `convert` an object of type JLD2.ReconstructedTypes.var"##Main.Foo.Inner#257" to an object of type Main.Foo.Inner
Closest candidates are:
  convert(::Type{T}, ::T) where T at essentials.jl:205
  Main.Foo.Inner(::Any, ::Any) at REPL[2]:4
Stacktrace:
  [1] setindex!(A::Vector{Main.Foo.Inner}, x::JLD2.ReconstructedTypes.var"##Main.Foo.Inner#257", i1::Int64)
    @ Base ./array.jl:843
  [2] _unsafe_copyto!(dest::Vector{Main.Foo.Inner}, doffs::Int64, src::Vector{JLD2.ReconstructedTypes.var"##Main.Foo.Inner#257"}, soffs::Int64, n::Int64)
    @ Base ./array.jl:235
  [3] unsafe_copyto!
    @ ./array.jl:289 [inlined]
  [4] _copyto_impl!
    @ ./array.jl:313 [inlined]
  [5] copyto!
    @ ./array.jl:299 [inlined]
  [6] copyto!
    @ ./array.jl:325 [inlined]
  [7] copyto_axcheck!
    @ ./abstractarray.jl:1056 [inlined]
  [8] Vector{Main.Foo.Inner}(x::Vector{JLD2.ReconstructedTypes.var"##Main.Foo.Inner#257"})
    @ Base ./array.jl:540
  [9] convert(#unused#::Type{Vector{Main.Foo.Inner}}, a::Vector{JLD2.ReconstructedTypes.var"##Main.Foo.Inner#257"})
    @ Base ./array.jl:532
 [10] jlconvert
    @ ~/.julia/packages/JLD2/L21uR/src/data/writing_datatypes.jl:302 [inlined]
 [11] macro expansion
    @ ~/.julia/packages/JLD2/L21uR/src/data/reconstructing_datatypes.jl:563 [inlined]
 [12] jlconvert(#unused#::JLD2.ReadRepresentation{Main.Foo.Outer, JLD2.OnDiskRepresentation{(0,), Tuple{Vector{Main.Foo.Inner}}, Tuple{JLD2.RelOffset}}()}, f::JLD2.JLDFile{JLD2.MmapIO}, ptr::Ptr{Nothing}, header_offset::JLD2.RelOffset)
    @ JLD2 ~/.julia/packages/JLD2/L21uR/src/data/reconstructing_datatypes.jl:508
 [13] read_scalar(f::JLD2.JLDFile{JLD2.MmapIO}, rr::JLD2.ReadRepresentation{Main.Foo.Outer, JLD2.OnDiskRepresentation{(0,), Tuple{Vector{Main.Foo.Inner}}, Tuple{JLD2.RelOffset}}()}, header_offset::JLD2.RelOffset)
    @ JLD2 ~/.julia/packages/JLD2/L21uR/src/dataio.jl:37
 [14] read_data(f::JLD2.JLDFile{JLD2.MmapIO}, rr::Any, read_dataspace::Tuple{JLD2.ReadDataspace, JLD2.RelOffset, Int64, UInt16}, attributes::Vector{JLD2.ReadAttribute})
    @ JLD2 ~/.julia/packages/JLD2/L21uR/src/datasets.jl:170
 [15] read_data(f::JLD2.JLDFile{JLD2.MmapIO}, dataspace::JLD2.ReadDataspace, datatype_class::UInt8, datatype_offset::Int64, data_offset::Int64, data_length::Int64, filter_id::UInt16, header_offset::JLD2.RelOffset, attributes::Vector{JLD2.ReadAttribute})
    @ JLD2 ~/.julia/packages/JLD2/L21uR/src/datasets.jl:149
 [16] load_dataset(f::JLD2.JLDFile{JLD2.MmapIO}, offset::JLD2.RelOffset)
    @ JLD2 ~/.julia/packages/JLD2/L21uR/src/datasets.jl:92
 [17] getindex(g::JLD2.Group{JLD2.JLDFile{JLD2.MmapIO}}, name::String)
    @ JLD2 ~/.julia/packages/JLD2/L21uR/src/groups.jl:108
 [18] getindex(f::JLD2.JLDFile{JLD2.MmapIO}, name::String)
    @ JLD2 ~/.julia/packages/JLD2/L21uR/src/JLD2.jl:379
 [19] loadtodict!(d::Dict{String, Any}, g::JLD2.JLDFile{JLD2.MmapIO}, prefix::String)
    @ JLD2 ~/.julia/packages/JLD2/L21uR/src/loadsave.jl:154
 [20] loadtodict!
    @ ~/.julia/packages/JLD2/L21uR/src/loadsave.jl:153 [inlined]
 [21] #62
    @ ~/.julia/packages/JLD2/L21uR/src/fileio.jl:39 [inlined]
 [22] jldopen(::JLD2.var"#62#63", ::String, ::Vararg{String, N} where N; kws::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ JLD2 ~/.julia/packages/JLD2/L21uR/src/loadsave.jl:4
 [23] jldopen
    @ ~/.julia/packages/JLD2/L21uR/src/loadsave.jl:2 [inlined]
 [24] #fileio_load#61
    @ ~/.julia/packages/JLD2/L21uR/src/fileio.jl:38 [inlined]
 [25] fileio_load(f::File{DataFormat{:JLD2}, String})
    @ JLD2 ~/.julia/packages/JLD2/L21uR/src/fileio.jl:38
 [26] #invokelatest#2
    @ ./essentials.jl:708 [inlined]
 [27] invokelatest
    @ ./essentials.jl:706 [inlined]
 [28] action(::Symbol, ::Vector{Union{Base.PkgId, Module}}, ::Formatted; options::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ FileIO ~/.julia/packages/FileIO/QkYgA/src/loadsave.jl:219
 [29] action
    @ ~/.julia/packages/FileIO/QkYgA/src/loadsave.jl:197 [inlined]
 [30] action(::Symbol, ::Vector{Union{Base.PkgId, Module}}, ::Symbol, ::String; options::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ FileIO ~/.julia/packages/FileIO/QkYgA/src/loadsave.jl:185
 [31] action
    @ ~/.julia/packages/FileIO/QkYgA/src/loadsave.jl:185 [inlined]
 [32] load(::String; options::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ FileIO ~/.julia/packages/FileIO/QkYgA/src/loadsave.jl:113
 [33] load(::String)
    @ FileIO ~/.julia/packages/FileIO/QkYgA/src/loadsave.jl:110
 [34] top-level scope
    @ REPL[3]:1
Stacktrace:
 [1] handle_error(e::MethodError, q::Base.PkgId, bt::Vector{Union{Ptr{Nothing}, Base.InterpreterIP}})
   @ FileIO ~/.julia/packages/FileIO/QkYgA/src/error_handling.jl:61
 [2] handle_exceptions(exceptions::Vector{Tuple{Any, Union{Base.PkgId, Module}, Vector{T} where T}}, action::String)
   @ FileIO ~/.julia/packages/FileIO/QkYgA/src/error_handling.jl:56
 [3] action(::Symbol, ::Vector{Union{Base.PkgId, Module}}, ::Formatted; options::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ FileIO ~/.julia/packages/FileIO/QkYgA/src/loadsave.jl:228
 [4] action
   @ ~/.julia/packages/FileIO/QkYgA/src/loadsave.jl:197 [inlined]
 [5] action(::Symbol, ::Vector{Union{Base.PkgId, Module}}, ::Symbol, ::String; options::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ FileIO ~/.julia/packages/FileIO/QkYgA/src/loadsave.jl:185
 [6] action
   @ ~/.julia/packages/FileIO/QkYgA/src/loadsave.jl:185 [inlined]
 [7] load(::String; options::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ FileIO ~/.julia/packages/FileIO/QkYgA/src/loadsave.jl:113
 [8] load(::String)
   @ FileIO ~/.julia/packages/FileIO/QkYgA/src/loadsave.jl:110
 [9] top-level scope
   @ REPL[3]:1
JonasIsensee commented 2 years ago

Hi @Octogonapus , thanks for reporting this. I've labelled this as a bug.

Defining Base.convert(::Type{Foo.Inner}, x) = Foo.Inner(x.x, 0) works for me. Another workaround is described in #242

Octogonapus commented 2 years ago

I think this could be handled with a look-ahead inside constructrr, but that might be computationally expensive. The type of Outer.a would also need to be serialized properly. Right now, it's just Any instead of Vector{Inner}.

┌ Debug: constructrr
│   T = Main.Foo.Outer
│   dt = JLD2.CompoundDatatype(0x00000008, [:a], [0], JLD2.H5Datatype[JLD2.BasicDatatype(0x37, 0x00, 0x00, 0x00, 0x00000008)])
│   attrs =
│    1-element Vector{JLD2.ReadAttribute}:
│     JLD2.ReadAttribute(:julia_type, JLD2.ReadDataspace(0x00, 0x00, 4891), 0xff, 560, 4891)
└ @ JLD2 ~/Documents/code/JLD2.jl/src/data/reconstructing_datatypes.jl:121
┌ Debug: field_datatypes
│   field_datatypes = JLD2.RelOffset[]
└ @ JLD2 ~/Documents/code/JLD2.jl/src/data/reconstructing_datatypes.jl:123
┌ Debug: isconcretetype(T)
│   isconcretetype(T) = true
└ @ JLD2 ~/Documents/code/JLD2.jl/src/data/reconstructing_datatypes.jl:126
┌ Debug: dtnames & mapped
│   dtnames =
│    Dict{Symbol, Int64} with 1 entry:
│      :a => 1
│   mapped =
│    1-element BitVector:
│     0
└ @ JLD2 ~/Documents/code/JLD2.jl/src/data/reconstructing_datatypes.jl:138
┌ Debug: determine dtrr
│   field_datatypes = JLD2.RelOffset[]
│   dtindex = 1
│   dt.members =
│    1-element Vector{JLD2.H5Datatype}:
│     JLD2.BasicDatatype(0x37, 0x00, 0x00, 0x00, 0x00000008)
└ @ JLD2 ~/Documents/code/JLD2.jl/src/data/reconstructing_datatypes.jl:161
┌ Debug: dtrr
│   dtrr = JLD2.ReadRepresentation{Any, JLD2.RelOffset}()
│   typeof(dtrr) = JLD2.ReadRepresentation{Any, JLD2.RelOffset}
│   (typeof(dtrr)).parameters = svec(Any, JLD2.RelOffset)
└ @ JLD2 ~/Documents/code/JLD2.jl/src/data/reconstructing_datatypes.jl:168
┌ Debug: constructrr type intersection
│   readtype = Any
│   wstype = Vector{Main.Foo.Inner} (alias for Array{Main.Foo.Inner, 1})
│   typeintersect(readtype, wstype) = Vector{Main.Foo.Inner} (alias for Array{Main.Foo.Inner, 1})
│   hasmethod(convert, Tuple{Type{wstype}, readtype}) = false
└ @ JLD2 ~/Documents/code/JLD2.jl/src/data/reconstructing_datatypes.jl:171

If readtype was Vector{Inner} instead of Any, and we knew if reconstructing Inner was required, then we would know to reconstruct Outer. I'm not too familiar with this package's internals, can you think of a way to do that? I can work on implementing it.

I may also work on implementing a "reconstructed types only" feature, as described in #242.

JonasIsensee commented 2 years ago

If readtype was Vector{Inner} instead of Any, and we knew if reconstructing Inner was required, then we would know to reconstruct Outer. I'm not too familiar with this package's internals, can you think of a way to do that? I can work on implementing it.

I've looked at the source code and the files. Sadly I'm pretty sure this is not possible. JLD2 does not store the julia-eltype of structs. It only stores the HDF5 types of fields. It's easy to map this in the simple cases of e.g. floats and ints but arrays are mutable objects and are therefore stored separately and referenced by a RelOffset.

constructrr cannot know the field type of Outer.a - not without a serious reworking of JLD2 internals. (See #316 and #338 for my unfinished attempts)

I believe it may be more useful to implement the explicit type remapping that's already working in #316. If you're interested in contributing, have a look at #316 and what I did with the type remapping. Essentially, every file gets a dictionary that maps datatype names to datatypes in the running session. This already worked nicely but #316 tried to do anonymous functions as well, which I didn't finish.

JonasIsensee commented 2 years ago

Since I had the code lying around already I ended up doing the above myself in #376 . I hope this helps

Octogonapus commented 3 months ago

After revisiting this today, I think this is not a problem on the latest version. As seen in https://github.com/Octogonapus/JLD2Benchmark we can use the Upgrade and rconvert mechanisms to efficiently load into the struct containing the added field.