JuliaIO / JLD2.jl

HDF5-compatible file format in pure Julia

Error when loading jld2 file with closure when loading a second time #288

Closed vancleve closed 2 years ago

vancleve commented 3 years ago

MWE:

              _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.3 (2020-11-09)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
shell> cat test.jl
struct A
    x::Int
    func::Function

    function A(x)
        return new(x, (y)-> y * exp(x))
    end
end

julia> using JLD2; include("test.jl")

julia> a = A(1)
A(1, var"#3#4"{Int64}(1))

julia> @save "test.jld2" a
(new Julia session)
julia> using JLD2; include("test.jl")

julia> @load "test.jld2"
1-element Array{Symbol,1}:
 :a
(new Julia session)
julia> using JLD2

julia> @load "test.jld2"
┌ Warning: type Main.A does not exist in workspace; reconstructing
└ @ JLD2 ~/.julia/packages/JLD2/pA6G3/src/data/reconstructing_datatypes.jl:450
┌ Warning: read type var"#3#4"{Int64} was parametrized, but type var"#3#4" in workspace is not; reconstructing
└ @ JLD2 ~/.julia/packages/JLD2/pA6G3/src/data/reconstructing_datatypes.jl:393
1-element Array{Symbol,1}:
 :a

julia> include("test.jl")

julia> @load "test.jld2"
┌ Warning: read type var"#3#4"{Int64} was parametrized, but type var"#3#4" in workspace is not; reconstructing
└ @ JLD2 ~/.julia/packages/JLD2/pA6G3/src/data/reconstructing_datatypes.jl:393
ERROR: MethodError: Cannot `convert` an object of type JLD2.ReconstructedTypes.var"##var\"#3#4\"{Int64}#255" to an object of type Function
Closest candidates are:
  convert(::Type{T}, ::T) where T at essentials.jl:171
Stacktrace:
 [1] jlconvert at /Users/vancleve/.julia/packages/JLD2/pA6G3/src/data/writing_datatypes.jl:302 [inlined]
 [2] macro expansion at /Users/vancleve/.julia/packages/JLD2/pA6G3/src/data/reconstructing_datatypes.jl:557 [inlined]
 [3] jlconvert(::JLD2.ReadRepresentation{A,JLD2.OnDiskRepresentation{(0, 8),Tuple{Int64,Function},Tuple{Int64,JLD2.RelOffset}}()}, ::JLD2.JLDFile{JLD2.MmapIO}, ::Ptr{Nothing}, ::JLD2.RelOffset) at /Users/vancleve/.julia/packages/JLD2/pA6G3/src/data/reconstructing_datatypes.jl:502
 [4] read_scalar at /Users/vancleve/.julia/packages/JLD2/pA6G3/src/dataio.jl:37 [inlined]
 [5] read_data(::JLD2.JLDFile{JLD2.MmapIO}, ::JLD2.ReadRepresentation{A,JLD2.OnDiskRepresentation{(0, 8),Tuple{Int64,Function},Tuple{Int64,JLD2.RelOffset}}()}, ::Tuple{JLD2.ReadDataspace,JLD2.RelOffset,Int64,UInt16}, ::Array{JLD2.ReadAttribute,1}) at /Users/vancleve/.julia/packages/JLD2/pA6G3/src/datasets.jl:170
 [6] read_data(::JLD2.JLDFile{JLD2.MmapIO}, ::JLD2.ReadDataspace, ::UInt8, ::Int64, ::Int64, ::Int64, ::UInt16, ::JLD2.RelOffset, ::Array{JLD2.ReadAttribute,1}) at /Users/vancleve/.julia/packages/JLD2/pA6G3/src/datasets.jl:149
 [7] load_dataset(::JLD2.JLDFile{JLD2.MmapIO}, ::JLD2.RelOffset) at /Users/vancleve/.julia/packages/JLD2/pA6G3/src/datasets.jl:92
 [8] getindex(::JLD2.Group{JLD2.JLDFile{JLD2.MmapIO}}, ::String) at /Users/vancleve/.julia/packages/JLD2/pA6G3/src/groups.jl:108
 [9] read(::JLD2.JLDFile{JLD2.MmapIO}, ::String) at /Users/vancleve/.julia/packages/JLD2/pA6G3/src/JLD2.jl:346
 [10] #7 at /Users/vancleve/.julia/packages/JLD2/pA6G3/src/loadsave.jl:145 [inlined]
 [11] jldopen(::var"#7#8", ::String; kws::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /Users/vancleve/.julia/packages/JLD2/pA6G3/src/loadsave.jl:4
 [12] jldopen(::Function, ::String) at /Users/vancleve/.julia/packages/JLD2/pA6G3/src/loadsave.jl:2
 [13] top-level scope at /Users/vancleve/.julia/packages/JLD2/pA6G3/src/loadsave.jl:144
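The names in the warnings and the error hint at the mechanism. Below is a minimal plain-Julia sketch of the two ingredients; JLD2 is not needed to run it. First, the closure stored in `func` has a compiler-generated type, like `var"#3#4"{Int64}` above, and the variable `x` it captures is an ordinary field of that type. Second, an object that is not a `Function` has no `convert` method to `Function`, and filling the `func::Function` field of the re-defined `A` requires exactly that conversion. `ReconstructedClosure` is a hypothetical stand-in for the `JLD2.ReconstructedTypes...` object in the error message. It is not a JLD2 type.

```julia
# Same struct as in test.jl above
struct A
    x::Int
    func::Function

    function A(x)
        return new(x, (y) -> y * exp(x))
    end
end

a = A(1)
T = typeof(a.func)            # compiler-generated closure type, e.g. var"#3#4"{Int64}

# The captured variable `x` is stored as a field of the closure object.
println(fieldnames(T))        # (:x,)
# The generated name starts with '#' and is assigned by a counter, so the
# same name can refer to a different closure (or none) in another session.
println(nameof(T))

# Hypothetical stand-in for the struct rebuilt when the closure type cannot
# be found: it holds the same data but is not a subtype of Function.
struct ReconstructedClosure
    x::Int
end

# Storing it in a `func::Function` field needs convert(Function, ...),
# for which no method applies.
err = try
    convert(Function, ReconstructedClosure(1))
catch e
    e
end
println(err isa MethodError)  # true
```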
JonasIsensee commented 3 years ago

Hi, this has been discussed a few times before. I'd love to implement this but honestly, I just don't know how...

Related: #208, #175, #36, #37

vancleve commented 3 years ago

Yes, I understand that JLD2 won't correctly save closures, and that's totally fine.

But what I'm wondering is whether JLD2 can still reliably open files that do contain saved closures. This example shows a case where JLD2 reconstructs the object just fine (minus the closure) when the source code is not loaded, but then balks once the source code is loaded.

I've seen this happen more often when I've changed the source since saving the .jld2 file (though not, as far as I can tell, the object definition itself): the .jld2 file then won't load because of this "Cannot `convert` ... to an object of type Function" error.

So is there a way to force JLD2 to load these files, for example by dropping the fields it cannot convert? The vector and scalar data (say, the simulation runs) are often what we need most, while the data it cannot convert, the closures, are just the functions used to generate that data, and those may already be preserved in the original source code that ran the simulations.
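For files written in the future, one way to keep the plain data readable is to avoid storing the closure at all: store only the value it captures and rebuild the closure on load. This is a minimal sketch under that assumption. `ASerialization` is a hypothetical name, and only the two `convert` methods are shown so that the code runs without JLD2. Registering the swap would use the same `JLD2.writeas` hook that appears later in this thread, i.e. `JLD2.writeas(::Type{A}) = ASerialization`.

```julia
struct A
    x::Int
    func::Function
    A(x) = new(x, y -> y * exp(x))
end

# Hypothetical closure-free stand-in: only the data needed to rebuild `func`.
struct ASerialization
    x::Int
end

# Writing: drop the closure, keep the parameter it captured.
Base.convert(::Type{ASerialization}, a::A) = ASerialization(a.x)
# Reading: rebuild the closure from the stored parameter using the current code.
Base.convert(::Type{A}, s::ASerialization) = A(s.x)

# With JLD2 loaded, the swap would be registered as:
# JLD2.writeas(::Type{A}) = ASerialization

a = A(2)
s = convert(ASerialization, a)   # plain data, no function type inside
b = convert(A, s)                # closure recreated on load
```

The trade-off is that the closure is recreated from the current source code, so it reflects today's definition rather than necessarily the one that generated the data. It also does not help with files that already contain closures.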

JonasIsensee commented 3 years ago

Ah, thank you. I had missed that the first time. Yes, I think it should be possible to return a normal ReconstructedType instead of throwing an error.

JonasIsensee commented 2 years ago

A more explicit way forward could be to pass a type remapping for the closure.

KnutAM commented 1 year ago

Running the MWE above, I get the same issue (an error is thrown) with JLD2 v0.4.31.

versioninfo:
```julia
Julia Version 1.8.5
Commit 17cfb8e65e (2023-01-08 06:45 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, tigerlake)
  Threads: 1 on 8 virtual cores
Environment:
  JULIA_PKG_OFFLINE = false
  JULIA_PKG_USE_CLI_GIT = true
```

I tried to work around it with custom serialization:

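# mwe.jl
using JLD2

# Type to be saved; it is parameterized by the type of the function it holds
struct Foo{F<:Function}
    fun::F
end

# Non-parametric form that JLD2 writes instead of Foo
struct FooSerialization
    fun
end

# Placeholder for a function that could not be restored on load
struct UndefinedFunction <: Function
    fun
end
(f::UndefinedFunction)(args...; kwargs...) = throw(ErrorException("The function $(f.fun) is not defined"))

# Tell JLD2 to write Foo as FooSerialization, and how to convert in both directions
JLD2.writeas(::Type{<:Foo}) = FooSerialization
Base.convert(::Type{<:FooSerialization}, f::Foo) = FooSerialization(f.fun)
function Base.convert(::Type{<:Foo}, f::FooSerialization)
    isa(f.fun, Function) && return Foo(f.fun)
    return Foo(UndefinedFunction(f.fun))
end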

Session 1

include("mwe.jl")
jldsave("tmp.jld2"; foo=Foo(x->x^2))

Session 2

include("mwe.jl")
d = jldopen("tmp.jld2", "r"); f=d["foo"]; close(d); f
┌ Warning: custom serialization of Foo{Main.#9#10} encountered, but the type does not exist in the workspace; the data will be read unconverted
└ @ JLD2 C:\Users\meyer\.julia\packages\JLD2\ryhNR\src\data\reconstructing_datatypes.jl:62   
┌ Warning: type Main.#9#10 does not exist in workspace; reconstructing
└ @ JLD2 C:\Users\meyer\.julia\packages\JLD2\ryhNR\src\data\reconstructing_datatypes.jl:403  
FooSerialization(JLD2.ReconstructedTypes.var"##Main.#9#10#312"())

I don't understand how to make JLD2 attempt the conversion.

Calling convert(Foo, f) after this works as expected, but is it possible to make it work during reconstruction?

JonasIsensee commented 1 year ago

Hi @KnutAM,

I'm not sure I have a satisfactory answer, but I can explain what the problem is.

When you use CustomSerialization, JLD2 stores three things:

The last bit is the problem. JLD2 cannot do the conversion because it cannot create the datatype it needs, Foo{Main.#9#10}, since every function has its own peculiar type.
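To see why `Foo{Main.#9#10}` cannot be recreated in a new session, here is a plain-Julia sketch (no JLD2): every anonymous function gets its own compiler-generated type, and `Foo` takes that type as its parameter. Only the `Foo` definition is copied from the example above.

```julia
struct Foo{F<:Function}
    fun::F
end

f1 = Foo(x -> x^2)
f2 = Foo(x -> x^2)   # textually identical lambda, but a new type

# Each lambda has its own generated type, so the two Foo types differ.
println(typeof(f1) == typeof(f2))                       # false
# The type parameter of Foo is the closure type itself.
println(typeof(f1).parameters[1] === typeof(f1.fun))    # true
# A named function also has its own type, but that type can be obtained
# again in any session where the function is defined.
println(typeof(Foo(sin)) == Foo{typeof(sin)})           # true
```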

KnutAM commented 1 year ago

Thanks for the quick and helpful response, @JonasIsensee!

I'm not sure if I fully understand all the checks in the code, but would it be possible to provide some overload mechanism like

JLD2.readas(::Type{<:ASerialization}) = A

to solve this problem?

And if such a readas has been defined, call that instead of the current read_attr_data(f, julia_type_attr) in the following place:

https://github.com/JuliaIO/JLD2.jl/blob/30dd57839159945fd3d17886891fe30b19367703/src/data/reconstructing_datatypes.jl#L56-L60
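The proposed mechanism can be sketched in plain Julia. `readas` and `reconstruct` here are hypothetical local functions that illustrate the idea. They are not JLD2's API. The reader looks up a declared target, such as the non-parametric `Foo`, and lets the user's `convert` method choose the concrete type parameter from the data, so the unrecoverable `Foo{Main.#9#10}` is never needed. `Foo`, `FooSerialization`, `UndefinedFunction` and the reading `convert` method are copied from the example above. The symbol `:gone` is an assumed stand-in for a reconstructed function that no longer exists.

```julia
struct Foo{F<:Function}
    fun::F
end
struct FooSerialization
    fun
end
struct UndefinedFunction <: Function
    fun
end
(f::UndefinedFunction)(args...; kwargs...) = throw(ErrorException("The function $(f.fun) is not defined"))

function Base.convert(::Type{<:Foo}, f::FooSerialization)
    isa(f.fun, Function) && return Foo(f.fun)
    return Foo(UndefinedFunction(f.fun))
end

# Hypothetical hook: the (possibly non-concrete) type a serialized type should be read as.
readas(::Type) = nothing
readas(::Type{FooSerialization}) = Foo

# Hypothetical reader step: convert to the declared target if there is one,
# otherwise hand back the serialized data unchanged.
function reconstruct(s)
    T = readas(typeof(s))
    return T === nothing ? s : convert(T, s)
end

ok = reconstruct(FooSerialization(sin))             # Foo{typeof(sin)}
placeholder = reconstruct(FooSerialization(:gone))  # Foo{UndefinedFunction}
```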

JonasIsensee commented 1 year ago

That is an interesting idea, and it could possibly even work. You definitely found the right place to edit, and I'd be interested to see whether you can get it to work with that approach. (Please open a new issue for further discussion; the original problem above is a bit different.)

Looking at the bigger picture, I believe it would be good to improve how JLD2 reconstructs datatypes, but in a way that keeps the reconstruction in the data domain for longer before lifting it to the type domain.
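One possible reading of "data domain" versus "type domain" is sketched below. It is hypothetical and is not taken from JLD2's code. A stored type is kept as plain data (a name plus parameters) and is lifted into a real Julia type only when every piece resolves in the current workspace. Otherwise `nothing` is returned, so a reader could fall back to the raw data instead of failing. `StoredType`, `lift`, and the module `Workspace` are illustrative names.

```julia
# Hypothetical description of a type as read from a file, kept as plain data.
struct StoredType
    name::Symbol
    params::Vector{StoredType}
end
StoredType(name::Symbol) = StoredType(name, StoredType[])

# Lift into a real Julia type only if every piece resolves in module `m`;
# otherwise return `nothing` so the caller can keep working with the data.
function lift(m::Module, t::StoredType)
    isdefined(m, t.name) || return nothing
    T = getproperty(m, t.name)
    isempty(t.params) && return T
    ps = [lift(m, p) for p in t.params]
    any(isnothing, ps) && return nothing
    return T{ps...}
end

# Hypothetical workspace: an empty module that sees exported names such as Int and Vector.
module Workspace end

lift(Workspace, StoredType(:Vector, [StoredType(:Int)]))   # Vector{Int64}
lift(Workspace, StoredType(Symbol("#9#10")))               # nothing: stays as data
```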