JuliaIO / JLD2.jl

HDF5-compatible file format in pure Julia
Other
537 stars 84 forks source link

Make conversion from custom serialization work even when original type cannot be understood. #468

Closed KnutAM closed 1 year ago

KnutAM commented 1 year ago

Attempt for solving: https://github.com/JuliaIO/JLD2.jl/issues/288#issuecomment-1519076407

See example usage in the comment below and temporarily included script files

~It seems to work and tests pass locally, but it feels like this cannot be general.~ ~I never even had to overload the readas function...~

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.49 :tada:

Comparison is base (30dd578) 84.04% compared to head (9411e83) 84.53%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #468 +/- ## ========================================== + Coverage 84.04% 84.53% +0.49% ========================================== Files 31 31 Lines 4142 4145 +3 ========================================== + Hits 3481 3504 +23 + Misses 661 641 -20 ``` | [Impacted Files](https://codecov.io/gh/JuliaIO/JLD2.jl/pull/468?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaIO) | Coverage Δ | | |---|---|---| | [src/data/custom\_serialization.jl](https://codecov.io/gh/JuliaIO/JLD2.jl/pull/468?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaIO#diff-c3JjL2RhdGEvY3VzdG9tX3NlcmlhbGl6YXRpb24uamw=) | `92.00% <ø> (ø)` | | | [src/data/reconstructing\_datatypes.jl](https://codecov.io/gh/JuliaIO/JLD2.jl/pull/468?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaIO#diff-c3JjL2RhdGEvcmVjb25zdHJ1Y3RpbmdfZGF0YXR5cGVzLmps) | `82.65% <100.00%> (+5.98%)` | :arrow_up: |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

KnutAM commented 1 year ago

UPDATED: There were some mistakes in the example so it wasn't really relevant. With the new changes, the approach seems to work as expected. Specifically, when readas hasn't been defined by the user, there should be no change to master.

The "includeme.jl" file:

using JLD2

# Some way to define an unknown function
struct UndefinedFunction <:Function
    fun
end
(f::UndefinedFunction)(args...; kwargs...) = throw(ErrorException("The function $(f.fun) is not defined"))

# Foo has all required definitions to be reloaded back to an unknown function
struct Foo{F<:Function}
    fun::F
end
struct FooSerialization
    fun
end

JLD2.readas(::Type{<:FooSerialization}) = Foo
JLD2.writeas(::Type{<:Foo}) = FooSerialization
Base.convert(::Type{<:FooSerialization}, f::Foo) = FooSerialization(f.fun)
function Base.convert(::Type{<:Foo}, f::FooSerialization)
    isa(f.fun, Function) && return Foo(f.fun)
    return Foo(UndefinedFunction(f.fun))
end

# We do not define JLD2.readas for Bar: It should remain unconverted
struct Bar{F<:Function}
    fun::F
end
struct BarSerialization
    fun
end

JLD2.writeas(::Type{<:Bar}) = BarSerialization
Base.convert(::Type{BarSerialization}, f::Bar) = BarSerialization(f.fun)
Base.convert(::Type{Bar}, f::BarSerialization) = Bar(f.fun)
function Base.convert(::Type{<:Bar}, f::BarSerialization)
    isa(f.fun, Function) && return Bar(f.fun)
    return Bar(UndefinedFunction(f.fun))
end

Session 1:

include("includeme.jl")
jldsave("sin.jld2"; foo=Foo(sin))
jldsave("tmp.jld2"; foo=Foo(x->x^2))
jldsave("bar.jld2"; bar=Bar(x->x^2))

Session 2:

include("includeme.jl")
s = jldopen("sin.jld2", "r"); ss=s["foo"]; close(s); ss
a = jldopen("tmp.jld2", "r"); aa=a["foo"]; close(a); aa
b = jldopen("bar.jld2", "r"); bb=b["bar"]; close(b); bb
@show ss
@show aa
@show bb

In the REPL after session 2:

julia> include("session2.jl");
┌ Warning: type Main.#2#3 does not exist in workspace; reconstructing
└ @ JLD2 C:\Users\meyer\.julia\dev\JLD2\src\data\reconstructing_datatypes.jl:411
┌ Warning: custom serialization of Bar{Main.#4#5} encountered, but the type does not exist in the workspace; the data will be read unconverted
└ @ JLD2 C:\Users\meyer\.julia\dev\JLD2\src\data\reconstructing_datatypes.jl:70
┌ Warning: type Main.#4#5 does not exist in workspace; reconstructing
└ @ JLD2 C:\Users\meyer\.julia\dev\JLD2\src\data\reconstructing_datatypes.jl:411
ss = Foo{typeof(sin)}(sin)
aa = Foo{UndefinedFunction}(UndefinedFunction(JLD2.ReconstructedTypes.var"##Main.#2#3#314"()))
bb = BarSerialization(JLD2.ReconstructedTypes.var"##Main.#4#5#315"())
JonasIsensee commented 1 year ago

this looks good to me. Thanks for your work! :)

I'll say again that this probably shouldn't be a permanent solution but it also doesn't look like it would harm anything. If you would like, you're welcome to bump the patch version, and add a changelog. If not, I'll get around to it in a few days.

KnutAM commented 1 year ago

Great! I'll try to add some tests and do that when I have some time.

Would you like me to also add some documentation explaining how/when to use this feature?

JonasIsensee commented 1 year ago

Great! I'll try to add some tests and do that when I have some time.

Would you like me to also add some documentation explaining how/when to use this feature?

That would be excellent! But please do put an Experimental in front ;)

KnutAM commented 1 year ago

I don't understand why the invalidation fails, the only diff from passing to failing is using a type-assert https://github.com/JuliaIO/JLD2.jl/compare/21cf579..18bfa02#diff-4ff9a5fb7264e7d305d989c07cc06a047e83cbcf591f75d8bb675558ea216970R59 ?

Nightly seems unrelated to me and fails on master too.

JonasIsensee commented 1 year ago

I don't understand why the invalidation fails, the only diff from passing to failing is using a type-assert https://github.com/JuliaIO/JLD2.jl/compare/21cf579..18bfa02#diff-4ff9a5fb7264e7d305d989c07cc06a047e83cbcf591f75d8bb675558ea216970R59 ?

Nightly seems unrelated to me and fails on master too.

To me, it looks more like an unrelated error in the CI run rather than an actual invalidation change. I've restarted the job.

KnutAM commented 1 year ago

I've restarted the job.

Great!

KnutAM commented 1 year ago

For future reference:

Not intending any of this to be updated in this pr, just a note of what is possible with this pr.

After tip from @fredrikekre I tried using Serialization from stdlib to write/save the function, and that seems to work!

With the readas function, one can do

using Serialization

struct Foo{F<:Function}
    fun::F
end
struct FooSerialization
    io
end

JLD2.writeas(::Type{<:Foo}) = FooSerialization
JLD2.readas(::Type{<:FooSerialization}) = Foo

function Base.convert(::Type{<:FooSerialization}, f::Foo)
    io = IOBuffer()
    serialize(io, f.fun)
    return FooSerialization(io)
end

function Base.convert(::Type{<:Foo}, f::FooSerialization)
    seek(f.io, 0)
    fun = deserialize(f.io)
    return Foo(fun)
end

saving as jldsave("foo.jld2"; foo=Foo(x->x^2)) and reading gives

getfoo(file) = jldopen(file) do io; io["foo"]; end
foo=getfoo("foo.jld2")
# Output: Foo{Serialization.__deserialized_types__.var"#11#12"}(Serialization.__deserialized_types__.var"#11#12"())
foo.fun
# Output: #11 (generic function with 1 method)
foo.fun(2)
# Output: 4

Not tested a lot, but could potentially provide workarounds for some of #208, #175, #36, and #37 (And if there is a failure, it should occur during convert in loading, so that when reopening the file this function can be redefined to handle potential issues with deserialization.

KnutAM commented 1 year ago

@JonasIsensee do you want any more changes before this can be merged?

JonasIsensee commented 1 year ago

I'm sorry for leaving this unanswered for so long. Was busy and forgot.

Thank you for your contribution!