bhftbootcamp / Serde.jl

Serde is a Julia library for (de)serializing data to/from various formats. The library offers a simple and concise API for defining custom (de)serialization behavior for user-defined types.
Apache License 2.0

Deserialization of abstract types having multiple concrete subtypes #37

Closed liuyxpp closed 2 months ago

liuyxpp commented 2 months ago

Say we have defined:

abstract type A end
struct B <: A
    b
end
struct C <: A
    c
end

struct Config
    x::A
end

config = Config(B(1))

Since the type name is not serialized, how do we know during deserialization whether the type is B or C?

dmitrii-doronin commented 2 months ago

Hi, @liuyxpp! I think there's no way to know it without some sort of introspection on the user's behalf. You could try something like this:

struct Config{T<:A}
    x::T
end

Serde.deser(Config{B}, dict)

Another option would be to have a custom deser function like this:

Serde.deser(::Type{A}, x::Dict) = Serde.deser(haskey(x, "b") ? B : C, x)

liuyxpp commented 2 months ago

The first approach is not applicable when we only have the serialized file and do not know which concrete type to deserialize to. The second approach does not work when B and C have identical fields.
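
For illustration, one way around both limitations is to write a type tag into the serialized data and dispatch on it when reading. The following is a minimal sketch in plain Julia, assuming the payload has already been parsed into a `Dict`. The `"type"` key, `TYPE_REGISTRY`, `to_dict`, and `from_dict` are hypothetical names chosen for this example and are not part of Serde.jl.

```julia
abstract type A end

# Two subtypes with identical fields, so field names cannot tell them apart.
struct B <: A
    v::Int
end

struct C <: A
    v::Int
end

# Hypothetical registry mapping the stored tag to a concrete subtype of A.
const TYPE_REGISTRY = Dict{String,Type{<:A}}("B" => B, "C" => C)

# Write the concrete type name next to the field value.
to_dict(x::A) = Dict{String,Any}("type" => String(nameof(typeof(x))), "v" => x.v)

# Look the tag up in the registry and construct that concrete type.
function from_dict(d::AbstractDict)
    haskey(d, "type") || throw(ArgumentError("missing \"type\" tag"))
    T = get(TYPE_REGISTRY, d["type"]) do
        throw(ArgumentError("unknown type tag: $(d["type"])"))
    end
    return T(d["v"])
end

restored = from_dict(to_dict(C(1)))
```

The tag costs one extra key per object, but it makes the concrete type recoverable from the file alone.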

dmitrii-doronin commented 2 months ago

Well, then it's for the user to decide what structure to deserialise it to. Honestly, I don't get what you're trying to achieve here. If B and C have the same fields, then how should Serde know which structure to deserialise it to? I mean, you could try something out with NamedTuples, but wouldn't that defeat the whole purpose of having structs?

liuyxpp commented 2 months ago

Actually, I have a practical use for this. Currently, the functionality is implemented via JSON3.jl + StructTypes and some homemade ser/deser functions. But I don't like the implementation because it is too fragile, so I reached out to Serde.jl. But now I realize that it is not even possible with Serde.jl.

A brief summary of what I want to do is as follows. I have high-frequency time series data spanning multiple days. I developed a number of features to be extracted from these data. Say

abstract type AbstractFeature end
struct AFeature <: AbstractFeature end
struct BFeature <: AbstractFeature end
...

There are about 50 features now, but more are expected to be added. For a particular project, I have to extract a subset of features from the time series data. I want to store which features were used, along with their configurations, in a JSON file so that I can repeat the extraction for newer days (the time series data is updated as new days arrive). A typical workflow is: I perform an initial extraction and save the JSON file alongside it. Then, some days later, as new data comes in, I run another extraction on the new data with the same configuration by reading the JSON file back in.
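
To make the workflow concrete, here is a rough sketch of how a feature list could be stored by name and restored later. It assumes each feature is a struct whose fields hold its configuration. `MovingAverage`, `Volatility`, `feature_registry`, `describe`, and `restore` are hypothetical names, and writing `config` to a JSON file is left to whichever JSON library is in use. Note that `subtypes` only returns direct subtypes, so a deeper hierarchy would need a recursive walk.

```julia
using InteractiveUtils: subtypes

abstract type AbstractFeature end

# Hypothetical features; the fields are their configuration.
struct MovingAverage <: AbstractFeature
    window::Int
end

struct Volatility <: AbstractFeature
    window::Int
end

# Build the name => type table from the direct subtypes defined so far,
# so newly added features are picked up without editing a list by hand.
feature_registry() = Dict(String(nameof(T)) => T for T in subtypes(AbstractFeature))

# Store the type name plus every field value.
function describe(f::AbstractFeature)
    T = typeof(f)
    params = Dict{String,Any}(String(n) => getfield(f, n) for n in fieldnames(T))
    return Dict{String,Any}("feature" => String(nameof(T)), "params" => params)
end

# Rebuild a feature by looking up its type and passing the stored
# field values to the default constructor in field order.
function restore(d::AbstractDict, registry = feature_registry())
    T = registry[d["feature"]]
    args = (d["params"][String(n)] for n in fieldnames(T))
    return T(args...)
end

config = [describe(MovingAverage(20)), describe(Volatility(5))]
features = [restore(d) for d in config]
```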

dmitrii-doronin commented 2 months ago

We originally designed Serde to replace JSON3 and StructTypes, so it should cover most cases without any issues. Do you have that code open-sourced somewhere? Would you be willing to share a snippet or a more concrete example I could take a look at?

dmitrii-doronin commented 2 months ago

Hi, @liuyxpp. Could you provide more info on the issue, or can I close it?

gryumov commented 2 months ago

@liuyxpp Hi, did you intend to use something like a delegate?

using Serde

abstract type AbstractFoo end

struct Data{A<:AbstractFoo}
    id::Int64
    tag::String
    body::A
end

struct Foo1 <: AbstractFoo
    num::Int64
end

struct Foo2 <: AbstractFoo
    str::String
end

my_tag(::Val{:Foo1}) = Foo1
my_tag(::Val{:Foo2}) = Foo2

function my_tag(x)
    haskey(x, "tag") || throw(Serde.TagError("tag"))
    my_tag(Val(Symbol(x["tag"])))
end

# deser

h1 = " {\"body\":{\"num\":100},\"tag\":\"Foo1\",\"id\":100} "
h2 = " {\"body\":{\"str\":\"test\"},\"tag\":\"Foo2\",\"id\":\"100\"} "
h3 = " {\"body\":{\"str\":\"test\"},\"tag\":\"Foo1\",\"id\":\"100\"} "
h4 = " {\"body\":{\"str\":\"test\"},\"id\":\"100\"} "

julia> Serde.deser_json(x -> Data{my_tag(x)}, h1)
Data{Foo1}(100, "Foo1", Foo1(100))

julia> Serde.deser_json(x -> Data{my_tag(x)}, h2)
Data{Foo2}(100, "Foo2", Foo2("test"))

julia> Serde.deser_json(x -> Data{my_tag(x)}, h3)
ERROR: ParamError: parameter 'num::Int64' was not passed or has the value 'null'

julia> Serde.deser_json(x -> Data{my_tag(x)}, h4)
ERROR: TagError: tag for method 'tag' is not declared