beacon-biosignals / Legolas.jl

Tables.jl-friendly mechanisms for constructing, reading, writing, and validating Arrow tables against extensible, versioned, user-specified schemas.
Other
24 stars 2 forks source link

Legolas-v0.4 serialized nested rows not deserialized properly on Legolas v0.5 #70

Closed ericphanson closed 1 year ago

ericphanson commented 1 year ago
julia> table = Legolas.read(path)
┌ Warning: The `Tables.Schema` of the `Arrow.Table` read via `Legolas.read(io_or_path)` does not appear to
│ comply with the `Legolas.SchemaVersion` indicated by the table's metadata (`SchemaVersion("digits.model", 1)`). Try invoking
│ `Legolas.read(io_or_path; validate=false)` to inspect the table.
└ @ Legolas ~/.julia/packages/Legolas/C9kX7/src/tables.jl:171
ERROR: ArgumentError: field `config` has unexpected type; expected <:DigitsConfigV1, found NamedTuple{(Symbol("1"), Symbol("2"), Symbol("3")), Tuple{String, Int64, NamedTuple{(:seed, :dropout_rate), Tuple{Int64, Float32}}}}
Stacktrace:
 [1] validate(ts::Tables.Schema{(:config, :epoch, :accuracy, :weights, :architecture_version), Tuple{NamedTuple{(Symbol("1"), Symbol("2"), Symbol("3")), Tuple{String, Int64, NamedTuple{(:seed, :dropout_rate), Tuple{Int64, Float32}}}}, Int64, Float32, LegolasFlux.Weights, Missing}}, sv::Legolas.SchemaVersion{Symbol("digits.model"), 1})
   @ Legolas ~/.julia/packages/Legolas/C9kX7/src/schemas.jl:276
 [2] read(io_or_path::String; validate::Bool)
   @ Legolas ~/.julia/packages/Legolas/C9kX7/src/tables.jl:169
 [3] read(io_or_path::String)
   @ Legolas ~/.julia/packages/Legolas/C9kX7/src/tables.jl:159
 [4] top-level scope
   @ REPL[4]:1
 [5] top-level scope
   @ ~/.julia/packages/CUDA/DfvRa/src/initialization.jl:52

in https://github.com/beacon-biosignals/LegolasFlux.jl/pull/21 (line) where we test deserialization of a Legolas v0.4 serialized nested object (created/saved in https://github.com/beacon-biosignals/LegolasFlux.jl/pull/22).

I do see the column metadata in that table:

julia> ar = Arrow.Table(path)
Arrow.Table with 1 rows, 5 columns, and schema:
 :config                NamedTuple{(Symbol("1"), Symbol("2"), Symbol("3")), Tuple{String, Int64, NamedTuple{(:seed, :dropout_rate), Tuple{Int64, Float32}}}}
 :epoch                 Int64
 :accuracy              Float32
 :weights               LegolasFlux.Weights
 :architecture_version  Missing

with metadata given by a Base.ImmutableDict{String, String} with 1 entry:
  "legolas_schema_qualified" => "digits.model@1>legolas-flux.model@1"

julia> (getfield(ar, :columns)[1]).metadata
Base.ImmutableDict{String, String} with 2 entries:
  "ARROW:extension:metadata" => ""
  "ARROW:extension:name"     => "JuliaLang.digits-config@1"
ericphanson commented 1 year ago

@kleinschmidt pointed out that earlier Legolas versions allowed "extra" columns beyond those in the schema, whereas Legloas v0.5 does not (at least, it does not allow them in the structs). Therefore there isn't a necessarily a simple fix that would work in all use-cases.

I think probably trying to handle things in the constructor like in https://github.com/beacon-biosignals/LegolasFlux.jl/pull/21 makes sense, since there you can write whatever code you want to handle extra columns.