JuliaIO / JLD2.jl

HDF5-compatible file format in pure Julia

Draft: explicit datasets #568

Closed JonasIsensee closed 1 month ago

JonasIsensee commented 1 month ago

This implements a macro for a declarative description of header messages and makes the library use these declarations. One nice side effect is nearly free pretty printing of header messages:

Demo header message declaration:

```julia
@pseudostruct HM_DATASPACE begin
    version::UInt8 = 2
    dimensionality::UInt8 = length(kw.dimensions)
    flags::UInt8
    (version == 2) && dataspace_type::UInt8
    if version == 1
        dataspace_type::@computed(DS_V1)
        @skip(5)
    end
    dim_offset::@Offset
    dimensions::NTuple{Int(dimensionality), Int64}
    isset(flags,0) && max_dimension_size::NTuple{Int(dimensionality), Int64}
end
```
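To make the declaration above easier to follow, here is a rough hand-written sketch of the reads it appears to describe. This is not JLD2 code and not what the macro generates. The names `read_dataspace` and `bit_is_set`, and the placeholder value for `DS_V1`, are assumptions made for illustration only.

```julia
# Hypothetical stand-in for the `isset(flags, 0)` check in the declaration.
bit_is_set(flags::UInt8, bit::Integer) = ((flags >> bit) & 0x01) == 0x01

# Assumed placeholder; the real value of DS_V1 is not given in this PR text.
const DS_V1 = 0x01

# Read the fields in the order in which the declaration lists them.
function read_dataspace(io::IO)
    version = read(io, UInt8)
    dimensionality = read(io, UInt8)
    flags = read(io, UInt8)
    if version == 2
        dataspace_type = read(io, UInt8)   # stored on disk in version 2
    elseif version == 1
        dataspace_type = DS_V1             # computed, not read from the file
        skip(io, 5)                        # mirrors @skip(5)
    else
        throw(ArgumentError("unsupported dataspace version $version"))
    end
    dim_offset = position(io)              # mirrors dim_offset::@Offset
    n = Int(dimensionality)
    dimensions = ntuple(_ -> read(io, Int64), n)
    # The optional field is present only when bit 0 of flags is set.
    max_dimension_size = bit_is_set(flags, 0) ? ntuple(_ -> read(io, Int64), n) : nothing
    return (; version, dimensionality, flags, dataspace_type,
              dim_offset, dimensions, max_dimension_size)
end

# Usage: build a version 2 message in memory and read it back.
io = IOBuffer()
write(io, UInt8(2), UInt8(2), UInt8(0), UInt8(1))  # version, rank, flags, type
write(io, Int64(100), Int64(50))                   # two dimensions
seekstart(io)
msg = read_dataspace(io)
```

The point of the declarative form is that one description can drive reading, writing, size computation, and printing, whereas a hand-written reader like this covers only the read path.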

It supports

Demo Header message printing

```
julia> f
JLDFile /tmp/jl_Y3lEf5/test.jld (read/write)
 ├─📂 test_group_1
 │  ├─🔢 x1
 │  └─🔢 x2
 ├─📂 test_group_2
 │  └─🔢 x1
 ├─📂 test_group_3
 │  ├─🔢 x1
 │  └─📂 contained_group (1 entry)
 ├─📂 test_group_4 (2 entries)
 └─ ⋯ (4 more entries)

julia> JLD2.print_header_messages(f, f.root_group_offset)
┌─ Header Message: HM_LINK_INFO
│ ┌─ offset: RelOffset(1824)
│ │  size: 18
│ └─ flags: 0
│    version: 0
│    flags: 0
│    fractal_heap_address: UNDEFINED_ADDRESS
└─   v2_btree_name_index: UNDEFINED_ADDRESS
┌─ Header Message: HM_GROUP_INFO
│ ┌─ offset: RelOffset(1846)
│ │  size: 2
│ └─ flags: 0
│    version: 0
└─   flags: 0
┌─ Header Message: HM_LINK_MESSAGE
│ ┌─ offset: RelOffset(1852)
│ │  size: 24
│ └─ flags: 0
│    version: 1
│    flags: 16
│    link_name_charset: 1
│    link_name_len: 12
│    link_name: test_group_1
└─   target: RelOffset(447)
┌─ Header Message: HM_LINK_MESSAGE
│ ┌─ offset: RelOffset(1880)
│ │  size: 24
│ └─ flags: 0
│    version: 1
│    flags: 16
│    link_name_charset: 1
│    link_name_len: 12
│    link_name: test_group_2
└─   target: RelOffset(590)
┌─ Header Message: HM_LINK_MESSAGE
│ ┌─ offset: RelOffset(1908)
│ │  size: 24
│ └─ flags: 0
│    version: 1
│    flags: 16
│    link_name_charset: 1
│    link_name_len: 12
│    link_name: test_group_3
└─   target: RelOffset(888)
┌─ Header Message: HM_LINK_MESSAGE
│ ┌─ offset: RelOffset(1936)
│ │  size: 24
│ └─ flags: 0
│    version: 1
│    flags: 16
│    link_name_charset: 1
│    link_name_len: 12
│    link_name: test_group_4
└─   target: RelOffset(1193)
┌─ Header Message: HM_LINK_MESSAGE
│ ┌─ offset: RelOffset(1964)
│ │  size: 23
│ └─ flags: 0
│    version: 1
│    flags: 16
│    link_name_charset: 1
│    link_name_len: 11
│    link_name: empty_group
└─   target: RelOffset(1349)
┌─ Header Message: HM_LINK_MESSAGE
│ ┌─ offset: RelOffset(1991)
│ │  size: 38
│ └─ flags: 0
│    version: 1
│    flags: 16
│    link_name_charset: 1
│    link_name_len: 26
│    link_name: group_with_one_group_child
└─   target: RelOffset(1659)
┌─ Header Message: HM_NIL
│ ┌─ offset: RelOffset(2033)
│ │  size: 16
│ └─ flags: 0
└─
```

Dataset pretty printing

Dataset pretty printing shows the written structure and the encoded Julia type name without reconstructing the type. This should help with debugging when normal loading fails.

```
julia> jldsave("test.jld2"; d=rand(ComplexF64, 50,100))

julia> jldopen("test.jld2") do f
           display(JLD2.get_dataset(f, "d"))
       end
┌─ Dataset: "d" at RelOffset(4551)
│  datatype: JLD2.SharedDatatype (committed)
│       committed at: RelOffset(4312)
│       written structure: @NamedTuple{re::Float64, im::Float64}
│       type name: Base.Complex{Float64}
│  dataspace:
│       type: Simple
│       dimensions: (100, 50)
│  layout:
│       class: LcContiguous
└─
```
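Two details in the output above are worth spelling out. The dataspace lists `(100, 50)` for an array created as 50×100, which is consistent with HDF5 listing dimensions in row-major order while Julia arrays are column-major. The written structure is a pair of `Float64` fields, `re` and `im`. A minimal sketch using only Base Julia (the variable names are illustrative):

```julia
A = rand(ComplexF64, 50, 100)

julia_dims = size(A)                 # dimensions as Julia sees them
stored_dims = reverse(julia_dims)    # order shown in the dataspace printout

# View the same memory as (re, im) pairs of Float64, matching the
# "written structure" line. Requires Julia 1.6 or newer.
parts = reinterpret(reshape, Float64, A)   # size (2, 50, 100)
```

Here `parts[1, i, j]` is the real part and `parts[2, i, j]` the imaginary part of `A[i, j]`.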

```
julia> struct MyStruct{T}
           a::Int
           b::T
       end

julia> struct B; x::String; end

julia> jldopen("test.jld2", "w") do f
           dset = JLD2.create_dataset(f, "d")
           JLD2.add_attribute(dset, :description, "A description of what this is for.")
           JLD2.write_dataset(dset, MyStruct(1, B("TestString")))
       end
RelOffset(4632)

julia> jldopen("test.jld2", "r") do f
           display(JLD2.get_dataset(f, "d"))
       end
┌─ Dataset: "d" at RelOffset(4632)
│  datatype: JLD2.SharedDatatype (committed)
│       committed at: RelOffset(4427)
│       written structure: @NamedTuple{a::Int64, b::@NamedTuple{x::String}}
│       type name: MyStruct{B}
│               a::Int64
│               b::B
│  dataspace:
│       type: Scalar
│       dimensions: ()
│  layout:
│       class: LcCompact
│  Attributes:
│       description = "A description of what this is for."
└─
```

Summary of new (experimental) API

ToDo

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 72.72727% with 303 lines in your changes missing coverage. Please review.

Project coverage is 84.87%. Comparing base (ae8a06a) to head (db1556b). Report is 4 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| src/committed_datatype_introspection.jl | 0.00% | 103 Missing :warning: |
| src/explicit_datasets.jl | 63.86% | 86 Missing :warning: |
| src/object_headers.jl | 69.92% | 40 Missing :warning: |
| src/macros_utils.jl | 89.26% | 22 Missing :warning: |
| src/misc.jl | 54.54% | 15 Missing :warning: |
| src/types.jl | 60.00% | 14 Missing :warning: |
| src/headermessages.jl | 25.00% | 6 Missing :warning: |
| src/datalayouts.jl | 93.15% | 5 Missing :warning: |
| src/datatypes.jl | 90.32% | 3 Missing :warning: |
| src/fractal_heaps.jl | 75.00% | 3 Missing :warning: |

... and 3 more
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #568      +/-   ##
==========================================
- Coverage   87.00%   84.87%   -2.14%
==========================================
  Files          31       36       +5
  Lines        4318     4370      +52
==========================================
- Hits         3757     3709      -48
- Misses        561      661     +100
```
