JuliaIO / HDF5.jl

Save and load data in the HDF5 file format from Julia
https://juliaio.github.io/HDF5.jl
MIT License
383 stars 139 forks source link

Creating compound datasets is not working #1068

Closed tamasgal closed 1 year ago

tamasgal commented 1 year ago

I am happily using write_dataset() to dump compound structures using arrays of structs, but while I was writing an expandable cached write-out wrapper, I discovered that creating datasets is not working. Something is not propagated correctly because it complains that Type Symbol has not have a definite size.

Let me dump this issue (I'll also try to figure out what's wrong).

Probably related to https://github.com/JuliaIO/HDF5.jl/pull/1013 and see also the recent post on the Julia Discourse: https://discourse.julialang.org/t/hdf5-jl-variable-length-string/98808

julia> using HDF5  # v0.16.14

julia> struct Foo
         x::Int32
         y::Float32
       end

julia> f = h5open("test.h5", "w")
πŸ—‚οΈ HDF5.File: (read-write) test.h5

julia> d = create_dataset(f, "foo", Foo, (10,))  # the type Foo is not propagated correctly
ERROR: Type Symbol does not have a definite size.
Stacktrace:
  [1] sizeof(x::Type)
    @ Base ./essentials.jl:559
  [2] hdf5_type_id(#unused#::Type{Symbol}, isstruct::Val{true})
    @ HDF5 ~/.julia/packages/HDF5/HtnQZ/src/typeconversions.jl:71
  [3] hdf5_type_id(#unused#::Type{Symbol})
    @ HDF5 ~/.julia/packages/HDF5/HtnQZ/src/typeconversions.jl:69
  [4] hdf5_type_id(#unused#::Type{Core.TypeName}, isstruct::Val{true})
    @ HDF5 ~/.julia/packages/HDF5/HtnQZ/src/typeconversions.jl:74
  [5] hdf5_type_id(#unused#::Type{Core.TypeName})
    @ HDF5 ~/.julia/packages/HDF5/HtnQZ/src/typeconversions.jl:69
  [6] hdf5_type_id(#unused#::Type{DataType}, isstruct::Val{true})
    @ HDF5 ~/.julia/packages/HDF5/HtnQZ/src/typeconversions.jl:74
  [7] hdf5_type_id(#unused#::Type{DataType})
    @ HDF5 ~/.julia/packages/HDF5/HtnQZ/src/typeconversions.jl:69
  [8] datatype(#unused#::Type{Foo})
    @ HDF5 ~/.julia/packages/HDF5/HtnQZ/src/typeconversions.jl:66
  [9] create_dataset(parent::HDF5.File, path::String, dtype::Type, dspace_dims::Tuple{Int64}; pv::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ HDF5 ~/.julia/packages/HDF5/HtnQZ/src/datasets.jl:103
 [10] create_dataset(parent::HDF5.File, path::String, dtype::Type, dspace_dims::Tuple{Int64})
    @ HDF5 ~/.julia/packages/HDF5/HtnQZ/src/datasets.jl:103
 [11] top-level scope
    @ REPL[7]:1

julia> foos = [Foo(1,2), Foo(3, 4)]
2-element Vector{Foo}:
 Foo(1, 2.0f0)
 Foo(3, 4.0f0)

julia> write_dataset(f, "foo", foos)

julia> f["foo"][:]
2-element Vector{NamedTuple{(:x, :y), Tuple{Int32, Float32}}}:
 (x = 1, y = 2.0)
 (x = 3, y = 4.0)

julia> d = create_dataset(f, "some_ints", Int, (10,))  # writing directly works fine
πŸ”’ HDF5.Dataset: /some_ints (file: test.h5 xfer_mode: 0)
tamasgal commented 1 year ago

OK, I found that datatype() is the culprit, see below. I am able to create the dataset by passing an instance of Foo instead of the type itself:

julia> d = create_dataset(f, "foo", foos[1])
(HDF5.Dataset: /foo (file: test.h5 xfer_mode: 0), HDF5.Datatype: H5T_COMPOUND {
      H5T_STD_I32LE "x" : 0;
      H5T_IEEE_F32LE "y" : 4;
   })

julia> datatype(Int)
HDF5.Datatype: H5T_STD_I64LE

julia> datatype(Foo)
ERROR: Type Symbol does not have a definite size.
Stacktrace:
 [1] sizeof(x::Type)
   @ Base ./essentials.jl:559
 [2] hdf5_type_id(#unused#::Type{Symbol}, isstruct::Val{true})
   @ HDF5 ~/.julia/packages/HDF5/HtnQZ/src/typeconversions.jl:71
 [3] hdf5_type_id(#unused#::Type{Symbol})
   @ HDF5 ~/.julia/packages/HDF5/HtnQZ/src/typeconversions.jl:69
 [4] hdf5_type_id(#unused#::Type{Core.TypeName}, isstruct::Val{true})
   @ HDF5 ~/.julia/packages/HDF5/HtnQZ/src/typeconversions.jl:74
 [5] hdf5_type_id(#unused#::Type{Core.TypeName})
   @ HDF5 ~/.julia/packages/HDF5/HtnQZ/src/typeconversions.jl:69
 [6] hdf5_type_id(#unused#::Type{DataType}, isstruct::Val{true})
   @ HDF5 ~/.julia/packages/HDF5/HtnQZ/src/typeconversions.jl:74
 [7] hdf5_type_id(#unused#::Type{DataType})
   @ HDF5 ~/.julia/packages/HDF5/HtnQZ/src/typeconversions.jl:69
 [8] datatype(#unused#::Type{Foo})
   @ HDF5 ~/.julia/packages/HDF5/HtnQZ/src/typeconversions.jl:66
 [9] top-level scope
   @ REPL[38]:1

julia> datatype(foos[1])  # foos is an array of Foo instances
HDF5.Datatype: H5T_COMPOUND {
      H5T_STD_I32LE "x" : 0;
      H5T_IEEE_F32LE "y" : 4;
   }

julia> HDF5.Datatype(HDF5.hdf5_type_id(Foo))
HDF5.Datatype: H5T_COMPOUND {
      H5T_STD_I32LE "x" : 0;
      H5T_IEEE_F32LE "y" : 4;
   }
tamasgal commented 1 year ago

I am scratching my head right now. We have this line

https://github.com/JuliaIO/HDF5.jl/blob/5d826687c016cd56397c66029ec24a0087463650/src/typeconversions.jl#L66

and this errors:

julia> datatype(Foo)
ERROR: Type Symbol does not have a definite size.

but when calling the right hand side manuall, it works:

julia> HDF5.Datatype(HDF5.hdf5_type_id(Foo), true)
HDF5.Datatype: H5T_COMPOUND {
      H5T_STD_I32LE "x" : 0;
      H5T_IEEE_F32LE "y" : 4;
   }

I am probably overlooking something simple πŸ˜‰

mkitti commented 1 year ago

I think we're missing a method definition.

julia> HDF5.datatype(::Type{T}) where T = HDF5.Datatype(HDF5.hdf5_type_id(T), isstructtype(T))

julia> datatype(Foo)
HDF5.Datatype: H5T_COMPOUND {
      H5T_STD_I32LE "x" : 0;
      H5T_IEEE_F32LE "y" : 4;
   }
tamasgal commented 1 year ago

Yes you're right. I'll prepare a PR with tests.