JuliaIO / HDF5.jl

Save and load data in the HDF5 file format from Julia
https://juliaio.github.io/HDF5.jl
MIT License

Writing array of tuples fails #1076

Open sairus7 opened 1 year ago

sairus7 commented 1 year ago

This example works with named tuples but fails with ordinary tuples.

using HDF5

ntup = (x=5, y=6)
tup = (5, 6)
h5open("test.h5", "w") do h
    write_dataset(h, "ntup", [ntup]) # works fine
    write_dataset(h, "tup", [tup]) # errors
end

Shows error:

ERROR: MethodError: Cannot `convert` an object of type Int64 to an object of type Cstring

Closest candidates are:
  convert(::Type{Cstring}, ::Union{Ptr{Int8}, Ptr{Nothing}, Ptr{UInt8}})
   @ Base c.jl:167
  convert(::Type{T}, ::T) where T
   @ Base Base.jl:64

Stacktrace:
  [1] cconvert(T::Type, x::Int64)
    @ Base .\essentials.jl:492
  [2] h5t_insert(dtype_id::Int64, fieldname::Int64, offset::UInt64, field_id::Int64)
    @ HDF5.API C:\Users\gvg\.julia\packages\HDF5\HtnQZ\src\api\functions.jl:7114

mkitti commented 1 year ago

Have you considered JLD or JLD2, which focus on serializing Julia types to HDF5?

How should the tuple be represented in the HDF5 file?

sairus7 commented 1 year ago

So HDF5's type model has no native tuple type.

But for this particular case (a homogeneous ntuple) I'd expect it to encode ntuples as statically sized arrays.
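One way to get a fixed-size array layout today without touching HDF5.jl internals is to store the tuples as the columns of an ordinary matrix. The following is only a sketch under stated assumptions: `tuples_to_matrix` and `matrix_to_tuples` are hypothetical helper names (not part of HDF5.jl), the element type must be a plain bits type such as `Int64`, and the tuples must have at least two elements so that `reinterpret(reshape, ...)` adds a leading dimension.

```julia
using HDF5

# Hypothetical helpers, not part of HDF5.jl.
# View a vector of M homogeneous N-tuples as an N×M matrix (one column per tuple).
tuples_to_matrix(v::Vector{NTuple{N,T}}) where {N,T} =
    collect(reinterpret(reshape, T, v))

# Inverse: turn each column of an N×M matrix back into an NTuple{N,T}.
matrix_to_tuples(m::Matrix{T}) where {T} =
    collect(reinterpret(reshape, NTuple{size(m, 1),T}, m))

tups = [(5, 6), (7, 8), (9, 10)]
path = tempname() * ".h5"  # scratch file, assumed writable

h5open(path, "w") do h
    # Stored as a plain 2×3 Int64 dataset; no custom HDF5 datatype is involved.
    write_dataset(h, "tup", tuples_to_matrix(tups))
end

restored = h5open(path, "r") do h
    matrix_to_tuples(read_dataset(h, "tup"))
end
```

The trade-off is that the file no longer records that each column was a tuple, so the reader has to know the convention.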

sairus7 commented 1 year ago

However, it throws errors both with StaticArrays and with arrays of arrays:

using StaticArrays, HDF5
ntup = (x=5, y=6) # named tuple from the first example
vec = [5, 6]
svec = SVector(5, 6)
h5open("test.h5", "w") do h
    write_dataset(h, "ntup", [ntup]) # works fine
    write_dataset(h, "vec", [vec]) # errors
    write_dataset(h, "svec", [svec]) # errors
end

mkitti commented 1 year ago

The clearest path for me would be the ntuple case. In the following example, I add a method so that HDF5.jl can figure out the correct HDF5 type that corresponds with it.

julia> import HDF5.hdf5_type_id

julia> hdf5_type_id(::Type{NTuple{N,T}}) where {N,T} = HDF5.API.h5t_array_create(hdf5_type_id(T), 1, [N])
hdf5_type_id (generic function with 17 methods)

julia> datatype(NTuple{10, Int})
HDF5.Datatype: H5T_ARRAY {
      [10] H5T_STD_I64LE
   }

julia> datatype((1,2,3))
HDF5.Datatype: H5T_ARRAY {
      [3] H5T_STD_I64LE
   }

We might be able to support SizedArrays from StaticArrays through a package extension, but we would need to figure out how to distinguish between a user who wants to write an array of elements and one who wants to write a single element that is itself an array.

sairus7 commented 1 year ago

Thanks, now I can write ntuples. Is there a similar way to declare the target type when reading back from HDF5?

using HDF5
import HDF5.hdf5_type_id
hdf5_type_id(::Type{NTuple{N,T}}) where {N,T} = HDF5.API.h5t_array_create(hdf5_type_id(T), 1, [N])

tup = (5, 6)
h5open("test.h5", "w") do h
    write_dataset(h, "tup", [tup, tup, tup])
end

# reads back vector of vectors
d = h5open("test.h5", "r") do h
    read_dataset(h, "tup")
end

mkitti commented 1 year ago

You can do this:

julia> out = Vector{typeof(tup)}(undef, 3)
3-element Vector{Tuple{Int64, Int64}}:
 (7277816999743324160, 7205759405420183552)
 (7205759405386629120, 8358680910027030528)
 (8286623315989102592, 8358680910094139392)

julia> # reads back vector of tuples
       d = h5open("test.h5", "r") do h
           read_dataset(h["tup"], datatype(tup), out)
       end

julia> out
3-element Vector{Tuple{Int64, Int64}}:
 (5, 6)
 (5, 6)
 (5, 6)
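
For reference, the write and read steps from this thread can be combined into one script. This is a sketch assembled from the snippets above rather than an official HDF5.jl recipe: the `hdf5_type_id` method is the workaround suggested earlier (it extends an HDF5.jl function for a Base type, so it is best kept local to your own code), and the temporary file path is an assumption for illustration.

```julia
using HDF5
import HDF5: hdf5_type_id

# Workaround from this thread: map NTuple{N,T} to a one-dimensional
# HDF5 array datatype of length N.
hdf5_type_id(::Type{NTuple{N,T}}) where {N,T} =
    HDF5.API.h5t_array_create(hdf5_type_id(T), 1, [N])

tup = (5, 6)
data = [tup, tup, tup]
path = tempname() * ".h5"  # scratch file, assumed writable

h5open(path, "w") do h
    write_dataset(h, "tup", data)
end

# Preallocate the destination; its contents are undefined until the read fills it.
out = Vector{typeof(tup)}(undef, length(data))

h5open(path, "r") do h
    # Passing the in-memory datatype and the buffer makes the read land in `out`
    # as tuples instead of a vector of vectors.
    read_dataset(h["tup"], datatype(tup), out)
end
```

After the read, `out` should hold the same tuples that were written.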