JuliaIO / HDF5.jl

Save and load data in the HDF5 file format from Julia
https://juliaio.github.io/HDF5.jl
MIT License

Encoding/Decoding vector of Datetimes #1082

Open cpaniaguam opened 1 year ago

cpaniaguam commented 1 year ago

I am writing a vector of DateTimes to an h5 file. When I read it back, I get the following.

julia> dts
3-element Vector{DateTime}:
 2022-05-04T11:38:49
 2022-05-04T14:28:09
 2022-05-05T10:43:37

julia> retrieved_dts["dts"]
3-element Vector{NamedTuple{(:instant,), Tuple{NamedTuple{(:periods,), Tuple{NamedTuple{(:value,), Tuple{Int64}}}}}}}:
 (instant = (periods = (value = 63787347529000,),),)
 (instant = (periods = (value = 63787357689000,),),)
 (instant = (periods = (value = 63787430617000,),),)

How is one supposed to interpret these values to recover the human-readable DateTime? I thought they were Unix times, but they are far too large for present-day dates!

julia> t = retrieved_dts["dts"][1].instant.periods.value;

julia> Dates.unix2datetime(t)
2023310-07-30T06:56:40

Maybe this is something one is not supposed to do? I guess I could just convert to Unix time before writing to h5, but it'd be nice not to have to do this every time!
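
For reference, the "convert to Unix time before writing" idea mentioned above could look roughly like the sketch below. It uses only the Dates standard library; the variable names unix_secs and restored are illustrative, and the HDF5 write/read step is left out on the assumption that a plain Vector{Float64} round-trips unchanged.

```julia
using Dates

dts = DateTime.([
    "2022-05-04T11:38:49",
    "2022-05-04T14:28:09",
    "2022-05-05T10:43:37",
])

# datetime2unix returns Float64 seconds since 1970-01-01T00:00:00.
unix_secs = datetime2unix.(dts)

# unix_secs is an ordinary Vector{Float64}; this is what would be written
# to the file instead of the DateTime structs.

# After reading the floats back, unix2datetime restores the DateTimes.
restored = unix2datetime.(unix_secs)
```

The trade-off is that the conversion has to be done by hand on both sides, but the stored numbers are then plain Unix timestamps that other tools can interpret.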

mkitti commented 1 year ago

Could you provide example code of how you are writing the vector to a HDF5 file?

mkitti commented 1 year ago

Here's an example recreating your issue, and perhaps also solving it via reinterpret(DateTime, ...)

julia> using HDF5, Dates

julia> dts = DateTime.([
        "2022-05-04T11:38:49",
        "2022-05-04T14:28:09",
        "2022-05-05T10:43:37"
       ])
3-element Vector{DateTime}:
 2022-05-04T11:38:49
 2022-05-04T14:28:09
 2022-05-05T10:43:37

julia> h5open("test.h5", "w") do h5f
           ds = write_dataset(h5f, "dts", dts)
       end

julia> dts = h5open("test.h5", "r") do h5f
           h5f["dts"][:]
       end
3-element Vector{NamedTuple{(:instant,), Tuple{NamedTuple{(:periods,), Tuple{NamedTuple{(:value,), Tuple{Int64}}}}}}}:
 (instant = (periods = (value = 63787347529000,),),)
 (instant = (periods = (value = 63787357689000,),),)
 (instant = (periods = (value = 63787430617000,),),)

julia> reinterpret(DateTime, dts)
3-element reinterpret(DateTime, ::Vector{NamedTuple{(:instant,), Tuple{NamedTuple{(:periods,), Tuple{NamedTuple{(:value,), Tuple{Int64}}}}}}}):
 2022-05-04T11:38:49
 2022-05-04T14:28:09
 2022-05-05T10:43:37

julia> dts = reinterpret(DateTime, dts)
3-element reinterpret(DateTime, ::Vector{NamedTuple{(:instant,), Tuple{NamedTuple{(:periods,), Tuple{NamedTuple{(:value,), Tuple{Int64}}}}}}}):
 2022-05-04T11:38:49
 2022-05-04T14:28:09
 2022-05-05T10:43:37

julia> using HDF5, Dates

julia> h5open("test.h5", "w") do h5f
           ds = write_dataset(h5f, "dts", dts)
       end

julia> retrieved_dts = h5open("test.h5", "r") do h5f
           h5f["dts"][:]
       end
3-element Vector{NamedTuple{(:instant,), Tuple{NamedTuple{(:periods,), Tuple{NamedTuple{(:value,), Tuple{Int64}}}}}}}:
 (instant = (periods = (value = 63787347529000,),),)
 (instant = (periods = (value = 63787357689000,),),)
 (instant = (periods = (value = 63787430617000,),),)

julia> retrieved_dts = reinterpret(DateTime, retrieved_dts)
3-element reinterpret(DateTime, ::Vector{NamedTuple{(:instant,), Tuple{NamedTuple{(:periods,), Tuple{NamedTuple{(:value,), Tuple{Int64}}}}}}}):
 2022-05-04T11:38:49
 2022-05-04T14:28:09
 2022-05-05T10:43:37

julia> retrieved_dts[1]
2022-05-04T11:38:49

mkitti commented 1 year ago

You could also retrieve the DateTime directly via

julia> h5open("test.h5", "r") do h5f
           read(h5f["dts"], DateTime)
       end
3-element Vector{DateTime}:
 2022-05-04T11:38:49
 2022-05-04T14:28:09
 2022-05-05T10:43:37

cpaniaguam commented 1 year ago

@mkitti you beat me to the punch in including a workflow, thanks! Thank you also for introducing me to reinterpret and using read with the DateTime type!

But what if the end user of the h5 file is not necessarily going to use Julia to read it? Maybe encoding the datetimes as something more generic could be useful?

mkitti commented 1 year ago

For a broadly interpretable date time, I would consider ISO 8601 date format strings.

https://en.wikipedia.org/wiki/ISO_8601?wprov=sfla1
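
A minimal sketch of the string approach, using only the Dates standard library. The names iso and restored are illustrative. It assumes that the resulting Vector{String} can be written with write_dataset in the same way as the vector in the earlier example, which is not shown or checked here.

```julia
using Dates

dts = DateTime.([
    "2022-05-04T11:38:49",
    "2022-05-04T14:28:09",
    "2022-05-05T10:43:37",
])

# string(::DateTime) produces ISO 8601 text such as "2022-05-04T11:38:49".
iso = string.(dts)

# Parsing the strings with the default format recovers the DateTimes.
restored = DateTime.(iso)
```

Strings take more space than integers, but any language with an ISO 8601 parser can read them without knowing how Julia lays out a DateTime.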

If you would like to understand how Julia's DateTime object works, the Julia Discourse forum would be a good place to ask. The stored value seems to be a count of milliseconds in UT time.
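
To make the raw numbers concrete, here is a rough sketch of how one of the values from this thread can be decoded by hand. It relies on my understanding that the count starts at 0000-12-31T00:00:00, so that 1970-01-01T00:00:00 corresponds to 62135683200000 ms. Dates.UTInstant and Dates.value are unexported parts of the Dates standard library, and the variable names are illustrative.

```julia
using Dates

raw = 63787347529000  # first value read back from the file in this thread

# Rebuild the DateTime directly from its internal millisecond count.
dt = DateTime(Dates.UTInstant(Millisecond(raw)))

# The same count for the Unix epoch; subtracting it gives Unix milliseconds.
unix_epoch_ms = Dates.value(DateTime(1970))
unix_secs = (raw - unix_epoch_ms) / 1000

# unix2datetime expects seconds since 1970, which is why passing the raw
# value to it earlier in the thread produced a date in the far future.
dt_via_unix = unix2datetime(unix_secs)
```

A reader working outside Julia could apply the same arithmetic: subtract 62135683200000 from the stored integer and divide by 1000 to get Unix seconds.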