HARPgroup / HARParchive

This repo houses HARP code development items, resources, and intermediate work products.

Handling timestamps in .h5 files with rhdf5 #241

Open glenncampagna opened 2 years ago

glenncampagna commented 2 years ago
rburghol commented 2 years ago

Ok so the discovery of bit64conversion is exciting -- great work! Now if we can just figure out why we have to divide by a billion.

nicoledarling commented 2 years ago

@rburghol I believe since the Unix timestamps are in nanoseconds and the normal dates are in seconds, we need to divide by a billion to convert. More on this in link below in answer 5: https://discuss.dizzycoding.com/convert-numpy-datetime64-to-string-object-in-python
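The divide-by-a-billion step can be sketched as follows. This is only an illustration in Python using the standard library, not the code used in this repo; the function name `ns_to_datetime` and the sample value are hypothetical, and it assumes the stored integers count nanoseconds since 1970-01-01 UTC.

```python
from datetime import datetime, timezone

NS_PER_SECOND = 1_000_000_000  # one billion nanoseconds per second


def ns_to_datetime(ns):
    """Convert nanoseconds since the Unix epoch to a UTC datetime.

    Hypothetical helper: assumes `ns` is a non-negative integer count of
    nanoseconds since 1970-01-01T00:00:00Z.
    """
    # Integer division keeps the whole seconds exact; the remainder is the
    # sub-second part, still in nanoseconds.
    seconds, remainder_ns = divmod(ns, NS_PER_SECOND)
    whole = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # datetime only stores microseconds, so anything finer is dropped.
    return whole.replace(microsecond=remainder_ns // 1000)


# Hypothetical sample value: 441,763,200 seconds after the epoch, in nanoseconds.
raw_ns = 441_763_200_000_000_000
print(ns_to_datetime(raw_ns).isoformat())  # 1984-01-01T00:00:00+00:00
```

Using integer division instead of floating-point division avoids rounding the large nanosecond value before it is reduced to seconds.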

juliabruneau commented 2 years ago

@rburghol in addition to what @nicoledarling found, this converter website gives the definition of the Unix epoch timeseries that the .h5 files use:

The Unix epoch (or Unix time or POSIX time or Unix timestamp) is the number of seconds that have elapsed since January 1, 1970 (midnight UTC/GMT), not counting leap seconds (in ISO 8601: 1970-01-01T00:00:00Z). Literally speaking the epoch is Unix time 0 (midnight 1/1/1970), but 'epoch' is often used as a synonym for Unix time. Some systems store epoch dates as a signed 32-bit integer, which might cause problems on January 19, 2038 (known as the Year 2038 problem or Y2038). The converter on this page converts timestamps in seconds (10-digit), milliseconds (13-digit) and microseconds (16-digit) to readable dates.

Since they state that "some systems store the epoch dates as signed 32-bit integer", this might be the reason why we had to convert the data first with bit64conversion?

From: https://www.epochconverter.com/
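One possible reading of the integer-width question, sketched in Python with the standard library only: a timestamp in seconds fits in a signed 32-bit integer, but the same instant in nanoseconds needs 64 bits and is also too large to be represented exactly as a double. Whether this is why `bit64conversion` was needed here is an assumption, not something confirmed in this thread; the helper names and sample value are hypothetical.

```python
INT32_MAX = 2**31 - 1  # largest signed 32-bit integer
INT64_MAX = 2**63 - 1  # largest signed 64-bit integer


def fits_int32(value):
    """True if `value` can be stored as a signed 32-bit integer."""
    return -INT32_MAX - 1 <= value <= INT32_MAX


def survives_float64(value):
    """True if `value` is unchanged after a round trip through a double."""
    return int(float(value)) == value


# Hypothetical sample: 1984-01-01 UTC in seconds, and one nanosecond later.
seconds = 441_763_200
nanoseconds = seconds * 1_000_000_000 + 1

print(fits_int32(seconds))            # True
print(fits_int32(nanoseconds))        # False
print(nanoseconds <= INT64_MAX)       # True
print(survives_float64(nanoseconds))  # False
```

If the .h5 timestamps are stored as 64-bit nanosecond counts, reading them as ordinary 32-bit integers or as doubles would overflow or lose precision, which would be consistent with needing a 64-bit-aware conversion first.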

rburghol commented 2 years ago

@nicoledarling ahh this is a good catch. I would clarify, though, that Unix timestamps are, I think, in seconds, and it is numpy timestamps that are in nanoseconds. numpy is a library in Python, and perhaps there is some reason that it is storing these data as nanoseconds. Maybe the way our Chesapeake Bay UCI files are formatted? I think we should look at the output of the test UCIs/h5 files that come with the hsp2 package.
Download and run these through hsp2: