ScottishCovidResponse / SCRCIssueTracking

Central issue tracking repository for all repos in the consortium

Disable HDF5 timestamps #674

Closed ianhinder closed 4 years ago

ianhinder commented 4 years ago

By default, HDF5 objects are stored with an internal creation timestamp. This means that if the same file is written twice, its content, and hence its checksum, will change. This can be disabled when creating a dataset (see http://docs.h5py.org/en/stable/high/group.html):

create_dataset(..., track_times=False)

There may be other timestamps stored that are not associated with datasets, but I can't immediately find anything about them on the h5py website.

I don't think the dataset creation time is useful for us, and omitting it means we can tell that a repeated run produced the same result (via the checksum).
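As a rough illustration of the checksum comparison described above, here is a minimal sketch that writes the same data to two files with `track_times=False` and compares their hashes. The helper name `write_and_hash`, the dataset name `values`, and the choice of SHA-256 are assumptions made for this example, not part of the data pipeline API. If other timestamps are stored elsewhere in the file, the hashes could still differ.

```python
import hashlib
import os
import tempfile

import h5py
import numpy as np


def write_and_hash(path):
    """Write a small fixed dataset without timestamps; return the file's SHA-256.

    Hypothetical helper for illustration only.
    """
    data = np.arange(10, dtype="f8")
    with h5py.File(path, "w") as f:
        # track_times=False asks HDF5 not to store dataset timestamps.
        f.create_dataset("values", data=data, track_times=False)
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = write_and_hash(os.path.join(tmp, "run1.h5"))
        second = write_and_hash(os.path.join(tmp, "run2.h5"))
    # With timestamps disabled, the two runs are expected to match.
    print("identical:", first == second)
```

Comparing whole-file hashes only detects that something changed. It does not say which object differs, so a mismatch would still need inspecting by hand.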

github-actions[bot] commented 4 years ago

Heads up @mrow84 @bobturneruk - the "data pipeline api" label was applied to this issue.

ianhinder commented 4 years ago

Implemented by @mrow84 in https://github.com/ScottishCovidResponse/data_pipeline_api/pull/83.