INCEPTdk / omop_etl


Use time zones for timestamps #112

Closed epiben closed 2 months ago

epiben commented 4 months ago

Some source data timestamps are in local time, while others are in UTC. The PR reconciles this by correctly localising the timestamps and then converting them all to Europe/Copenhagen time. The _datetime fields remain timezone-unaware: once all timestamps have been reconciled, the timezone carries no useful information in the CDM. Also, using TIMESTAMPTZ types instead of TIMESTAMP could create awkward situations if data were analysed in, say, UTC. For example, start_date could be 2014-04-03 and start_datetime 2014-04-03 23:00:00 for records that have no actual timestamps and were hardcoded to midnight as per OMOP conventions.
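To illustrate the reconciliation described above, here is a minimal sketch, not the code in this PR. It uses the standard-library `zoneinfo` module rather than pytz, and the helper name `to_naive_copenhagen` and its `source_is_utc` flag are hypothetical. It assumes each source timestamp arrives as a naive value and that the caller already knows whether it was recorded in UTC or in local time.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

CPH = ZoneInfo("Europe/Copenhagen")


def to_naive_copenhagen(ts: datetime, source_is_utc: bool) -> datetime:
    """Hypothetical helper: localise a naive timestamp, convert it to
    Europe/Copenhagen, and drop tzinfo so it fits a TIMESTAMP column."""
    # Attach the zone the source system actually recorded the value in.
    aware = ts.replace(tzinfo=timezone.utc if source_is_utc else CPH)
    # Convert to Copenhagen wall-clock time, then make it naive again.
    return aware.astimezone(CPH).replace(tzinfo=None)


# The same instant, recorded once in UTC and once in local (summer) time.
from_utc = to_naive_copenhagen(datetime(2014, 4, 3, 22, 0, 0), source_is_utc=True)
from_local = to_naive_copenhagen(datetime(2014, 4, 4, 0, 0, 0), source_is_utc=False)
```

Both calls end up with the same naive Copenhagen wall-clock value, which is what lets the _datetime columns stay plain TIMESTAMP.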

Part of this work also moves the test start_datetime and end_datetime values to more recent dates, because the current summer time (sommertid) practice in Denmark has only been around since 1980 (more or less: https://www.borger.dk/miljoe-og-energi/Energi/Sommertid). Using test timestamps from the 1800s caused a range of problems: pytz falls back to local mean time for such dates, in which Copenhagen is, for example, 53 minutes ahead of GMT.
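The local mean time effect can be reproduced with a short sketch. It uses the standard-library `zoneinfo` module as a stand-in for pytz; both read the IANA time zone database, but the exact local-mean-time offset depends on the installed tzdata version, so no specific value is shown here. The dates and the helper `is_whole_hours` are illustrative assumptions, not values from the test suite.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

CPH = ZoneInfo("Europe/Copenhagen")

# A timestamp from before standard time was adopted gets a local-mean-time offset.
old_offset = datetime(1850, 1, 1, tzinfo=CPH).utcoffset()

# Recent timestamps get the familiar CET/CEST offsets.
winter_offset = datetime(2014, 1, 15, tzinfo=CPH).utcoffset()
summer_offset = datetime(2014, 7, 15, tzinfo=CPH).utcoffset()


def is_whole_hours(offset: timedelta) -> bool:
    """Hypothetical check: True when a UTC offset is an exact number of hours."""
    return offset % timedelta(hours=1) == timedelta(0)


print(old_offset, winter_offset, summer_offset)
```

A check like `is_whole_hours` can flag test fixtures whose dates are old enough to pick up a local-mean-time offset.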

Nota bene:

daplaci commented 3 months ago

@epiben is this still WIP?

epiben commented 3 months ago

Yes