MeltanoLabs / target-snowflake

Singer Target for the Snowflake cloud Data Warehouse
https://hub.meltano.com/loaders/target-snowflake--meltanolabs/
Other
10 stars 24 forks source link

Load timestamp-data as timestamp_tz #224

Open larsrinn opened 3 months ago

larsrinn commented 3 months ago

I have a tap which generates data with DateTimeType. Unfortunately, when loaded into Snowflake, the timezone information vanishes and data is stored as timestamp_ntz. Am I doing something wrong? I would like to have it as timestamp_tz

edgarrmondragon commented 3 months ago

I think that's hardcoded here:

https://github.com/MeltanoLabs/target-snowflake/blob/b79dbd3c0d8dcc1a6d2d39ec94d04be4b0002077/target_snowflake/connector.py#L284

Perhaps we could add a new target setting to switch to mapping from JSON date-time strings to TIMESTAMP_TZ.

PRs welcome!

larsrinn commented 3 months ago

How would that work? Wouldn't it be required for the tap's schema to claim a given column has datetime-values with timezone information? I didn't find such an option in the Singer-SDK.

edgarrmondragon commented 3 months ago

How would that work? Wouldn't it be required for the tap's schema to claim a given column has datetime-values with timezone information? I didn't find such an option in the Singer-SDK.

In short, what I mean is we could add a user-level setting, e.g. use_timestamp_tz to decide if the target uses TIMESTAMP_NTZ (the default) or TIMESTAMP_TZ.


The JSON Schema spec, which the target uses to determine column types, expects rfc3339 date-times^1 but by default there's enforcement of that specific format, so expecting a full rfc3339 string by default might break some users' pipelines.

larsrinn commented 3 months ago

So what you're saying is:

edgarrmondragon commented 3 months ago

All correct. In short, I worry that timestamp_tz might cast datetime strings differently and might not accept something like 2020-01-01T00:00:00 (i.e. without a tz component), which users might be currently sending to the target.

I worry that timestamp_tz might cast datetime strings differently and might not accept something like 2020-01-01T00:00:00

Is that something you could confirm? Essentially I want to validate how snowflake handles a few cases:

-- date
'2020-01-01'::TIMESTAMP_NTZ
'2020-01-01'::TIMESTAMP_TZ

-- date-time without tz
'2020-01-01T00:00:00'::TIMESTAMP_NTZ
'2020-01-01T00:00:00'::TIMESTAMP_TZ

-- date-time with offset
'2020-01-01T00:00:00+00:00'::TIMESTAMP_NTZ
'2020-01-01T00:00:00+00:00'::TIMESTAMP_TZ

-- date-time with Z
'2020-01-01T00:00:00+00:00Z'::TIMESTAMP_NTZ
'2020-01-01T00:00:00+00:00Z'::TIMESTAMP_TZ