Closed henrihem closed 2 weeks ago
Hi @henrihem!
Could you please share with us the DVT commands or configs you're using for both scenarios? It would really help us to assess the possibilities
I did some search about Bug 1: maybe if you add a new DVT connection with this additional parameter "connect_args": {"timezone": "UTC"}}
, or either temporarily change your current connection JSON file just for the test (located in <USERNAME>/.config/google-pso-data-validator/<CONNECTION_NAME>.connection.json
), you can be able to force the Snowflake connection to read values as UTC, not sure if that would work, but we could give it a shot.
It would look like this:
{"source_type": "Snowflake", "secret_manager_type": null, "secret_manager_project_id": null, "user": "XXXXX", "password": "XXXXX", "account": "XXXXXX", "database": "XXXXX", "connect_args": {"timezone": "UTC"}}
Here are the commands:
I added the connect_args, but the situation seems the same. I think it is because that field is in fact TIMESTAMP_NTZ in Snowflake, meaning session timezone will not make a difference.
As said: dt TIMESTAMP_NTZ(9) in Snowflake dt DATETIME in BigQuery
In the below example, "dt" and "co" columns are concatenated, where "co" is just some string field.
data-validation validate row \
-sc sf_conn \
-tc bq_conn \
-tbls pub.TBL=pub.TBL \
--concat dt,co \
--primary-keys pk \
-rr \
-rbs 100
Same command as in previous step.
In this case desc is STRING field in Snowflake and BigQuery. This string field can contain scandic characters, such as ä and ö.
data-validation validate row \
-sc sf_conn \
-tc bq_conn \
-tbls pub.TBL1=pub.TBL1 \
--concat desc,co \
--primary-keys pk \
-rr \
-rbs 100
@henrihem, I believe I have a fix for the time zone issue, thanks for pointing this problem out.
You can review the proposed change on this pull request: https://github.com/GoogleCloudPlatform/professional-services-data-validator/pull/1174
There are a few small bugs, which have been reported by our users.
Bug 1 / needs clarification how to handle
This case is tricky, since project default_time_zone has been set to Europe/Helsinki. That might be the core cause of this issue.
Table schema: dt TIMESTAMP_NTZ(9) in Snowflake dt DATETIME in BigQuery
When running concat comparison, the "dt" is casted as:
In BigQuery
In Snowflake.
This creates different value, since in BigQuery, the "dt" is casted as UTC, thus resulting in different value than in Snowflake, due to default_time_zone being other than UTC.
Bug 2
Scandic characters are not handled properly from Snowflake. There is a description field in Snowflake, which can contain text, for example "Testiä".
In comparisons these result in different strings: Snowflake:
BigQuery: