GoogleCloudPlatform / professional-services-data-validator

Utility to compare data between homogeneous or heterogeneous environments to ensure source and target tables match
Apache License 2.0
399 stars 114 forks source link

Date fields starting with 0 (e.g. 1/16/0201) cause failed validations #1181

Closed franklinWhaite closed 2 months ago

sundar-mudupalli-work commented 3 months ago

Hi,

DVT uses the BQ FORMAT_DATE function (see table below) to convert the date to a string. When presented with a date before 1000 CE, this function does not print a two digit century number (a bug). There is a simpler CAST option that can be used to cast dates to string which works correctly. The sequence of single and double quotes below is due to bash escape sequences - to embed one single quote within a quoted string.

data-validation query -c bq  -q 'select FORMAT_DATE('"'"'%Y-%m-%d'"'"',DATE '"'"'0234-12-10'"'"')'
[('234-12-10',)]
data-validation query -c bq  -q 'select CAST(DATE '"'"'0234-12-10'"'"' as string)'
[('0234-12-10',)]
data-validation query -c bq  -q 'select CAST(DATE '"'"'0234-12-10'"'"' as string FORMAT '"'"'YYYY-MM-DD'"'"')'
[('0234-12-10',)]
I checked the Hive, SQLServer, MySQL and Teradata engines. They produced the right result. The other engines appeared to use the correct expression. Engine Expresssion Works
BigQuery FORMAT_DATE('%Y-%m-%d', t5.col_date) No
Hive CAST(t5.col_date AS string) Yes
mysql date_format(t5.col_date, '%%Y-%%m-%%d') Yes
Oracle to_char(t5.col_date, 'YYYY-MM-DD') Yes?
Postgres to_char(t5.col_date, 'YYYY-MM-DD') Yes?
Snowflake to_char(t5."COL_DATE", 'YYYY-MM-DD') Yes?
SQLServer convert(VARCHAR(32), t5.col_date, 23) Yes
Teradata TO_CHAR(t5.\"col_date\", 'YYYY-MM-DD') Yes
sundar-mudupalli-work commented 3 months ago

Hi,

This is due to a bug in GoogleSQL. See bug report. There is a simple workaround using custom calculated fields in .yaml. When a yaml file is generated for the provided test case, it has the following lines:

    field_alias: cast__date_field
    ibis_expr: ibis.expr.types.TemporalValue.strftime
    params:
    - format_str: '%Y-%m-%d'

replace them with

    field_alias: cast__date_field
    ibis_expr: ibis.expr.types.TemporalValue.cast
    params:
    - target_type: str

Then everything works correctly.