GoogleCloudPlatform / professional-services-data-validator

Utility to compare data between homogeneous or heterogeneous environments to ensure source and target tables match
Apache License 2.0
404 stars 116 forks source link

validate column: Hive/PostgreSQL char(2) column is losing trailing space in column validation #842

Open nj1973 opened 1 year ago

nj1973 commented 1 year ago

Sample Hive column:

,   col_char_2      char(2)

Sample BigQuery column:

,   col_char_2      STRING(2)

COL_CHAR_2 contains A, B, C in both tables, i.e. each value has a trailing space.

Ran this validation:

data-validation validate column -sc=hive-conn -tc=bq-conn -tbls=test_user.test_table \
--filter-status=fail --max=col_char_2 --sum=col_char_2

Output:

╒═════════════════════════╤═════════════════╤╤════════════════════╤══════════════════╤══════════════════╤
│ validation_name         │ validation_type ││ source_column_name │ source_agg_value │ target_agg_value │
╞═════════════════════════╪═════════════════╪╪════════════════════╪══════════════════╪══════════════════╪
│ max__length__col_char_2 │ Column          ││ length__col_char_2 │                1 │                2 │
├─────────────────────────┼─────────────────┼┼────────────────────┼──────────────────┼──────────────────┼
│ sum__length__col_char_2 │ Column          ││ length__col_char_2 │                3 │                6 │
╘═════════════════════════╧═════════════════╧╧════════════════════╧══════════════════╧══════════════════╧

The same is true when testing PostgreSQL to BigQuery validations.

When fixing this we also need to update the integration tests, search for "issue-842" in test_hive.py, test_postgres.py and test_oracle.py.

nj1973 commented 1 year ago

Also MySQL

nj1973 commented 9 months ago

I've re-tested this on latest branch and the problem is still there.