GoogleCloudPlatform / professional-services-data-validator

Utility to compare data between homogeneous or heterogeneous environments to ensure source and target tables match
Apache License 2.0

validate column: Character columns are skipped for min/max validations #758

Open nj1973 opened 1 year ago

nj1973 commented 1 year ago

Character columns are skipped for min/max validations, for example:

data-validation -v validate column -sc ora_conn -tc pg_connl -tbls myschema.customers \
-min='first_name' -max='first_name'
...
03/02/2023 04:38:33 PM-INFO: Skipping min on first_name due to data type: string[non-nullable]
03/02/2023 04:38:33 PM-INFO: Skipping max on first_name due to data type: string[non-nullable]

I believe this means the only aggregated validation we can do is count. I don't think this is adequate for proving a migration.

Presumably string columns are skipped because of potential problems around localization and multi-byte characters.
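To illustrate the concern, here is a minimal Python sketch of how the minimum of a string column can depend on the ordering rule in use. The sample values are made up, and the two orderings (code-point versus case-insensitive) only stand in for whatever collations the source and target engines might apply; this is not a statement about how any particular database sorts.

```python
# Illustrative values only; not taken from any real table.
names = ["apple", "Banana", "cherry"]

# Code-point ordering: uppercase letters sort before lowercase ones.
code_point_min = min(names)
code_point_max = max(names)

# Case-insensitive ordering, standing in for a linguistic collation.
casefold_min = min(names, key=str.casefold)
casefold_max = max(names, key=str.casefold)

print(code_point_min, casefold_min)  # Banana apple
print(code_point_max, casefold_max)  # cherry cherry
```

Two systems holding identical data could therefore report different min values if their collations differ, which would make a min/max comparison fail even though the migration was correct.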

Would it be worth having an option stating we trust the source/target data and want to validate the specified string columns?

nehanene15 commented 1 year ago

We do have a way to include string length aggregations: the --wildcard-include-string-len (or -wisl) flag. (Some documentation on it here.)

I think you may have found a small bug here where we added 'string' to the list of supported data types for aggregate validation but forgot to also add 'string[non-nullable]'.
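A rough sketch of the kind of check that would produce this behavior, assuming the supported types are held in an allow-list of type-name strings. All names here (SUPPORTED_TYPES, is_supported_exact, is_supported_normalized) are hypothetical and are not taken from the project's code.

```python
# Hypothetical allow-list; the real project's list and names may differ.
SUPPORTED_TYPES = {"int64", "float64", "string"}


def is_supported_exact(type_name: str) -> bool:
    """Exact match: 'string[non-nullable]' is not in the list, so it is skipped."""
    return type_name in SUPPORTED_TYPES


def is_supported_normalized(type_name: str) -> bool:
    """Drop a trailing '[...]' qualifier before checking the allow-list."""
    base_type = type_name.split("[", 1)[0]
    return base_type in SUPPORTED_TYPES


print(is_supported_exact("string"))                    # True
print(is_supported_exact("string[non-nullable]"))      # False
print(is_supported_normalized("string[non-nullable]")) # True
```

Adding 'string[non-nullable]' to the list directly would work just as well; normalizing the type name is only one possible approach.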

viclinriv commented 3 months ago

Can I be assigned this issue?