GoogleCloudPlatform / professional-services-data-validator

Utility to compare data between homogeneous or heterogeneous environments to ensure source and target tables match
Apache License 2.0
395 stars 112 forks source link

row validation: SQL Server --hash on text column throws exception #834

Open nj1973 opened 1 year ago

nj1973 commented 1 year ago

Testing row validation with --hash on a text column throws an exception as below:

sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', '[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Argument data type text is invalid for argument 1 of rtrim functio...

Example column:

,   col_string      text
helensilva14 commented 11 months ago

Hi @nj1973! I was reviewing this issue and it seems kinda related to the #990, in the sense that we need to convert TEXT, NTEXT, IMAGE columns to VARCHAR (related docs: https://learn.microsoft.com/en-us/sql/t-sql/data-types/ntext-text-and-image-transact-sql?view=sql-server-ver16).

RTRIM documentation says the arg "_must be of a data type that is implicitly convertible to varchar. Otherwise, use CAST to explicitly convert characterexpression".

Also wanted to review if we keep this issue as p1, what do you think?

nj1973 commented 11 months ago

It is related to the divergence of behaviour for nullable vs not-null columns mentioned in [990]. In the case of TEXT and NTEXT I think we need to divert the cast to force_cast, both functions are defined in ibis_addon/api.py. The difficulty is figuring out how and when. As far as I can tell there's no way of knowing the SQL Server data type to be able to know when a String column is actually text. It is also not clear how we would switch to force_cast. More research needed.