GoogleCloudPlatform / professional-services-data-validator

Utility to compare data between homogeneous or heterogeneous environments to ensure source and target tables match
Apache License 2.0
393 stars 109 forks source link

Exception validating Oracle table with hyphen in name #1192

Open nj1973 opened 1 month ago

nj1973 commented 1 month ago

Oracle test table:

create table dvt_test."TAB-HYPHEN"
(id number(5) primary key, data varchar2(30));
insert into dvt_test."TAB-HYPHEN" values (1,'Row 1');
commit;

PostgreSQL test table:

create table dvt_test."tab-hyphen"
(id bigint primary key, data varchar(30));
insert into dvt_test."tab-hyphen" values (1,'Row 1');

Test command, PostgreSQL vs PostgreSQL:

data-validation validate column -sc=pg -tc=pg -tbls=dvt_test.tab-hyphen

Success

Test command, Oracle vs Oracle:

data-validation validate column -sc=ora -tc=ora -tbls=dvt_test.tab-hyphen

Traceback (most recent call last):
  File "/professional-services-data-validator/.venv/bin/data-validation", line 8, in <module>
    sys.exit(main())
             ^^^^^^
...
  File "/professional-services-data-validator/data_validation/validation_builder.py", line 391, in get_source_query
    table = self.config_manager.get_source_ibis_table()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
  File "/professional-services-data-validator/.venv/lib/python3.11/site-packages/sqlalchemy/engine/reflection.py", line 789, in reflect_table
    raise exc.NoSuchTableError(table_name)
sqlalchemy.exc.NoSuchTableError: tab-hyphen
nj1973 commented 1 month ago

The problem comes from sqlalchemy/engine/default.py(793)denormalize_name():

        elif name_lower == name and not (
            self.identifier_preparer._requires_quotes
        )(name_lower):
            name = name_upper

The table name containing a hyphen gives us a True from self.identifier_preparer._requires_quotes() which means we do not upper case the table name before querying the Oracle dictionary.

In DVT we work with lower case names and expect the underlying engine to change case as appropriate.

nj1973 commented 1 month ago

A workaround is to supply an upper case table for the Oracle table:

data-validation validate column -sc=ora -tc=pg -tbls=dvt_test.TAB-HYPHEN=dvt_test.tab-hyphen

This validation command succeeds.

nj1973 commented 1 month ago

I've pushed an exploratory commit to this branch which changes the logic around the table name used to query the Oracle data dictionary. In SQL Alchemy code the table name is changed to upper case if it is lower case and does not contain special characters (e.g. hyphens or dollars). The presence of special characters is not really relevant, we either upper case or don't upper case. In a small number of cases switching to upper case will be wrong but that is not relevant to the change made here, we are just making the logic consistent between normal tables names and ones that contain special characters.

With the change described above we move past the sqlalchemy.exc.NoSuchTableError issue but fail when executing the validation query:

07/15/2024 10:33:28 AM-INFO: -- ** Source Query ** --
07/15/2024 10:33:29 AM-INFO: SELECT count(*) AS count
FROM dvt_test."tab-hyphen" AS t0
...
    cursor.execute(statement, parameters)
sqlalchemy.exc.DatabaseError: (cx_Oracle.DatabaseError) ORA-00942: table or view does not exist

Notice the table name in the validation query has been quoted, correctly, but this is forcing a lower case name. The table is actually upper cased.

If we want to match the (rarely incorrect) denormalization logic of upper casing tables for dictionary queries then we could/should also upper case them in the validation queries when escaping them.

nj1973 commented 1 month ago

I've pushed a change for the problem described in the previous comment. It assumes that it is safe to upper case Oracle identifiers before quoting them. This is untrue if Oracle objects were created explicitly using lower or camel case.

I'm not convinced it is sensible to merge these changes though because they go against any future work to support lower or camel case Oracle objects. However I suspect they already don't work with DVT and therefore these changes are adding a bit of support (objects with special characters in their names) without really taking anything away.

More thought required.

nj1973 commented 1 month ago

I've reduced the priority because I've provided a workaround.

nj1973 commented 3 hours ago

I've created a PR for the changes discussed in previous comments but have left it in draft because I wonder if there's a simpler solution:

Change our find-table command to auto upper case Oracle table names if the name contains a hyphen. It might be a little more like a sticking plaster but doesn't involve patching SQL Alchemy code which can only simplify future upgrade work.