GoogleCloudPlatform / professional-services-data-validator

Utility to compare data between homogeneous or heterogeneous environments to ensure source and target tables match
Apache License 2.0
411 stars 119 forks source link

find-tables is incorrect for SQL Server #1198

Open nj1973 opened 4 months ago

nj1973 commented 4 months ago

On SQL Server the code in clients.py->list_tables() is not returning correct results.

This is the current code:

def list_tables(client, schema_name):
    """Return a list of tables in the DB schema."""
    if client.name in ["db2", "mssql", "redshift", "snowflake"]:
        return client.list_tables()
    return client.list_tables(database=schema_name)

I tested this with schema_name=pso_data_validator and reviewed the list of tables:

(Pdb) p client.list_tables()
['entries']
(Pdb) p schema_name
'pso_data_validator'
(Pdb) p client.list_tables(database=schema_name)
['dvt_binary', 'dvt_core_types', 'dvt_large_decimals', 'dvt_mssql2pg_types', 'dvt_null_not_null', 'dvt_pangrams']

Note that calling client.list_tables() with no schema is giving me a list of table names in the dbo schema. But calling it with the correct schema gives the correct results.

1> SELECT table_name FROM information_schema.tables WHERE table_schema = 'dbo';
2> go
table_name
--------------------------------------------------------------------------------------------------------------------------------
entries

1> SELECT table_name FROM information_schema.tables WHERE table_schema = 'pso_data_validator';
2> go
table_name
--------------------------------------------------------------------------------------------------------------------------------
dvt_null_not_null
dvt_mssql2pg_types
tblIssue990
dvt_large_decimals
dvt_binary
dvt_pangrams
dvt_core_types

I think we need to remove "mssql" from the list of clients referenced in list_tables().

When this has been resolved we can enable the test_find_tables() test in test_sql_server.py.

See also: https://github.com/GoogleCloudPlatform/professional-services-data-validator/issues/1043

nj1973 commented 1 week ago

We have the same problem on Snowflake. We get a list of tables and then prefix with database names instead of schema names. This one might be subtly different because it might need a database.schema prefix. If that is the case then we can spin off a new issue if required.