Closed: nj1973 closed this issue 11 months ago
I found that column names in an Ibis table object are lowercased in sqla_table,
in the line below (line 519) from ibis/backends/base/sql/alchemy/__init__.py:

sqla_table = self._get_sqla_table(name, database=database, schema=schema)

I could not pin down where exactly they are being lowercased.
I've made custom query column names lower case too, via a change in DVT's _metadata method override of the Oracle client. I'm not particularly happy with it (it feels grubby) but I'm not sure where to look for a better solution. Any thoughts @nehanene15?
I wonder if this has to do with how we build configs. For column/row validations we use casefold() to lowercase column names so the user can provide case-insensitive names; here is an example for column aggs/PKs.
I wonder if the config generated for regular validation and custom query is different, since custom query uses SQLQueryResult instead of SQLAlchemyTable.
I'm also curious if other DB clients have this issue with custom query?
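The casefold() matching described above can be sketched as a small helper (hypothetical name, not DVT's actual code):

```python
def match_column(user_name, table_columns):
    """Return the real column name matching user_name case-insensitively.

    Hypothetical helper illustrating the casefold() matching used for
    column aggs/PKs; not DVT's actual implementation.
    """
    folded = {col.casefold(): col for col in table_columns}
    try:
        return folded[user_name.casefold()]
    except KeyError:
        raise KeyError(
            f"Column {user_name!r} not found in {sorted(table_columns)}"
        )

# An upper case Oracle column still matches a lower case config value:
print(match_column("col_int16", ["ID", "COL_INT8", "COL_INT16"]))  # COL_INT16
```

If the same folding were applied on the custom-query path, the user's lower case names would resolve against Oracle's upper case columns too.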
I can get the same exception if I pass upper case column names to SQL Server:
data-validation -v validate custom-query row -sc sqlserver -tc bq \
--source-query="select ID,COL_INT8,COL_INT16,COL_INT32 from pso_data_validator.dvt_core_types" \
--target-query="select id,col_int8,col_int16,col_int32 from pso_data_validator.dvt_core_types" \
--primary-keys=id --comparison-fields="col_int16,col_int32"
...
File "/home/user/professional-services-data-validator/env/lib/python3.10/site-packages/ibis/expr/operations/generic.py", line 39, in __init__
raise com.IbisTypeError(
ibis.common.exceptions.IbisTypeError: Column 'col_int16' is not found in table. Existing columns: 'ID', 'COL_INT8', 'COL_INT16', 'COL_INT32'.
I guess the difference is that Oracle upper cases columns by default and most other systems lower case them. I didn't expect SQL Server to honour the case I used in the source query.
It seems this is a wider problem than just Oracle so, for now, I'll back out my previous change.
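For illustration, the failure mode above is just a strict, case-sensitive column lookup. A minimal sketch (names hypothetical; the real check lives in ibis/expr/operations/generic.py):

```python
# Columns as Ibis sees them for the upper case SQL Server source query:
existing = ["ID", "COL_INT8", "COL_INT16", "COL_INT32"]

def get_column(name, columns):
    """Hypothetical stand-in for Ibis's case-sensitive column check."""
    if name not in columns:
        raise TypeError(
            f"Column {name!r} is not found in table. "
            f"Existing columns: {', '.join(map(repr, columns))}."
        )
    return name

try:
    get_column("col_int16", existing)
except TypeError as exc:
    print(exc)  # mirrors the shape of the IbisTypeError message above
```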
Note to self.
When validating an Oracle table (not a custom query) we pass through
professional-services-data-validator/env/lib/python3.10/site-packages/sqlalchemy/dialects/oracle/base.py:
@reflection.cache
def get_columns(self, connection, table_name, schema=None, **kw):
    ...
    c = connection.execute(sql.text(text), params)
    for row in c:
        colname = self.normalize_name(row[0])
        orig_colname = row[0]
Notice the use of self.normalize_name(row[0]). row[0] matches the data dictionary and is upper case by default in Oracle; self.normalize_name(row[0]) converts it to lower case.
The code for normalize_name() is here: sqlalchemy/engine/default.py(769)normalize_name()
I think we need to build normalize_name() into the flow for custom-query somewhere.
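For reference, a simplified sketch of what SQLAlchemy's default normalize_name() does (the real implementation also consults the dialect's identifier preparer to decide whether the lower cased form would require quoting):

```python
def normalize_name(name):
    """Simplified take on sqlalchemy DefaultDialect.normalize_name.

    Oracle stores unquoted identifiers in upper case, while SQLAlchemy's
    convention is lower case, so an all-upper-case name is folded down.
    A mixed-case name is assumed to have been quoted at creation time
    and is preserved as-is.
    """
    if name is None:
        return None
    if name.upper() == name:   # all upper case: the data dictionary default
        return name.lower()
    return name                # mixed case: was quoted, keep verbatim

print(normalize_name("COL_INT16"))  # col_int16
print(normalize_name("MixedCase"))  # MixedCase
```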
This example came up on an Oracle to PostgreSQL validation. Default identifier case in Oracle is uppercase. Default identifier case in PostgreSQL is lowercase.
Oracle test table:
PostgreSQL test table:
Table level row validation
This works fine. I can see from the debugger that the table objects in query_builder.compile() have the upper case Oracle columns converted to lower case.
Custom query row validation
I can see from the debugger that the table object in query_builder.pt.compile() keeps the columns in upper case.
If I change --comparison-fields to upper case we then fail on the PostgreSQL side. The workaround is to force the Oracle query to produce lower case column names.
I think we are inconsistent somewhere in the code: for table validations column names are lowercased, for custom query validations they are not.
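One possible direction (a sketch only, not tested against DVT): run the custom-query column names through the same normalization before building the config, so both sides agree regardless of each engine's default identifier case:

```python
def normalize(cols):
    """Hypothetical helper mirroring SQLAlchemy's normalize_name()
    behaviour for unquoted identifiers: fold all-upper-case names to
    lower case, leave mixed-case names alone."""
    return {c: (c.lower() if c == c.upper() else c) for c in cols}

# Oracle reports upper case by default, PostgreSQL lower case:
oracle_cols = ["ID", "COL_INT8", "COL_INT16", "COL_INT32"]
pg_cols = ["id", "col_int8", "col_int16", "col_int32"]

# After normalization both sides present identical column names:
assert list(normalize(oracle_cols).values()) == list(normalize(pg_cols).values())
```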