Closed sundar-mudupalli-work closed 11 months ago
Hi,
The problem occurs because the Spanner backend behaves differently from the BigQuery backend - perhaps the Spanner backend is not fully implemented. I can work around this issue - however it might be a very simple fix in the Spanner backend. Here is a Python interpreter session demonstrating the issue. BigQuery works fine with the to_sql
function returning the sql string. When Spanner is presented with a table expression that execute()
correctly, it still fails to render with the to_sql
function.
(env) mudupalli@dvt-mudupalli-test001:~/gcsdir$ python3
Python 3.9.2 (default, Feb 28 2021, 17:03:44)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ibis
>>> from data_validation.clients import get_data_client
>>> bq_c = get_data_client({"source_type": "BigQuery", "project_id": "span-cloud-ck-testing-external"})
>>> result_tbl = bq_c.table('span-cloud-ck-testing-external.pso_data_validator.results')
>>> ibis.to_sql(result_tbl)
SQLString(sql='SELECT\n t0.*\nFROM `span-cloud-ck-testing-external.pso_data_validator.results` AS t0')
>>> span_c = get_data_client({"source_type": "Spanner", "instance_id": "df-test", "database_id": "test-db"})
>>> person_tbl = span_c.table('person1')
>>> person_100 = person_tbl.filter(person_tbl.ID == 100)
>>> person_100.execute()
first_name1 ... ID
0 dshlfiuadhsfkudashgfkuadshgkaudgshkuagykfuydag... ... 100
[1 rows x 7 columns]
>>> ibis.to_sql(person_100)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/mudupalli/professional-services-data-validator/env/lib/python3.9/site-packages/ibis/expr/sql.py", line 381, in to_sql
(pretty,) = sg.transpile(sql, read=read, write=write, pretty=True)
File "/home/mudupalli/professional-services-data-validator/env/lib/python3.9/site-packages/sqlglot/__init__.py", line 187, in transpile
for expression in parse(sql, read, error_level=error_level)
File "/home/mudupalli/professional-services-data-validator/env/lib/python3.9/site-packages/sqlglot/__init__.py", line 74, in parse
dialect = Dialect.get_or_raise(read)()
File "/home/mudupalli/professional-services-data-validator/env/lib/python3.9/site-packages/sqlglot/dialects/dialect.py", line 136, in get_or_raise
raise ValueError(f"Unknown dialect '{dialect}'")
ValueError: Unknown dialect 'spanner'
The issue seems to be that the dialect spanner
is not registered with sqlglot
. Is there a way to fix this in the backend?
Thanks.
Sundar Mudupalli
The easiest way to do this is to pass in the BQ dialect for Spanner clients in the to_sql()
call we make in partition_builder. BQ and Spanner have the same SQL syntax so this should be interchangeable.
This is a separate issue, but Spanner doesn't support ROW_NUMBER() as shown in this error when Spanner is the source system:
Command:
data-validation -v generate-table-partitions -sc spanner -tc spanner -tbls pso_data_validator.dvt_core_types -pk id -hash '*' -pn 2 -cdir /home/user/professional-services-data-validator/cdir
Error:
google.api_core.exceptions.MethodNotImplemented: 501 Unsupported built-in function: row_number.
Hi
A typical
generate-table-partitions
command fails because we are not able to convert the IBIS where clause (i.e. an IBIS table expression) into SQL text. This works for 8 other databases BigQuery, Hive, MySQL, Oracle, Postgres, Snowflake, SQL Server and Teradata (we have test cases for all those databases).The command is
Replacing
my_spanner
(which points to a Spanner database) withmy_mysql
- i.e. validate against same database succeeds.The specific error is below: