TobikoData / sqlmesh

Efficient data transformation and modeling framework that is backwards compatible with dbt.
https://sqlmesh.com
Apache License 2.0

Table diff error not found in axis #2869

Closed achicoine-coveo closed 2 months ago

achicoine-coveo commented 3 months ago

Context: Running sqlmesh==0.109.2. The project dialect is configured as Snowflake and it runs on a DuckDB gateway.

When trying to use the table diff feature, I get the error "['abc'] not found in axis". The lowercase column name in the error seemed strange to me, since I know many database connectors return column names in uppercase. So I tried a quoted lowercase identifier, and that one works.

Quoted lowercase works:

Screenshot 2024-07-04 at 3 43 41 PM

Quoted uppercase didn't work:

Screenshot 2024-07-04 at 3 53 23 PM

Unquoted uppercase didn't work:

Screenshot 2024-07-04 at 3 45 13 PM

Unquoted lowercase didn't work:

Screenshot 2024-07-04 at 3 48 04 PM
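For reference, here is a minimal sketch of Snowflake's default identifier resolution (unquoted identifiers fold to uppercase, quoted ones keep their case), applied to the four variants above. The helper `snowflake_resolve` is hypothetical and is not part of sqlmesh; whether sqlmesh's normalization follows exactly this path is not confirmed in this thread.

```python
def snowflake_resolve(identifier: str) -> str:
    """Hypothetical helper: resolve an identifier using Snowflake's default rules."""
    quoted = (
        len(identifier) >= 2
        and identifier.startswith('"')
        and identifier.endswith('"')
    )
    if quoted:
        # Quoted identifiers keep their exact case; "" is an escaped quote.
        return identifier[1:-1].replace('""', '"')
    # Unquoted identifiers are folded to uppercase.
    return identifier.upper()


# The four variants tried in the report.
variants = ['"abc"', '"ABC"', "ABC", "abc"]
resolved = {raw: snowflake_resolve(raw) for raw in variants}

for raw, name in resolved.items():
    print(f"{raw:>6} -> {name}")
```

Under these rules only the quoted lowercase form resolves to `abc`; the other three all resolve to `ABC`. That lines up with quoted lowercase being the only variant that behaves differently, although it does not by itself show where the mismatch occurs.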

Code that could be relevant: the identifiers are normalized here: https://github.com/TobikoData/sqlmesh/blob/f3cb2fa5cbc8a504514d333f5407abeba0172787/sqlmesh/core/table_diff.py#L179. The attempt at dropping the index happens here: https://github.com/TobikoData/sqlmesh/blob/f3cb2fa5cbc8a504514d333f5407abeba0172787/sqlmesh/core/table_diff.py#L345

I'm not sure what's happening in the code, but my guess is that the column names are uppercase in the pandas DataFrame and the code is trying to drop the column by its lowercase name. A lot of connectors return the column names in uppercase in the DataFrame even if the column names were lowercase in the SQL.
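The guess above can be reproduced in isolation with pandas. This is only a sketch of the suspected mismatch, not the sqlmesh code: the DataFrame contents, the `VALUE` column, and the helper `drop_case_insensitive` are made up for illustration.

```python
import pandas as pd

# Assumed shape of the result: the engine returned the key column in uppercase.
df = pd.DataFrame({"ABC": [1, 2], "VALUE": [10, 20]})

# Dropping by the lowercase name fails, because pandas labels are case-sensitive.
try:
    df.drop(columns=["abc"])
    error_message = None
except KeyError as exc:
    error_message = exc.args[0]

print(error_message)


def drop_case_insensitive(frame: pd.DataFrame, names: list) -> pd.DataFrame:
    """Hypothetical workaround: drop columns whose names match ignoring case."""
    wanted = {name.lower() for name in names}
    matches = [col for col in frame.columns if str(col).lower() in wanted]
    return frame.drop(columns=matches)


remaining = list(drop_case_insensitive(df, ["abc"]).columns)
print(remaining)
```

A case-insensitive drop hides the symptom but would be wrong for tables that really do have columns differing only by case, so normalizing the label the same way the query was normalized is probably the safer direction.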

Traceback:

Traceback (most recent call last):
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/fastapi/routing.py", line 193, in run_endpoint_function
    return await run_in_threadpool(dependant.call, *values)
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/web/server/api/endpoints/table_diff.py", line 37, in get_table_diff
    _row_diff = diff.row_diff()
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/sqlmesh/core/table_diff.py", line 340, in row_diff
    self.adapter.fetchdf(column_stats_query, quote_identifiers=True)
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/pandas/core/frame.py", line 5581, in drop
    return super().drop(
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/pandas/core/generic.py", line 4788, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/pandas/core/generic.py", line 4830, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/Users/alexislocal/.virtualenvs/dae-cost-data-model-prototype/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 7070, in drop
    raise KeyError(f"{labels[mask].tolist()} not found in axis")
KeyError: "['abc'] not found in axis"

tobymao commented 3 months ago

@Themiscodes probably a normalization issue