duckdb / uc_catalog

Proof-of-concept extension combining the delta extension with Unity Catalog
MIT License

Querying Unity Catalog with DuckDB UC extension requires retrying the first SQL statement #8

Open edwardsdm opened 1 month ago

edwardsdm commented 1 month ago

The information below has been shared with the Databricks UC engineering team. They suggested the issue is likely with the DuckDB UC extension.

Credential serving is enabled on the Databricks instance the code is connecting to.

DuckDB CLI installed on macOS - Retry SHOW TABLES query

image

R - Retry SHOW TABLES query

image

Python - Retry SHOW TABLES query

Additionally, even after a successful SHOW TABLES query, no tables are found, and the table that was queried successfully in R cannot be queried using Python.
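Until the root cause is fixed, the retry described above could be wrapped in a small helper. This is only a sketch of a workaround, not something the extension provides: `run_with_retry` and its parameters are hypothetical names, and because the exact exception raised by the first statement is not shown in this thread, the helper catches any `Exception`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(run: Callable[[], T], attempts: int = 2, delay_s: float = 0.0) -> T:
    """Call run() up to `attempts` times and return the first successful result.

    Hypothetical helper: the exception type raised by the first statement is
    an unknown here, so every Exception is treated as retryable.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return run()
        except Exception:
            # Re-raise once the last allowed attempt has failed.
            if attempt == attempts:
                raise
            time.sleep(delay_s)
    raise AssertionError("unreachable")
```

With an open DuckDB connection `con` (assumed to already have the catalog attached), the call might look like `run_with_retry(lambda: con.execute("SHOW TABLES").fetchall())`. This only hides the first failure; it does not address the missing tables.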

image

bschulth commented 1 month ago

At least part of the problem appears to be that the extension is trying to convert column types to an internal type:

https://github.com/duckdb/uc_catalog/blob/71e18a9d1ecb65acf4f9b5f0414b214a15025dab/src/uc_utils.cpp#L32-L144

But it does not have a mapping for "varchar(xxxx)", so it throws an exception:

https://github.com/duckdb/uc_catalog/blob/71e18a9d1ecb65acf4f9b5f0414b214a15025dab/src/uc_utils.cpp#L146
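One possible way to handle sized types is to strip the length parameter before the lookup, since DuckDB accepts a length on VARCHAR but does not enforce it. The sketch below illustrates the idea in Python rather than in the extension's C++; the `BASE_TYPES` table is a hypothetical subset and `to_duckdb_type` is an illustrative name, not the extension's actual mapping or API.

```python
import re

# Hypothetical subset of a Unity Catalog -> DuckDB type table
# (not the extension's real mapping).
BASE_TYPES = {
    "string": "VARCHAR",
    "varchar": "VARCHAR",
    "char": "VARCHAR",
    "int": "INTEGER",
    "bigint": "BIGINT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
}

# Base types that may carry a single length parameter, e.g. "varchar(255)".
SIZED_TYPES = {"varchar", "char"}

# A base name followed by an optional single integer in parentheses.
_TYPE = re.compile(r"^\s*([a-z_]+)\s*(?:\(\s*(\d+)\s*\))?\s*$")


def to_duckdb_type(uc_type: str) -> str:
    """Map a type string to a DuckDB type name, ignoring any length parameter."""
    match = _TYPE.match(uc_type.lower())
    if match is None:
        raise ValueError(f"unsupported type: {uc_type!r}")
    base, size = match.group(1), match.group(2)
    if base not in BASE_TYPES or (size is not None and base not in SIZED_TYPES):
        raise ValueError(f"unsupported type: {uc_type!r}")
    return BASE_TYPES[base]
```

Both the sized and unsized spellings resolve to the same target here, so `to_duckdb_type("varchar(255)")` and `to_duckdb_type("varchar")` return `"VARCHAR"`, while parameterized types outside `SIZED_TYPES`, such as `decimal(10,2)`, still raise.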

Scanning our (Databricks) system.information_schema.columns for unique full_data_type values:

select distinct data_type, full_data_type from system.information_schema.columns where data_type not in ('DECIMAL', 'STRUCT', 'ARRAY', 'MAP') order by data_type, full_data_type

It looks like you might also need to map:

Maybe also the 'unsized' versions for extra diligence?

image

samansmink commented 3 weeks ago

Hi @edwardsdm, thanks for reporting this. We will take a look at this at some point. Currently we have prioritized the development of the Delta extension over the uc_catalog extension, so I cannot give a timeline yet!