delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

DataCatalog is not possible to use from the Python binding #1860

Open rtyler opened 7 months ago

rtyler commented 7 months ago

Environment

Delta-rs version: 0.13.0

Binding: Python (deltalake-0.13.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl)

Environment:


Bug

When trying to use either flavor of DataCatalog, a ValueError is raised.

What happened:

❯ python3
Python 3.11.4 (main, Jun 28 2023, 19:51:46) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from deltalake import DeltaTable, DataCatalog
>>> dt = DeltaTable.from_data_catalog(DataCatalog.AWS, 'db', 'table')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tyler/source/github/noviconnect/venv/lib64/python3.11/site-packages/deltalake/table.py", line 287, in from_data_catalog
    table_uri = RawDeltaTable.get_table_uri_from_data_catalog(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Catalog 'glue' not available.
>>> dt = DeltaTable.from_data_catalog(DataCatalog.UNITY, 'db', 'table')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tyler/source/github/noviconnect/venv/lib64/python3.11/site-packages/deltalake/table.py", line 287, in from_data_catalog
    table_uri = RawDeltaTable.get_table_uri_from_data_catalog(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Catalog 'unity' not available.
>>>

What you expected to happen:

C'mon son.

How to reproduce it:

More details:

r3stl355 commented 7 months ago

take

r3stl355 commented 7 months ago

I was unable to reproduce the glue error, but I am on a Mac. I'll keep poking around, but if you build for native-tls then this could be a reason (along with a few other places where the glue feature is used alone): https://github.com/delta-io/delta-rs/blob/dd6b45362a14c0f127b32c4b81afc15d17f710d5/crates/deltalake-core/src/data_catalog/mod.rs#L141

As for the unity error, I suspect it could be a misleading error caused by this code. It should just return the original error, since that has the right info. I'll change it: https://github.com/delta-io/delta-rs/blob/dd6b45362a14c0f127b32c4b81afc15d17f710d5/python/src/lib.rs#L136

rtyler commented 7 months ago

@r3stl355 I have a feeling that this error might still exist in main, albeit with better error messages. I think the problem is that the Linux wheels don't have the glue feature enabled.

r3stl355 commented 7 months ago

I'll have a look, but I need to build myself a Linux box first. Are you building with any specific settings or just using the standard build, @rtyler?

r3stl355 commented 7 months ago

Hey, I need to understand the problem better here. I tried this in a Docker container and on an Ubuntu 22.04 VM on AWS, using both a build from source and a released version (deltalake-0.13.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl), and I get something like this, which gives a meaningful error:

Traceback (most recent call last):
  File "/home/ubuntu/delta-rs/python/issue_1860.py", line 4, in <module>
    dt = DeltaTable.from_data_catalog(DataCatalog.AWS, 'db', 'table')
  File "/home/ubuntu/delta-rs/python/deltalake/table.py", line 287, in from_data_catalog
    table_uri = RawDeltaTable.get_table_uri_from_data_catalog(
OSError: Catalog glue error: Entity Not Found

@rtyler - what am I missing? I think the "Entity Not Found" error I am getting is coming from Glue, no?

Just confirmed, this is a Glue error from rusoto: https://github.com/delta-io/delta-rs/blob/fa6c5139033a06274dc829e0cf4053f72b0a9887/crates/deltalake-core/src/data_catalog/mod.rs#L62
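
To make the difference between the two tracebacks in this thread easier to spot, here is a minimal sketch of a helper that tells them apart. The helper name `classify_catalog_failure` and its return labels are hypothetical and not part of deltalake. It only assumes what the tracebacks above show: a `ValueError` containing "not available" when the catalog is missing from the build, and an `OSError` starting with "Catalog" when the catalog was reached and returned an error.

```python
from typing import Callable


def classify_catalog_failure(load: Callable[[], object]) -> str:
    """Call `load` and label the outcome (hypothetical helper, not a deltalake API)."""
    try:
        load()
    except ValueError as exc:
        # Seen in this thread as: ValueError: Catalog 'glue' not available.
        if "not available" in str(exc):
            return "catalog-not-available"
        return "other-value-error"
    except OSError as exc:
        # Seen in this thread as: OSError: Catalog glue error: Entity Not Found
        if str(exc).startswith("Catalog"):
            return "catalog-backend-error"
        return "other-os-error"
    return "ok"
```

With deltalake installed, it could be called as `classify_catalog_failure(lambda: DeltaTable.from_data_catalog(DataCatalog.AWS, 'db', 'table'))`. A result of "catalog-backend-error" would suggest the Glue support is present and the lookup itself failed, while "catalog-not-available" would match the original report.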

roeap commented 6 months ago

Reopening this since we likely want to re-add that once catalogs are working again.