delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Function behaving similarly to `SHOW PARTITIONS` in the Python API #2671

Closed FrankPortman closed 1 month ago

FrankPortman commented 3 months ago

I am wondering if there is something similar to `SHOW PARTITIONS` from the Spark world of interacting with Delta tables. It is a metadata-only query that returns all of the live partitions of a specific Delta table in a tabular format. Functionality such as `DeltaTable.files_by_partitions` is super helpful for querying but not quite the same thing. `get_active_partitions` is almost what I need, but (1) it's not exposed in the public API, which makes usage a bit clunky (though certainly nothing life-ruining), and (2) the struct it returns is not the most ergonomic.

Any openness to a PR that does this?

No related issues from what I could tell.

ion-elgreco commented 3 months ago

@FrankPortman feel free to open a PR for this, it's definitely useful as a proper public api

FrankPortman commented 3 months ago

@ion-elgreco is your preference to just open up the API so `get_active_partitions` exists on `DeltaTable` in Python? Or are you also open to it, or some helper method on `DeltaTable`, packaging the results in a more ergonomic format?

For example, right now a call to `table._table.get_active_partitions()` returns something like `frozenset[frozenset[tuple[str, str]]]`, where each inner set contains as many `(pKey, pVal)` tuples as there are partition columns in the table. My use case would involve merging the tuples of each inner set into some struct or dict, so that a single partition "row" has all the partition columns pivoted out. I don't mind handling that last part in my business logic code, but if this is something you think would be useful, I can add it here as well.
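
To make the "pivoting" concrete, here is a minimal sketch of the reshaping described above. It assumes the return shape really is `frozenset[frozenset[tuple[str, str]]]` with string values; the helper name `partitions_to_rows` and the sample data are hypothetical and not part of delta-rs.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumed shape of what get_active_partitions() returns (per the comment above).
PartitionSet = FrozenSet[FrozenSet[Tuple[str, str]]]


def partitions_to_rows(partitions: PartitionSet) -> List[Dict[str, str]]:
    """Hypothetical helper: turn each inner set of (key, value) tuples into one dict.

    Each dict is a single partition "row" with every partition column pivoted out.
    Rows are sorted so the output is deterministic (sets have no stable order).
    """
    rows = [dict(partition) for partition in partitions]
    return sorted(rows, key=lambda row: sorted(row.items()))


# Made-up sample standing in for table._table.get_active_partitions()
sample: PartitionSet = frozenset(
    {
        frozenset({("year", "2024"), ("month", "01")}),
        frozenset({("year", "2024"), ("month", "02")}),
    }
)

rows = partitions_to_rows(sample)
for row in rows:
    print(row)
```

A list of dicts like this could be passed directly to a DataFrame constructor to get the tabular view that `SHOW PARTITIONS` gives in Spark. The sort key assumes all partition values are strings; null partition values would need separate handling.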

ion-elgreco commented 3 months ago

Let's go for something more ergonomic since it's a public api

omkar-foss commented 2 months ago

Seems like a nice feature to have! I can pick this up and raise a PR if no one is working on it, let me know.

FrankPortman commented 2 months ago

I'd love that - I haven't had a chance to prioritize it yet