delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

Support "SHOW PARTITIONS" #996

Open zsxwing opened 2 years ago

zsxwing commented 2 years ago

It would be great to support SHOW PARTITIONS in Delta so that people can check the partition values in a table. Example output:

spark.range(1, 10).selectExpr("id as c1", "current_date() as c2", "id as c3")
  .write.mode("overwrite").format("delta").partitionBy("c1", "c2").save("/tmp/showpartitions")
spark.sql("show partitions delta.`/tmp/showpartitions`").show(false)

+---+----------+
|c1 |c2        |
+---+----------+
|1  |2022-03-10|
|2  |2022-03-10|
|4  |2022-03-10|
|3  |2022-03-10|
|5  |2022-03-10|
|6  |2022-03-10|
|9  |2022-03-10|
|7  |2022-03-10|
|8  |2022-03-10|
+---+----------+

The output columns should be the table's partition columns, and each row should contain the partition values of one partition.
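The Delta transaction log protocol records, for each `add` action, a `partitionValues` map from partition column name to value (stored as a string). Assuming that input, the intended semantics can be sketched in plain Scala: one row per distinct partition, with one column per partition column. The `partitionRows` helper and the hard-coded file list are hypothetical; a real command would read the values from the table's current snapshot.

```scala
// Hypothetical helper: one output row per distinct partition, with one
// column per partition column, in the order given by partitionColumns.
// A missing or null partition value is emitted as null.
def partitionRows(
    partitionColumns: Seq[String],
    filePartitionValues: Seq[Map[String, String]]): Seq[Seq[String]] =
  filePartitionValues
    .map(values => partitionColumns.map(col => values.getOrElse(col, null)))
    .distinct // several data files can belong to the same partition

// Hypothetical input: three data files, two of which share a partition.
val files = Seq(
  Map("c1" -> "1", "c2" -> "2022-03-10"),
  Map("c1" -> "1", "c2" -> "2022-03-10"),
  Map("c1" -> "2", "c2" -> "2022-03-10"))

val header = Seq("c1", "c2")
val rows = partitionRows(header, files)
println(header.mkString(" | "))
rows.foreach(row => println(row.mkString(" | ")))
```

Because each partition value lands in its own column, a consumer can filter or join on `c1` directly instead of parsing a combined string.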

Syntax:

SHOW PARTITIONS delta.`<path>`
SHOW PARTITIONS delta.`<path>` PARTITION(partition_spec)
SHOW PARTITIONS <table-name>
SHOW PARTITIONS <table-name> PARTITION(partition_spec)
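The optional `PARTITION(partition_spec)` clause narrows the output to matching partitions. A minimal sketch of that filter, assuming a spec of the form `col = value, ...` with optional single quotes around values and assuming, as with Spark's existing command, that a spec may name only some of the partition columns. `parsePartitionSpec` and `matchesSpec` are hypothetical names; a real implementation would get the spec from the SQL parser rather than by splitting strings.

```scala
// Hypothetical: parse the inside of PARTITION(...), e.g. "c1 = 1, c2 = '2022-03-10'".
// Naive splitting: commas, '=' or quotes inside values are not handled.
def parsePartitionSpec(spec: String): Map[String, String] =
  spec.split(",").toSeq.map { pair =>
    pair.split("=", 2).map(_.trim) match {
      case Array(key, value) => key -> value.stripPrefix("'").stripSuffix("'")
      case _ => throw new IllegalArgumentException(s"Malformed partition spec entry: $pair")
    }
  }.toMap

// A partition matches when it agrees with every column named in the spec;
// columns the spec does not mention (a partial spec) are unconstrained.
def matchesSpec(partition: Map[String, String], spec: Map[String, String]): Boolean =
  spec.forall { case (col, value) => partition.get(col).contains(value) }

val spec = parsePartitionSpec("c1 = 1, c2 = '2022-03-10'")
val partial = parsePartitionSpec("c1 = 2")
val partition = Map("c1" -> "1", "c2" -> "2022-03-10")
println(matchesSpec(partition, spec))
println(matchesSpec(partition, partial))
```

This compares values as strings, so `c1 = 01` would not match a stored `1`; a real implementation would cast the spec values to the partition column types before comparing.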

Note: we don't want to follow Spark's current SHOW PARTITIONS output format because code that consumes it is hard to write. For example, in the output below, each cell packs the values of two partition columns into a single string.

+----------------------+
|             partition|
+----------------------+
|  state=AZ/city=Peoria|
| state=CA/city=Fremont|
|state=CA/city=San Jose|
+----------------------+
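To see why this format is awkward to consume, here is roughly what a caller has to do to recover individual values from a cell such as `state=CA/city=San Jose`. The sketch assumes Hive-style path escaping, where characters such as `/` and `=` inside a name or value are written as `%XX` hex codes; `unescape` and `parseSparkPartition` are hypothetical helpers, not Spark or Delta APIs.

```scala
// Decode %XX escapes (assumed Hive-style path escaping). A '%' that is not
// followed by two hex digits is kept as-is.
def unescape(s: String): String = {
  val sb = new StringBuilder
  var i = 0
  while (i < s.length) {
    val c = s.charAt(i)
    val hex = if (c == '%' && i + 2 < s.length) s.substring(i + 1, i + 3) else ""
    if (hex.nonEmpty && hex.forall(ch => Character.digit(ch, 16) >= 0)) {
      sb.append(Integer.parseInt(hex, 16).toChar)
      i += 3
    } else {
      sb.append(c)
      i += 1
    }
  }
  sb.toString
}

// Hypothetical helper: split one SHOW PARTITIONS cell into (column, value) pairs.
// Splitting on '/' and '=' is only safe because those characters are escaped
// inside names and values.
def parseSparkPartition(cell: String): Seq[(String, String)] =
  cell.split("/").toSeq.map { fragment =>
    fragment.split("=", 2) match {
      case Array(col, value) => unescape(col) -> unescape(value)
      case _ => throw new IllegalArgumentException(s"Malformed fragment: $fragment")
    }
  }

val cells = Seq("state=AZ/city=Peoria", "state=CA/city=Fremont", "state=CA/city=San Jose")
cells.map(parseSparkPartition).foreach(println)
```

Every consumer has to know about and undo this escaping, which is the kind of work the proposed one-column-per-partition-column output avoids.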
zsxwing commented 2 years ago

@Kimahriman I saw you previously submitted a PR for this in #699. But as I mentioned above, we don't want to use Spark's current output format. Are you still interested in working on this with a different format? To do that, we would need to parse the SHOW PARTITIONS command in DeltaSqlParser so that it runs our own SHOW PARTITIONS command.

Kimahriman commented 2 years ago

I don't really have a use case for it anymore, so I don't have the time to spend on it. I mostly used it as a way to learn some DataSourceV2 things. The format used by the existing SHOW PARTITIONS behavior is just very annoying.

dennyglee commented 2 years ago

cc @JassAbidi - could you tackle this one? Thanks!

Maks-D commented 1 year ago

Hi @zsxwing @dennyglee, I had some time to work on this issue and prepared delta-io/delta/pull/1667. It introduces not only SHOW PARTITIONS but also an additional 'DETAIL' feature. We need this at our company because there is currently no way to get the list of files and other details for a partition.

Maks-D commented 1 year ago

Hi @zsxwing @dennyglee, sorry to bother you again. Do you know who can review https://github.com/delta-io/delta/pull/1667?