apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0
Changes in describe behaviour of a table break partition info? #10174

Open brysd opened 6 months ago

brysd commented 6 months ago

Apache Iceberg version

1.4.1

Query engine

Spark

Please describe the bug 🐞

Related to #6290: we build on the Spark DESCRIBE statement to retrieve the partition columns and enable dynamic modification of partition definitions (the partition spec). We were on Spark 3.3 and Iceberg 1.3.1 and are now testing the migration to Spark 3.4 and Iceberg 1.4.1. With the new versions, DESCRIBE no longer seems to return any partition spec information when the table has no partition columns. Previously, DESCRIBE on a table without partition fields returned this:

col_name data_type comment
id string
value1 string
value2 string
value3 string
cre timestamp
lw timestamp
x_src_cdc_ts timestamp
x_beg_ts timestamp
x_end_ts timestamp
x_current_flag int
x_cre_ts timestamp
x_upd_ts timestamp
x_ingest_method string
x_landing_filename string
x_cdc_ts timestamp
# Partitioning
Not partitioned

With Spark 3.4 and Iceberg 1.4.1 we get this, with no partitioning section at all:

col_name data_type comment
id string
value1 string
value2 string
value3 string
cre timestamp
lw timestamp
x_src_cdc_ts timestamp
x_beg_ts timestamp
x_end_ts timestamp
x_current_flag int
x_cre_ts timestamp
x_upd_ts timestamp
x_ingest_method string
x_landing_filename string
x_cdc_ts timestamp

If there are partition fields, this is what we had before (Spark 3.3, Iceberg 1.3.1):

col_name data_type comment
id string
value1 string
value2 string
value3 string
cre timestamp
lw timestamp
x_src_cdc_ts timestamp
x_beg_ts timestamp
x_end_ts timestamp
x_current_flag int
x_cre_ts timestamp
x_upd_ts timestamp
x_ingest_method string
x_landing_filename string
x_cdc_ts timestamp
# Partitioning
Part 0 id

With Spark 3.4 / Iceberg 1.4.1 we get this:

col_name data_type comment
id string null
value1 string null
value2 string null
value3 string null
cre timestamp null
lw timestamp null
x_src_cdc_ts timestamp null
x_beg_ts timestamp null
x_end_ts timestamp null
x_current_flag int null
x_cre_ts timestamp null
x_upd_ts timestamp null
x_ingest_method string null
x_landing_filename string null
x_cdc_ts timestamp null
# Partition Information
# col_name data_type comment
id string null

Does this mean that when no partition columns are defined, neither the # Partitioning header nor the # Partition Information header is emitted anymore? Has anything changed in the Iceberg implementation of DESCRIBE, or is this a potential Spark 3.4 issue?

nastra commented 6 months ago

@brysd this is something that changed in Spark with https://github.com/apache/spark/commit/b581b1499abc1903bb742480bb8cac3659ebe185
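
For anyone parsing DESCRIBE output in the meantime, a parser that tolerates both layouts shown in this thread might look roughly like the sketch below. It is not part of Iceberg or Spark. The function name `partition_fields` is hypothetical, and it assumes each row arrives as a (col_name, data_type, comment) tuple, that the old layout puts the "Part N" label in the first column and the partition expression in the second, and that a missing partition section means the table is unpartitioned.

```python
from typing import Iterable, List, Optional, Sequence

OLD_HEADER = "# Partitioning"           # layout seen with Spark 3.3 / Iceberg 1.3.1
NEW_HEADER = "# Partition Information"  # layout seen with Spark 3.4 / Iceberg 1.4.1


def partition_fields(rows: Iterable[Sequence[Optional[str]]]) -> List[str]:
    """Return the partition entries found in DESCRIBE rows ([] if none)."""
    fields: List[str] = []
    mode = None  # None = outside a partition section, else "old" or "new"
    for row in rows:
        name = (row[0] or "").strip()
        dtype = (row[1] or "").strip() if len(row) > 1 else ""
        if name == OLD_HEADER:
            mode = "old"
            continue
        if name == NEW_HEADER:
            mode = "new"
            continue
        if mode is None:
            continue  # ordinary column rows before any partition section
        if mode == "new" and name.startswith("# col_name"):
            continue  # header row repeated inside the new-style section
        if not name or name.startswith("#"):
            mode = None  # blank row or another section ends the block
            continue
        if mode == "old":
            # Assumption: ("Part 0", "id", ""); "Not partitioned" adds nothing.
            if name.startswith("Part "):
                fields.append(dtype)
        else:
            fields.append(name)
    return fields
```

The rows returned by `spark.sql("DESCRIBE TABLE db.tbl").collect()` (with `db.tbl` as a placeholder table name) can be passed in directly, since each Row behaves like a tuple. Treating a missing section as "unpartitioned" is a design choice made here to match the Spark 3.4 output above.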

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.