Describe the problem you faced
We've observed that the CREATE TABLE DDL alphabetizes partition column names when syncing to Glue. The values in hoodie.properties are correct; this seems to affect only the Glue table. While this doesn't impact reads from Spark, it seems to cause issues for Trino.
To Reproduce
Steps to reproduce the behavior:
Create a Hudi table with the following code. Note that the partitioning columns are specified in c, a, b order.
The synced Glue table lists the partition columns in alphabetical order, while the table's hoodie.properties reports hoodie.table.partition.fields=c,a,b
Expected behavior
We expect the Glue table to preserve the partition column order.
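One way to check this expectation is to compare the order in hoodie.properties with the order of the Glue table's partition keys. The following is only a sketch, not code from the original report: the helper names are made up for illustration, and the sample Glue keys (a, b, c) are an assumption inferred from the "alphabetized" description above. The only AWS call used is boto3's Glue get_table, whose response carries the partition columns under Table.PartitionKeys.

```python
from typing import Dict, List


def parse_hoodie_partition_fields(properties_text: str) -> List[str]:
    """Return the partition fields listed in the body of a hoodie.properties file."""
    for line in properties_text.splitlines():
        line = line.strip()
        if line.startswith("hoodie.table.partition.fields="):
            value = line.split("=", 1)[1]
            return [field.strip() for field in value.split(",") if field.strip()]
    return []


def glue_partition_names(partition_keys: List[Dict[str, str]]) -> List[str]:
    """Extract column names, in order, from Glue's Table.PartitionKeys list."""
    return [key["Name"] for key in partition_keys]


def order_preserved(hoodie_fields: List[str], glue_fields: List[str]) -> bool:
    """True only if both sides list the same columns in the same order."""
    return hoodie_fields == glue_fields


def fetch_glue_partition_keys(database: str, table: str) -> List[Dict[str, str]]:
    """Fetch partition keys from Glue (needs AWS credentials; names are placeholders)."""
    import boto3  # imported lazily so the pure helpers above work without boto3

    glue = boto3.client("glue")
    response = glue.get_table(DatabaseName=database, Name=table)
    return response["Table"].get("PartitionKeys", [])


# Sample data mirroring the report; the Glue side is an assumed illustration.
sample_properties = "hoodie.table.partition.fields=c,a,b\n"
sample_glue_keys = [
    {"Name": "a", "Type": "string"},
    {"Name": "b", "Type": "string"},
    {"Name": "c", "Type": "string"},
]

hoodie_order = parse_hoodie_partition_fields(sample_properties)
glue_order = glue_partition_names(sample_glue_keys)
mismatch_detected = not order_preserved(hoodie_order, glue_order)
```

Against a real table, the result of fetch_glue_partition_keys would replace sample_glue_keys, and the contents of the table's .hoodie/hoodie.properties file would replace sample_properties.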
Environment Description
The above was run on an AWS EMR cluster running version emr-6.10.1.
Hudi version : 0.12.2-amzn-0
Spark version : 3.3.1
Hive version : 3.1.3
Hadoop version : 3.3.3
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : Spark on Docker
Stacktrace
n/a