apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[HUDI-8456] add drop partition config if we are using fake partition #12178

Open jonvex opened 3 weeks ago

jonvex commented 3 weeks ago

Change Logs

A significant number of tests set the partition column name to a column that does not exist in the table schema. I tried to remove this pattern in https://github.com/apache/hudi/pull/12176, but it is deeply embedded in our testing, and some of the tables are actually partitioned. Instead, this PR resolves the illegal table state by setting the drop partition config in the same place where the fake partition name is added. Nearly all tests still passed after this change, but it exposed some minor bugs in how we handle dropping partition columns.
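To illustrate the idea, here is a minimal sketch and not the code in this PR. `FakePartitionOptions` and `withPartitionField` are hypothetical names, and the sketch assumes a single partition field and a plain string map of write options. The two option keys are the existing Hudi keys for the partition path field and for dropping partition columns.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical test helper, for illustration only.
final class FakePartitionOptions {
    // Existing Hudi write option keys.
    static final String PARTITION_FIELD_KEY =
        "hoodie.datasource.write.partitionpath.field";
    static final String DROP_PARTITION_COLUMNS_KEY =
        "hoodie.datasource.write.drop.partition.columns";

    private FakePartitionOptions() {}

    /**
     * Returns a copy of the given options with the partition field set.
     * If the partition field is not a column of the table schema (a "fake"
     * partition), the drop partition config is enabled in the same place,
     * so the resulting table state is legal.
     */
    static Map<String, String> withPartitionField(
            Map<String, String> opts,
            String partitionField,
            Set<String> schemaFields) {
        Map<String, String> result = new HashMap<>(opts);
        result.put(PARTITION_FIELD_KEY, partitionField);
        if (!schemaFields.contains(partitionField)) {
            // Assumption: a partition column missing from the schema is
            // treated as a dropped partition column.
            result.put(DROP_PARTITION_COLUMNS_KEY, "true");
        }
        return result;
    }
}
```

With this shape, a test that partitions on a real schema column leaves the drop partition config untouched, while a test that uses a fake partition name gets the config set automatically.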

The following places were fixed:

compaction -> either the partition column was not being dropped, or the commit metadata included the partition column

row writer clustering -> write configs were not set for the partition column names and record key column names; also, the key generator was not deduced properly

Hive reader context -> did not handle the drop partition columns config correctly

Impact

Tests now reflect a real table state, and a few bugs in the handling of the drop partition config are fixed.

Risk level (write none, low, medium or high below)

low

Documentation Update

N/A
