A significant number of tests set the partition column name to a column that doesn't exist in the table schema. I tried to remove this in https://github.com/apache/hudi/pull/12176, but the pattern is deeply embedded in our testing, and some of the tables are actually partitioned. To resolve this illegal table state, I now set the drop partition columns config in the same place where the fake partition name is added. Nearly all tests still passed after this change, but it exposed some minor bugs in how we handle dropped partition columns.
The places that were fixed are:
compaction -> either the partition column was not being dropped, or the commit metadata contained the partition column
row writer clustering -> write configs were not set for the partition column names and record key column names, and the key generator was not deduced properly
Hive reader context -> did not handle dropped partition columns correctly
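For reviewers, the approach described above (setting the drop partition columns config wherever the fake partition name is set) can be sketched roughly as follows. This is an illustration only, not the code in this PR: the helper name `withFakePartitionField` and the class are hypothetical, and the two config keys are my assumption about which Hudi write configs are involved, since the description does not spell them out.

```java
import java.util.Properties;

// Illustrative sketch only; class and helper names are hypothetical.
class FakePartitionTestProps {
    // Assumed config keys; the PR description only says
    // "partition column name" and "drop partition config".
    static final String PARTITION_FIELD_KEY =
        "hoodie.datasource.write.partitionpath.field";
    static final String DROP_PARTITION_COLUMNS_KEY =
        "hoodie.datasource.write.drop.partition.columns";

    /**
     * Returns a copy of the base properties with a partition field that is
     * absent from the data schema, together with the drop partition columns
     * flag, so the two settings are always applied in the same place.
     */
    static Properties withFakePartitionField(Properties base, String fakePartitionField) {
        Properties props = new Properties();
        props.putAll(base);
        props.setProperty(PARTITION_FIELD_KEY, fakePartitionField);
        // A partition column missing from the schema is only a legal state
        // when partition columns are dropped from the written data.
        props.setProperty(DROP_PARTITION_COLUMNS_KEY, "true");
        return props;
    }

    public static void main(String[] args) {
        Properties props = withFakePartitionField(new Properties(), "partition_path");
        System.out.println(props.getProperty(PARTITION_FIELD_KEY));
        System.out.println(props.getProperty(DROP_PARTITION_COLUMNS_KEY));
    }
}
```

Keeping both settings in one helper means a test cannot end up with the fake partition name but without the drop flag, which is the illegal state this change removes.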
Impact
Tests now reflect a real table state, and a few bugs with the drop partition columns config are fixed.
Risk level (write none, low, medium or high below)
low
Documentation Update
N/A
Contributor's checklist