Closed Limess closed 22 hours ago
We should probably add some management ourselves to limit the partitions. Is there any advice or pre-canned example of keeping, say, the 450 most recent partitions and deleting anything older using Spark?
We have DROP PARTITION command support; does that make sense for your use case?
I threw together a Spark script to drop all but the n latest partitions for our use case longer term.
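For anyone looking for a pre-canned starting point, the core of such a script is just sorting the partition values and keeping the tail. A minimal sketch (the table and partition column names are illustrative, `partitions_to_drop` is a hypothetical helper rather than anything in Hudi, and the `spark.sql` calls are shown only in comments since they need a live SparkSession):

```python
def partitions_to_drop(partition_values, keep_n):
    """Return the partition values older than the keep_n most recent.
    Lexicographic sort is enough for ISO date strings like '2022-06-01'."""
    ordered = sorted(partition_values)
    return ordered[:-keep_n] if keep_n < len(ordered) else []

# In a real job you would list partitions from Spark, e.g.:
#   rows = spark.sql("SHOW PARTITIONS articles.articles_hudi_copy_on_write").collect()
#   values = [r.partition.split("=", 1)[1] for r in rows]
# and then drop the stale ones one at a time (or in small batches) so no single
# statement hits catalog limits:
#   for value in partitions_to_drop(values, 450):
#       spark.sql("ALTER TABLE articles.articles_hudi_copy_on_write "
#                 f"DROP IF EXISTS PARTITION (story_published_partition_date='{value}')")
```

Issuing the drops in small statements, rather than one statement listing hundreds of partitions, is the point: it sidesteps the oversized request that fails below.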
I still think this is a bug: the DROP PARTITIONS command to AWS Glue should be robust enough not to send too many partitions at once.
The command writes an empty data frame using the Spark writer (https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/command/AlterHoodieTableDropPartitionCommand.scala), which triggers a batch sync of partitions via https://github.com/apache/hudi/blob/master/hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java.
If that's the case, that seems to far exceed MAX_DELETE_PARTITIONS_PER_REQUEST = 25.
@Limess But it will batch the request and fire multiple API calls to delete all the partitions. AWS has a limit of 25 partitions for the delete API; that's the reason we batch. Can you try once? It should ideally work.
As shown in the stacktrace above, this failed in production rather than deleting in batches; it appeared to try to delete all 505 partitions at once:
at 'expression' failed to satisfy constraint: Member must have length less than or equal to 2048 (Service: AWSGlue; Status Code: 400; Error Code: ValidationException; Request ID: 3819d43b-dbd2-4a02-addb-cb2d99814001; Proxy: null))
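For reference, the batching that would avoid both the 25-partition cap and the 2048-character expression limit can be sketched roughly as follows (illustrative Python, not the actual AWSGlueCatalogSyncClient code; the boto3 call in the comment is just one way to issue each batch):

```python
MAX_DELETE_PARTITIONS_PER_REQUEST = 25  # AWS Glue limit per BatchDeletePartition call

def chunked(items, size):
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Each batch would then become one Glue API call, e.g. with boto3
# (illustrative, not the Hudi code path):
#   glue.batch_delete_partition(
#       DatabaseName="articles",
#       TableName="articles_hudi_copy_on_write",
#       PartitionsToDelete=[{"Values": [v]} for v in batch],
#   )
batches = chunked([f"2022-06-{d:02d}" for d in range(1, 31)],
                  MAX_DELETE_PARTITIONS_PER_REQUEST)
# 30 partitions split into batches of 25 and 5
```

505 partitions would need 21 such calls; sending them all in one request is what trips the ValidationException above.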
Yeah, this feature was introduced in 0.15.0: https://issues.apache.org/jira/browse/HUDI-7466
Thanks - in that case I'll close this issue.
Describe the problem you faced
Glue sync fails with INSERT_OVERWRITE_TABLE when previous partitions are not included in the new load.

In our case, we have a couple of years' worth of data, but only want to load the last n days, overwriting the old table. Deleting the old, now-defunct partitions fails.

To Reproduce
Steps to reproduce the behavior:
Expected behavior
Partitions are removed from AWS Glue catalog correctly.
Environment Description
Hudi version : 0.14.1-amzn-0
Spark version : 3.5.0
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : no
Additional context
EMR 7.1.0
Stacktrace
Full stacktrace, truncated (too long for GitHub):
py4j.protocol.Py4JJavaError: An error occurred while calling o1350.save.
: org.apache.hudi.exception.HoodieMetaSyncException: Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:81)
    at org.apache.hudi.HoodieSparkSqlWriterInternal.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:1014)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
    at org.apache.hudi.HoodieSparkSqlWriterInternal.metaSync(HoodieSparkSqlWriter.scala:1012)
    at org.apache.hudi.HoodieSparkSqlWriterInternal.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:1111)
    at org.apache.hudi.HoodieSparkSqlWriterInternal.writeInternal(HoodieSparkSqlWriter.scala:520)
    at org.apache.hudi.HoodieSparkSqlWriterInternal.write(HoodieSparkSqlWriter.scala:204)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:121)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:150)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:113)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:255)
    at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:129)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$9(SQLExecution.scala:165)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:255)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$8(SQLExecution.scala:165)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:276)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:164)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:70)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:101)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:503)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:503)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:33)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:33)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:33)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:479)
    at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:101)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:88)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:86)
    at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:151)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:859)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:388)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:361)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:240)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing articles_hudi_copy_on_write
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:171)
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:79)
    ... 55 more
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table articles_hudi_copy_on_write
    at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:444)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:291)
    at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:180)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:168)
    ... 56 more
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL ALTER TABLE
`articles`.`articles_hudi_copy_on_write` DROP IF EXISTS
PARTITION (`story_published_partition_date`='2021-12-15'), PARTITION (`story_published_partition_date`='2021-12-11'), PARTITION (`story_published_partition_date`='2022-04-29'),
PARTITION (`story_published_partition_date`='2021-12-24'), PARTITION (`story_published_partition_date`='2021-12-23'), PARTITION (`story_published_partition_date`='2021-12-22'),
PARTITION (`story_published_partition_date`='2021-12-21'), PARTITION (`story_published_partition_date`='2022-04-30'), PARTITION (`story_published_partition_date`='2022-06-02'),
PARTITION (`story_published_partition_date`='2022-06-01'), PARTITION (`story_published_partition_date`='2022-06-04'), PARTITION (`story_published_partition_date`='2022-06-03'),
PARTITION (`story_published_partition_date`='2022-06-06'), PARTITION (`story_published_partition_date`='2022-06-05'), PARTITION (`story_published_partition_date`='2022-06-08'),
PARTITION (`story_published_partition_date`='2022-06-07'), PARTITION (`story_published_partition_date`='2022-06-24'), PARTITION (`story_published_partition_date`='2022-06-23'),
PARTITION (`story_published_partition_date`='2022-06-26'), PARTITION (`story_published_partition_date`='2022-06-25'), PARTITION (`story_published_partition_date`='2022-06-28'),
PARTITION (`story_published_partition_date`='2022-06-27'), PARTITION (`story_published_partition_date`='2022-06-29'), PARTITION (`story_published_partition_date`='2021-02-10'),
PARTITION (`story_published_partition_date`='2022-06-20'), PARTITION (`story_published_partition_date`='2022-06-22'), PARTITION (`story_published_partition_date`='2021-02-15'),
PARTITION (`story_published_partition_date`='2022-06-21'), PARTITION (`story_published_partition_date`='2022-06-09'), PARTITION (`story_published_partition_date`='2022-06-13'),
PARTITION (`story_published_partition_date`='2022-06-12'), PARTITION (`story_published_partition_date`='2022-06-15'), PARTITION (`story_published_partition_date`='2022-06-14'),
PARTITION (`story_published_partition_date`='2022-06-17'), PARTITION (`story_published_partition_date`='2022-06-16'), PARTITION (`story_published_partition_date`='2022-06-19'),
PARTITION (`story_published_partition_date`='2022-06-18'), PARTITION (`story_published_partition_date`='2021-02-21'), PARTITION (`story_published_partition_date`='2021-02-23'),
PARTITION (`story_published_partition_date`='2021-02-22'), PARTITION (`story_published_partition_date`='2021-02-25'), PARTITION (`story_published_partition_date`='2021-02-24'),
PARTITION (`story_published_partition_date`='2022-06-11'), PARTITION (`story_published_partition_date`='2021-02-26'), PARTITION (`story_published_partition_date`='2022-06-10'),
PARTITION (`story_published_partition_date`='2021-12-28'), PARTITION (`story_published_partition_date`='2021-12-27'), PARTITION (`story_published_partition_date`='2023-01-02'),
PARTITION (`story_published_partition_date`='2023-01-01'), PARTITION (`story_published_partition_date`='2021-12-31'), PARTITION (`story_published_partition_date`='2021-12-30'),
PARTITION (`story_published_partition_date`='2023-01-08'), PARTITION (`story_published_partition_date`='2023-01-07'), PARTITION (`story_published_partition_date`='2023-01-09'),
PARTITION (`story_published_partition_date`='2023-01-04'), PARTITION (`story_published_partition_date`='2023-01-03'), PARTITION (`story_published_partition_date`='2023-01-06'),
PARTITION (`story_published_partition_date`='2023-01-05'), PARTITION (`story_published_partition_date`='2023-01-11'), PARTITION (`story_published_partition_date`='2023-01-10'),
PARTITION (`story_published_partition_date`='2023-01-13'), PARTITION (`story_published_partition_date`='2023-01-12'), PARTITION (`story_published_partition_date`='2022-06-30'),
PARTITION (`story_published_partition_date`='2022-08-04'), PARTITION (`story_published_partition_date`='2022-08-03'), PARTITION (`story_published_partition_date`='2022-08-06'),
PARTITION (`story_published_partition_date`='2022-08-05'), PARTITION (`story_published_partition_date`='2022-08-08'), PARTITION (`story_published_partition_date`='2022-08-07'),
PARTITION (`story_published_partition_date`='2022-08-09'), PARTITION (`story_published_partition_date`='2022-08-02'), PARTITION (`story_published_partition_date`='2022-08-01'),
PARTITION (`story_published_partition_date`='2021-10-01'), PARTITION (`story_published_partition_date`='2021-10-02'), PARTITION (`story_published_partition_date`='2021-09-17'),
PARTITION (`story_published_partition_date`='2022-08-26'), PARTITION (`story_published_partition_date`='2022-08-25'), PARTITION (`story_published_partition_date`='2022-08-28'),
PARTITION (`story_published_partition_date`='2022-08-27'), PARTITION (`story_published_partition_date`='2022-08-29'), PARTITION (`story_published_partition_date`='2022-08-20'),
PARTITION (`story_published_partition_date`='2022-08-22'), PARTITION (`story_published_partition_date`='2022-08-21'), PARTITION (`story_published_partition_date`='2022-08-24'),
PARTITION (`story_published_partition_date`='2022-08-23'), PARTITION (`story_published_partition_date`='2022-08-15'), PARTITION (`story_published_partition_date`='2022-08-14'),
PARTITION (`story_published_partition_date`='2022-08-17'), PARTITION (`story_published_partition_date`='2022-08-16'), PARTITION (`story_published_partition_date`='2022-08-19'),
PARTITION ...