@yihua Could you please guide me here
Consider using the timeline server. It's designed for faster marker management.
@parisni Do you mean using this ? https://hudi.apache.org/docs/next/configurations#hoodiewritemarkerstype
Default is hoodie.write.markers.type: TIMELINE_SERVER_BASED
Or should I specify it explicitly?
Using Hudi 0.12.1
You are right, this is likely the default. You can make sure by looking into the marker directory while the write is in progress: .hoodie/.temp/<commit instant>
When the timeline server is used, a few large marker files are appended to. Otherwise one marker file is created per new parquet file, so for a very large commit with many written files there is an overhead creating/dropping them.
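For example, something like this (an untested PySpark sketch; the table name, fields, and S3 path are placeholders) would pin the marker type explicitly on the write:

```python
# Minimal sketch: set timeline-server-based markers explicitly instead of
# relying on the default. Table name, fields, and S3 path are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-marker-type-example").getOrCreate()
df = spark.createDataFrame(
    [("id-1", "2023-06-09 00:00:00", 42)],
    ["uuid", "lastmodifieddate", "value"],
)

(df.write.format("hudi")
    .option("hoodie.table.name", "example_table")
    .option("hoodie.datasource.write.recordkey.field", "uuid")
    .option("hoodie.datasource.write.precombine.field", "lastmodifieddate")
    .option("hoodie.datasource.write.operation", "upsert")
    .option("hoodie.write.markers.type", "TIMELINE_SERVER_BASED")  # timeline-server-based markers
    .mode("append")
    .save("s3://example-bucket/example_table"))
```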
Could you share a screenshot of your Spark UI after job completion?
And the page after that
inside /.hoodie/.temp
20230515131506776 20230602131939547
s3:/
Then I confirm you use the timeline server. Also, from your stats I am not surprised that writing ~30k partitions and updating so many files takes 8 min (the "doing partition and writing files" job). The tagging and building-profile stages also look correct.
What is weird is the first web UI view. I wonder why you have those listings happening before the upsert starts. Since you have the MDT enabled, this cannot be partition listing to get the table files. Could you also share the first 40 jobs, to understand what's going on with the parallel file listing?
On June 9, 2023 9:12:58 PM UTC, Samarth Raval wrote:
Also, I have a large number of partitions; the last few commits look like this. The approximate number of partitions could be ~24,000 - 30,000 [assuming this, as I don't know how to count the number of partitions].
The sum of time from stage 0 to 57 is 40 min, so 2 hours are spent after stage 57 has finished; can you confirm? I'm not familiar with MOR tables, so I'm not sure what's going on after committing. Likely not cleaning or compaction, since those would show up as dedicated stages.
Have you looked at the executor logs to see if something happens there ?
On June 10, 2023 4:42:03 PM UTC, Samarth Raval wrote:
I have disabled the metadata [I was suspecting it was making my EMR job super slow, but I was wrong, as it still takes the same amount of time].
This is the first screenshot; 1st page:
2nd page screen shot: https://github.com/apache/hudi/issues/8925#issuecomment-1585103310
3rd page screen shot: https://github.com/apache/hudi/issues/8925#issuecomment-1585102673
Entire execution took around 2.5 hours:
After stage 40 is over, the EMR job just goes into an idle state doing nothing for more than 1 hour, which is very weird, and I cannot figure out what it is doing during that time.
Stages 0 - 40: good performance.
After that: just idle for > 1 hr -> this is the problem.
Stages 41 - 57: finishes off after that.
Compaction I am running async; cleaning I have never run [I disabled it; could not running cleaning for a long time create a problem?]
I can attach some EMR stats here showing that for some time the job just went idle.
Core nodes stats:
Task nodes stats:
In both of the above images you can see the drop where the job just sits idle: the core and task nodes both go to zero and sit there; after some time they come back, stages 41 - 57 execute, and the job finishes.
You can also see the I/O stats of the EMR cluster:
Something happens during the 1h37 gap between stages 43 and 44. It's not stage 40 as you previously said.
I disabled it; could not running cleaning for a long time create a problem?
Yeah, definitely a good thing to investigate. Turn on auto cleaning, triggered every 1 commit, without the MDT, and see if it improves after a few upsert batches.
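Something along these lines (an options sketch; I am assuming hoodie.clean.max.commits is the knob for the trigger frequency, so double-check the config reference for your Hudi version):

```python
# Sketch of the suggested experiment: automatic cleaning after every commit,
# with the metadata table disabled. Keys/values assumed, verify per version.
cleaning_options = {
    "hoodie.clean.automatic": "true",         # re-enable automatic cleaning
    "hoodie.clean.max.commits": "1",          # attempt a clean after each commit (assumed key)
    "hoodie.cleaner.commits.retained": "10",  # how many commits of older file versions to keep
    "hoodie.metadata.enable": "false",        # "w/o MDT" for this test
}
```

These would be passed on the Hudi write, e.g. df.write.format("hudi").options(**cleaning_options).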
Something happens during the 1h37 gap between stages 43 and 44. It's not stage 40 as you previously said.
Yes, you are correct. That is so weird, and it makes the job slow; do we know why it happens? I have put the EMR stats above as well, and I see the core/task nodes go down while it does nothing. Weird!
Do we know how to fix this idle state?
@parisni
I tried the cleaning stuff, but in between stages it was idle for more than an hour, where the EMR job was doing nothing. Have you ever seen that before, or has someone had something like this before?
Between stages 45 & 46 it was idle for almost ~1.2 hours
Did the cleaning eventually finish? I have already had such an issue with slow cleaning: in appearance it does nothing, but Spark is actually dealing with the S3 filesystem, which mainly uses networking.
I think the cleaning did actually finish, but the gap between stages 45 & 46 has a significant delay (as mentioned above). If we knew what the problem is, maybe the EMR job could finish in < 1.5 hrs [which would be best, and could help for other tables as well].
How many files/log files do you have in the partitions ?
Sorry, I am not really sure how I can give you exact numbers of files/log files/partitions.
Do you know how I can calculate those?
@parisni could you please guide me here ?
@SamarthRaval You can use aws s3 ls api to get the number of files and partitions.
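For example, a rough Python/boto3 sketch (bucket and table prefix below are placeholders) that counts data files and distinct partition paths under the table base path:

```python
# Count objects and distinct partition prefixes under a Hudi table path on S3.
# Bucket/prefix are hypothetical; the .hoodie metadata folder is skipped.
import boto3

bucket = "example-bucket"
prefix = "warehouse/transactions_all/"  # table base path, ending with "/"

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

files = 0
partitions = set()
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if "/.hoodie/" in key or key.endswith("/"):
            continue  # skip Hudi metadata and folder placeholder objects
        files += 1
        rel = key[len(prefix):]
        if "/" in rel:
            partitions.add(rel.rsplit("/", 1)[0])  # partition path = everything before the file name

print(f"data/log files: {files}, partitions: {len(partitions)}")
```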
@ad1happy2go @parisni
Number of partitions: 149,541
Number of files: 1,498,353
Size of the table: ~ 10-15 TB
Thanks. What's the distribution of files per partition? For example, how many files are in the largest partition?
Hey @SamarthRaval, based on the stacktrace and Spark UI screenshots you provided, it looks like the time-taking part is the meta sync / table refresh in Spark, which does not use the metadata table for file listing, even if the metadata table is present in the Hudi table. Could you try adding this config and see if it improves the latency of the stage: hoodie.meta.sync.metadata_file_listing = true?
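For example (a small options sketch; values assumed):

```python
# Sketch: let meta sync use the metadata table for file listing as well,
# instead of falling back to direct S3 listing.
meta_sync_options = {
    "hoodie.metadata.enable": "true",                  # metadata table on the write path
    "hoodie.meta.sync.metadata_file_listing": "true",  # the config suggested above
}
```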
@parisni
The number of files varies a lot across partitions;
the largest partition has ~30,000 files.
I see many partitions having ~5,000 to ~15,000 files.
Update - Had a discussion with @SamarthRaval. After running the cleaner the number of files decreased, and he will try the upsert again along with the fix Ethan suggested.
@SamarthRaval Gentle ping on this. Feel free to close if you are able to resolve it.
@ad1happy2go Still didn't get a chance to test this, will update soon here.
Hello guys,
I got the chance to experiment with the latest Hudi 0.13.1 and enabled all metadata-related configs to improve the performance:
"hoodie.metadata.enable", "hoodie.meta.sync.metadata_file_listing"
but I am still seeing the slowdown, and the Spark server goes into an idle state for more than an hour.
You can see the idle time in between stages, which is weird and is causing the performance bottleneck.
@ad1happy2go @yihua @parisni
Slowdown, with detailed Spark UI:
Hey @ad1happy2go @yihua, any chance you guys can let us know what may be happening during this time (stages 56 and 57 in Sam's screenshot above)? We see 1h+ being lost here after the deltacommit file has been written. Very confused as to what may be happening here; our best assumption is marker file deletion in S3, but those are only a few thousand objects and maybe MBs in size... we don't think this should take 1hr+ in the pipeline. tysm for the help
@noahtaite @SamarthRaval Can you please get the driver logs from when it gets stuck?
@ad1happy2go
Driver logs during the delay are attached. One interesting thing to note is that the driver logs are unavailable/restarted shortly before the delay. We are also using EMR managed scaling and notice that the cluster goes down to 1 master, 5 core, 0 task nodes during this time (from a maximum of 40 task nodes during the job).
https://gist.github.com/noahtaite/e0309969c05ea3a825ed41a3f2065e21
@ad1happy2go
We ran a job with the same input data but disabled the Hive sync to AWS Glue functionality and this performance bottleneck / missing SHS stage was not observed. The job completed successfully in just 50mins (acceptable performance).
Please advise if there is a way to optimize the AWS Glue sync. We noticed one flag was missing in our pipeline, "hoodie.datasource.hive_sync.use_jdbc" = "true", even though "hoodie.datasource.hive_sync.mode" = "hms". We are attempting another test with the use_jdbc flag set to false.
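For reference, the sync-related options under test look roughly like this (a sketch; the database and table names are placeholders, the rest follows the settings mentioned in this thread):

```python
# Hive/Glue sync options being tested: hms mode with the JDBC path disabled.
hive_sync_options = {
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.mode": "hms",        # sync through the metastore (Glue) client
    "hoodie.datasource.hive_sync.use_jdbc": "false",  # the flag being flipped for this test
    "hoodie.datasource.hive_sync.database": "example_db",
    "hoodie.datasource.hive_sync.table": "example_table",
    "hoodie.datasource.hive_sync.partition_fields": "warehouse,year,month",
}
```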
@ad1happy2go @parisni @bhasudha @yihua @nsivabalan Could you guys please help here?
If hive sync is slow maybe try hoodie.datasource.hive_sync.filter_pushdown_enabled
Hello @parisni, as you suggested I tried the above config, but I started getting the below error, which I have never seen before:
23/08/11 21:46:38 ERROR Client: Application diagnostics message: User class threw exception: org.apache.hudi.exception.HoodieException: Could not sync using the meta sync class org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool
  at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:61)
  at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:888)
  at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
  at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:886)
  at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:984)
  at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:381)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:150)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:104)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
  at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:250)
  at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:123)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$9(SQLExecution.scala:160)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
  at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:250)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$8(SQLExecution.scala:160)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:271)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:159)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:69)
  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:101)
  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:554)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:107)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:554)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:530)
  at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:97)
  at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:84)
  at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:82)
  at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:142)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:856)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:387)
  at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:360)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:760)
Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing
  at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:165)
  at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:59)
  ... 56 more
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table
  at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:429)
  at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:280)
  at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:188)
  at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:162)
  ... 57 more
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Partition fields and values should be same length, but got partitionFields: [] with values: [partition1, year1, month1]
  at org.apache.hudi.hive.util.PartitionFilterGenerator.lambda$generatePushDownFilter$5(PartitionFilterGenerator.java:187)
  at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
  at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
  at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
  at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
  at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
  at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
  at org.apache.hudi.hive.util.PartitionFilterGenerator.generatePushDownFilter(PartitionFilterGenerator.java:192)
  at org.apache.hudi.hive.HiveSyncTool.getTablePartitions(HiveSyncTool.java:381)
  at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:423)
  ... 60 more
Likely this one only works with glue sync, not hive sync. BTW you could try using the new glue sync instead. It's more optimized for Glue than the HMS/JDBC API.
Hello guys, all my deltacommits are being written in < 1 hr, but so much time is being wasted deleting the marker directory [shown in the screenshot], and I never got a proper understanding of why exactly it is happening.
My configurations are as below:
hoodie.datasource.hive_sync.database -> prod_hudi_tier2
hoodie.datasource.hive_sync.mode -> hms
hoodie.datasource.hive_sync.support_timestamp -> true
path -> s3://transactions.all_hudi
hoodie.datasource.write.precombine.field -> lastmodifieddate
hoodie.datasource.hive_sync.partition_fields -> warehouse,year,month
hoodie.datasource.write.payload.class -> com.NullSafeDefaultHoodieRecordPayload
hoodie.datasource.hive_sync.skip_ro_suffix -> true
hoodie.metadata.enable -> true
hoodie.datasource.hive_sync.table -> transactions_all
hoodie.datasource.meta_sync.condition.sync -> true
hoodie.clean.automatic -> false
hoodie.datasource.write.operation -> upsert
hoodie.datasource.hive_sync.enable -> true
hoodie.datasource.write.recordkey.field -> uuid
hoodie.table.name -> transactions_all
hoodie.datasource.write.table.type -> MERGE_ON_READ
hoodie.datasource.write.hive_style_partitioning -> true
hoodie.datasource.write.reconcile.schema -> true
hoodie.datasource.write.keygenerator.class -> org.apache.hudi.keygen.ComplexKeyGenerator
hoodie.upsert.shuffle.parallelism -> 5760
hoodie.meta.sync.client.tool.class -> org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool
hoodie.datasource.write.partitionpath.field -> warehouse,year,month
hoodie.compact.inline.max.delta.commits -> 25
I am also syncing to AWS Glue, if that is creating the problem, no idea? Or maybe the metadata is taking so much time? This is slowing down the entire pipeline.
I have put all the detailed screenshots and information in the Slack message.
Please let me know if you still need information.
Slack Message