apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Failed to upsert for commit time #2970

Closed KarthickAN closed 3 years ago

KarthickAN commented 3 years ago

Hi, I keep getting the following error intermittently and I'm not sure what causes it. There may be two different Hudi jobs running in parallel and writing to the same bucket. Would that be an issue? Please also guide me in resolving the following error.

```
py4j.protocol.Py4JJavaError: An error occurred while calling o318.save.
: org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20210520040253
	at org.apache.hudi.table.action.commit.WriteHelper.write(WriteHelper.java:62)
	at org.apache.hudi.table.action.commit.UpsertCommitActionExecutor.execute(UpsertCommitActionExecutor.java:45)
	at org.apache.hudi.table.HoodieCopyOnWriteTable.upsert(HoodieCopyOnWriteTable.java:88)
	at org.apache.hudi.client.HoodieWriteClient.upsert(HoodieWriteClient.java:193)
	at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:260)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException
	at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)
	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:327)
	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionRequestedToInflight(HoodieActiveTimeline.java:384)
	at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.saveWorkloadProfileMetadataToInflight(BaseCommitActionExecutor.java:139)
	at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.execute(BaseCommitActionExecutor.java:89)
	at org.apache.hudi.table.action.commit.WriteHelper.write(WriteHelper.java:55)
	... 38 more
```

Below is my Hudi config:

```
SmallFileSize            = 104857600
MaxFileSize              = 125829120
RecordSize               = 35
CompressionRatio         = 5
InsertSplitSize          = 3500000
IndexBloomNumEntries     = 1500000
KeyGenClass              = org.apache.hudi.keygen.ComplexKeyGenerator
RecordKeyFields          = sourceid,sourceassetid,sourceeventid,value,timestamp
TableType                = COPY_ON_WRITE
PartitionPathFields      = date,sourceid
HiveStylePartitioning    = True
WriteOperation           = upsert
CompressionCodec         = snappy
CommitsRetained          = 1
CombineBeforeInsert      = True
PrecombineField          = timestamp
InsertDropDuplicates     = False
InsertShuffleParallelism = 100
```
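(For readers mapping these app-level names onto Hudi's writer options, a sketch of the likely correspondence; the mapping below is inferred from the names and values, not confirmed by the reporter.)

```python
# Inferred mapping of the app-level settings above to hoodie.* options;
# treat every key here as a best guess, not the reporter's actual code.
hudi_options = {
    "hoodie.parquet.small.file.limit": "104857600",
    "hoodie.parquet.max.file.size": "125829120",
    "hoodie.copyonwrite.record.size.estimate": "35",
    "hoodie.parquet.compression.ratio": "5",
    "hoodie.copyonwrite.insert.split.size": "3500000",
    "hoodie.index.bloom.num_entries": "1500000",
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.write.recordkey.field": "sourceid,sourceassetid,sourceeventid,value,timestamp",
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.partitionpath.field": "date,sourceid",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.parquet.compression.codec": "snappy",
    "hoodie.cleaner.commits.retained": "1",
    "hoodie.combine.before.insert": "true",
    "hoodie.datasource.write.precombine.field": "timestamp",
    "hoodie.datasource.write.insert.drop.duplicates": "false",
    "hoodie.insert.shuffle.parallelism": "100",
}
```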

Environment Description

Hudi version : 0.6.0

Spark version : 2.4.3

Hadoop version : 2.8.5-amzn-1

Storage (HDFS/S3/GCS..) : S3

Running on Docker? (yes/no) : No. Running on AWS Glue

n3nash commented 3 years ago

@KarthickAN Yes, as we discussed over Slack, Hudi 0.6.0 doesn't allow concurrent writes. To give you an idea of what's happening: Hudi timeline transitions go from requested to inflight to completed. At any point in time, each transition can be performed only once. This exception is basically saying the transition has already happened and someone else is trying to perform the same transition; this typically occurs when 2 different jobs are writing to the same table with the same writeClient instance. Can you make sure that only a single writer is writing to the table? If you still get the exception, that would be a bug that needs investigation.
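For context, each instant on the timeline is persisted as a small state file under the table's `.hoodie/` directory, and a transition replaces one state file with the next. A rough illustration for a COW commit (file names approximate for 0.6.0):

```
.hoodie/20210520040253.commit.requested   # write planned
.hoodie/20210520040253.inflight           # write in progress
.hoodie/20210520040253.commit             # write completed
```

The `ValidationUtils.checkArgument` failure in the stack trace above is this transition being attempted when the instant is no longer in the expected state.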

n3nash commented 3 years ago

@KarthickAN Any updates on this one ?

KarthickAN commented 3 years ago

I made sure no other jobs were running in parallel and I didn't face this issue again. Thank you. We can close this.

deep-teliacompany commented 3 years ago

Hi, does Hudi 0.8.0 support concurrent writes, or from which version is concurrency supported?

puremachinery commented 3 years ago

I'm getting this issue using hudi 0.8.0 and with no other jobs running in parallel.

nochimow commented 3 years ago

Same as @puremachinery.

matthiasdg commented 3 years ago

Should this work with 0.8.0 and jobs running in parallel? It doesn't for us.

Natielle commented 2 years ago

I had the same problem and was sure that no other jobs were running in parallel. The root cause was the partition column containing the value ".". To solve it, I transformed "." into a null value (None/NaN in Python).
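A minimal PySpark sketch of that cleanup, assuming the partition column is the `date` field from the original report (the column name is an assumption):

```python
from pyspark.sql import functions as F

# Replace a bare "." in the partition column with null before writing,
# so Hudi does not build a partition path out of a lone dot.
# "date" is assumed to be the partition path field, as in the original report.
df = df.withColumn(
    "date",
    F.when(F.col("date") == ".", F.lit(None).cast("string")).otherwise(F.col("date")),
)
```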

deep-teliacompany commented 2 years ago

If you want to run jobs in parallel updating the same directory, you can try Hudi's locking mechanism: https://hudi.apache.org/docs/concurrency_control/
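A rough sketch of what that page configures, using the ZooKeeper-based lock provider; the host, port, lock key, and paths below are placeholders, not values from this thread:

```python
# Optimistic concurrency control options per the linked docs; every value
# below is a placeholder to be replaced with your own environment's details.
occ_options = {
    "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
    "hoodie.cleaner.policy.failed.writes": "LAZY",
    "hoodie.write.lock.provider":
        "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
    "hoodie.write.lock.zookeeper.url": "zk-host",          # placeholder
    "hoodie.write.lock.zookeeper.port": "2181",
    "hoodie.write.lock.zookeeper.lock_key": "my_table",    # placeholder
    "hoodie.write.lock.zookeeper.base_path": "/hudi/locks",
}

# Merge these into the normal Hudi write options before df.write...save().
```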

novice-gamer commented 1 year ago

I also encountered the same problem on Hudi 0.6.0. What is the reason?

neeruks commented 1 year ago

I am also getting the same error. I am using Glue to read the CSV file and write it into a Hudi table.

py4j.protocol.Py4JJavaError: An error occurred while calling o326.save. : org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20230809204110303

parikhishan24 commented 6 months ago

Hi Team, we are trying to do concurrent writes on Hudi (hudi-spark3.3-bundle_2.12:0.13.1)

and applied the concurrency control properties via the following Spark/Hive configuration:

```
"spark.sql.extensions" = "org.apache.spark.sql.hudi.HoodieSparkSessionExtension"
"spark.sql.autoBroadcastJoinThreshold" = "-1"
"spark.serializer" = "org.apache.spark.serializer.KryoSerializer"
"spark.sql.catalog.spark_catalog" = "org.apache.spark.sql.hudi.catalog.HoodieCatalog"
"hoodie.datasource.write.keygenerator.class" = "org.apache.hudi.keygen.NonpartitionedKeyGenerator"
"hoodie.datasource.write.payload.class" = "org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload"
"hoodie.cleaner.commits.retained" = "2"
"hoodie.insert.shuffle.parallelism" = "54"
"hoodie.finalize.write.parallelism" = "54"
"hoodie.cleaner.fileversions.retained" = "3"
"hoodie.datasource.query.type" = "snapshot"
"hoodie.datasource.write.reconcile.schema" = "true"
"hoodie.write.lock.hivemetastore.database" = "<db name>"
"hoodie.write.lock.hivemetastore.table" = "<table name>"
"hoodie.write.lock.client.num_retries" = "15"
"hoodie.write.lock.wait_time_ms" = "900000"
"hoodie.write.lock.client.wait_time_ms" = "900000"
"hoodie.write.concurrency.mode" = "optimistic_concurrency_control"
"hoodie.write.lock.provider" = "org.apache.hudi.hive.HiveMetastoreBasedLockProvider"
```

and we are still getting the exception org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time

gs://2f907c5c3d18a85a3cf1ddc74f138e27e208fe24f33e2829af7509a031733e/data/us_secure_dl_facets_1p_restrict.db/cust_attr_store_hudi/38a71b8e-094b-423d-b8e1-8a4da7649ff4-0_661-119-157966_20240209125256640.parquet

Note: this error occurs intermittently; since we are not able to fix it, we have to drop the table each time.

aaliashraf commented 4 months ago

I encountered the same issue recently and spent quite some time debugging it. Eventually, I found a solution by adding the following additional options:

```
{
    "hoodie.datasource.write.recordkey.field": "your_field",
    "hoodie.datasource.write.precombine.field": "your_field"
}
```
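For reference, a minimal end-to-end PySpark upsert sketch with those two options set; the table name, paths, and field names are placeholders, not values from this thread:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-upsert-example").getOrCreate()

# Placeholder input; any DataFrame with a unique key and an ordering field works.
df = spark.read.csv("s3://my-bucket/input/", header=True, inferSchema=True)

hudi_options = {
    "hoodie.table.name": "my_table",                          # placeholder
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "your_field",  # uniquely identifies a record
    "hoodie.datasource.write.precombine.field": "your_field", # picks the latest among duplicates
}

(df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("s3://my-bucket/hudi/my_table"))                    # placeholder base path
```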