apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] OCC locks with data on S3 and DynamoDB fails to acquire #6226

Open atharvai opened 2 years ago

atharvai commented 2 years ago

Describe the problem you faced

I’m running several workloads in production, and one of them is a parallel add of partitions to a COW Hudi table. I’m managing OCC with DynamoDB, and the DynamoDB partition key is the table name. I’m finding that each parallel instance waits for a lock and is blocked even though the partitions being updated are different. This compounds as the number of parallel writes/jobs increases: as the screenshot shows, each subsequent job takes about one minute longer because it is blocked on the lock. (Each Spark job takes about 1–2 minutes to run and then waits on the lock until the previous job completes, so the majority of the ~1 hr duration is just waiting for the lock.)

First question: is this designed/intended behaviour? Second question: should I be using the table partition key as the lock partition key? Currently, as per the docs, we use only the table name, not the table partition, as the lock key.

Environment: Hudi v0.11, EMR 6.6.0, Spark 3.2.0

[Screenshot: job run durations growing by roughly one minute per subsequent concurrent job]

To Reproduce

Steps to reproduce the behavior:

  1. Write a Spark job with OCC enabled that writes data to a table on S3 (see the sketch after this list).
  2. Run multiple instances of the job, ingesting different data into different partitions; this can even be append-only writes to new partitions.
  3. The more concurrent jobs and the more data you have, the longer each job holds the lock and the longer newer jobs wait for it.
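
A minimal PySpark sketch of such a writer, assuming a hypothetical bucket/table name and the DynamoDB lock settings discussed in this issue (the provider classpath is the one shipped in the hudi-aws module for recent releases; worth verifying against the version in use):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-occ-writer").getOrCreate()

# Toy batch; each concurrent job would carry rows for different partitions.
df = spark.createDataFrame(
    [("id-1", "2022-07-29", 100)],
    ["record_key", "partition_path", "value"],
)

hudi_options = {
    "hoodie.table.name": "sales",
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "record_key",
    "hoodie.datasource.write.partitionpath.field": "partition_path",
    # OCC: multi-writer mode plus lazy cleaning of failed writes
    "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
    "hoodie.cleaner.policy.failed.writes": "LAZY",
    # DynamoDB-based lock provider
    "hoodie.write.lock.provider": "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
    "hoodie.write.lock.dynamodb.table": "hudi_locks",
    "hoodie.write.lock.dynamodb.partition_key": "sales",
    "hoodie.write.lock.dynamodb.region": "eu-west-1",
}

(
    df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("s3://my-bucket/warehouse/sales")  # hypothetical path
)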

Expected behavior

The lock should be held only briefly when ingestion affects unrelated partitions.

Environment Description

  * Hudi version: 0.11
  * Spark version: 3.2.0
  * EMR version: 6.6.0
  * Storage: S3

Additional context

Why would the lock object be null? The default timeout is 60 s, but this exception sometimes appears after 20 minutes, or even after 1 hour.
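
For reference, lock acquisition in Hudi is governed by a retry loop around that timeout; these are the relevant keys (the values here are illustrative, not necessarily the 0.11 defaults):

hoodie.write.lock.wait_time_ms=60000
hoodie.write.lock.num_retries=15
hoodie.write.lock.wait_time_ms_between_retry=5000
hoodie.write.lock.client.num_retries=10
hoodie.write.lock.client.wait_time_ms_between_retry=10000

A 20-minute or 1-hour wall-clock wait before the exception could be consistent with several retries stacking on top of the 60 s timeout.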

Stacktrace

ERROR Client: Application diagnostics message: User class threw exception: org.apache.hudi.exception.HoodieLockException: Unable to acquire lock, lock object null
    at org.apache.hudi.client.transaction.lock.LockManager.lock(LockManager.java:82)
    at org.apache.hudi.client.transaction.TransactionManager.beginTransaction(TransactionManager.java:53)
    at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:230)
    at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:122)
    at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:650)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:313)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:163)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:115)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
fengjian428 commented 2 years ago

I’m finding that each parallel instance waits for a lock and is blocked even though partitions being updated are different.

That's by-design behavior. Even if your parallel instances are updating different partitions, Hudi still needs a lock on the timeline, so they cannot change metadata at the same time.

But I don't think your application should wait that long, since changing metadata is a very quick operation and Hudi doesn't acquire the lock until then. I suspect the lock isn't being released correctly. Have you tried any other lock provider, such as the ZooKeeper-based or filesystem-based one?
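
A ZooKeeper-based setup, for example, looks roughly like this (host, port, and paths are placeholders):

hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
hoodie.write.lock.zookeeper.url=zk1.example.com
hoodie.write.lock.zookeeper.port=2181
hoodie.write.lock.zookeeper.lock_key=sales
hoodie.write.lock.zookeeper.base_path=/hudi/locks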

atharvai commented 2 years ago

Hey, the by-design behaviour makes sense now. However, we have tried ZK (decommissioned now) and are now using DDB.

Why would the exception happen with a null lock object, though? What can cause this?

HoodieLockException: Unable to acquire lock, lock object null

fengjian428 commented 2 years ago

Are there any other abnormal logs?

yihua commented 2 years ago

@rahil-c @zhedoubushishi @umehrot2 is there any known issue regarding the slowness of the DynamoDB-based lock provider?

@atharvai could you share the Hudi configs, especially those around OCC and the lock provider? How many concurrent jobs/partitions are running?

zhedoubushishi commented 2 years ago

No, this is a new issue for us. As @fengjian428 mentioned, the locking happens during the commit stage, not during the data-writing stage. It would be good if you could share some logs so that we can better understand why the lock is not getting released properly.

Also, I am from the AWS team, so if you are an AWS customer, feel free to open a ticket with us.

atharvai commented 2 years ago

Hi, here are the lock configs I use (below).

Our processing consists of batch jobs triggered by AWS Step Functions (SFN); this can range from roughly 2 to 30 concurrent jobs. We have now switched to serial execution of jobs instead of parallel until this issue is resolved.

hoodie.write.lock.dynamodb.table=hudi_locks
hoodie.write.lock.dynamodb.partition_key=Sales
hoodie.write.lock.dynamodb.region=eu-west-1
hoodie.write.lock.dynamodb.billing_mode=PAY_PER_REQUEST
hoodie.write.lock.dynamodb.endpoint_url=dynamodb.eu-west-1.amazonaws.com
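
(For completeness, a DynamoDB-backed OCC setup also carries the concurrency-mode and lock-provider settings; the keys below are the ones documented for recent Hudi releases, worth double-checking against 0.11:)

hoodie.write.concurrency.mode=optimistic_concurrency_control
hoodie.cleaner.policy.failed.writes=LAZY
hoodie.write.lock.provider=org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider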

Which specific logs would you like? I can try to sanitize them to share. (They have to go through company compliance, so it might take time.)

@zhedoubushishi yes, we are an AWS customer.

zhedoubushishi commented 2 years ago

Got it. Can you provide more information on how to reproduce this issue? For example, what is the size of the Hudi table? Are you seeing this slowdown even with a small number of concurrent jobs? Also, it would be good to open a ticket with the AWS EMR team if you have concerns about sharing logs publicly.

nsivabalan commented 2 years ago

You have to use the same partition key for any lock. That is, any writes happening to a single Hudi table will acquire the same DynamoDB lock, even if they are writing to two different partitions. Only during the conflict-resolution stage does Hudi detect that two writes touch two different partitions and let both succeed; if there is an overlap, Hudi fails one of them. So locking is required regardless of whether writers write to the same or different partitions.

BTW, the metadata table writes for a given data table happen within the same lock, so maybe that's playing a part, but you should not see so much latency in general. Did you enable access logs for DynamoDB? What latency do you see with DynamoDB? I will let @zhedoubushishi investigate this; just throwing in some ideas.
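
(If it helps to rule the metadata table in or out, it can be turned off for a test run; it is enabled by default in 0.11, and disabling it has its own trade-offs:)

hoodie.metadata.enable=false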

parisni commented 2 years ago

I get the same error (Unable to acquire lock, lock object null). This occurs rarely, and the workaround is to remove the entry in DynamoDB manually.
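
Something like this boto3 sketch is the manual cleanup I mean (table name, region, and key value are placeholders; the item's partition-key attribute should be "key", the default used by the underlying DynamoDB lock client, but verify against your table before deleting anything):

import boto3

# Placeholders; match these to your hoodie.write.lock.dynamodb.* settings.
TABLE_NAME = "hudi_locks"
LOCK_PARTITION_KEY = "Sales"  # value of hoodie.write.lock.dynamodb.partition_key
REGION = "eu-west-1"

dynamodb = boto3.client("dynamodb", region_name=REGION)

# Inspect the stale lock entry (if any) before removing it.
resp = dynamodb.get_item(
    TableName=TABLE_NAME,
    Key={"key": {"S": LOCK_PARTITION_KEY}},  # "key" is assumed to be the attribute name
)
print(resp.get("Item"))

# Only delete once you are sure no writer legitimately holds the lock.
dynamodb.delete_item(
    TableName=TABLE_NAME,
    Key={"key": {"S": LOCK_PARTITION_KEY}},
)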

nsivabalan commented 2 years ago

@atharvai: can you respond to @zhedoubushishi's request above when you get a chance?

nsivabalan commented 1 year ago

@atharvai: do you have any updates for us? Or, if you got the issue resolved, let us know how you went about resolving it so that it can help others in the community.