apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[HUDI-8490] Implicit lock provider use different lock key scheme #12220

Closed Davis-Zhang-Onehouse closed 2 days ago

Davis-Zhang-Onehouse commented 2 weeks ago

Change Logs

code + test

Impact

For the implicit lock providers org.apache.hudi.aws.transaction.lock.DynamoDBBasedImplicitPartitionKeyLockProvider and org.apache.hudi.client.transaction.lock.ZookeeperBasedImplicitBasePathLockProvider, the lock key (for ZooKeeper) / partition key (for DynamoDB) contains the substring <tablename>-<hash of table basepath>.

This is problematic when a table is renamed with alter table rename: some writers keep using the old lock key while others use the new one. The lock then fails to synchronize operations from concurrent writers, which leads to concurrency bugs.

Alter table rename should not be coupled with locking. To remove the dependency, we removed the table name from the lock key for both the implicit DynamoDB and ZooKeeper lock providers.
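To illustrate why the table name in the key breaks locking across a rename, here is a minimal sketch. It is not the code from this PR: the class LockKeySketch, the helpers hashOf, oldKey and newKey, the choice of SHA-256, and the example base path are all assumptions made for illustration. The actual hash function and key layout used by the lock providers are not stated above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class LockKeySketch {

    // Hypothetical hash helper. The real providers may hash the base path
    // differently; SHA-256 truncated to 8 bytes is an assumption.
    static String hashOf(String basePath) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(basePath.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 8; i++) {
                sb.append(String.format("%02x", digest[i]));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Old scheme as described: <tablename>-<hash of table basepath>.
    static String oldKey(String tableName, String basePath) {
        return tableName + "-" + hashOf(basePath);
    }

    // New scheme as described: the table name is no longer part of the key.
    static String newKey(String basePath) {
        return hashOf(basePath);
    }

    public static void main(String[] args) {
        String basePath = "s3://bucket/warehouse/orders"; // hypothetical path

        // A writer that has seen the rename and one that has not derive
        // different keys under the old scheme, so they take different locks.
        boolean oldSchemeAgrees =
            oldKey("orders", basePath).equals(oldKey("orders_v2", basePath));
        System.out.println(oldSchemeAgrees); // prints false

        // Under the new scheme both writers derive the same key.
        boolean newSchemeAgrees = newKey(basePath).equals(newKey(basePath));
        System.out.println(newSchemeAgrees); // prints true
    }
}
```

Because the base path does not change on rename, a key derived from it alone stays stable for every writer regardless of which table name each writer currently holds.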

Risk level (write none, low, medium or high below)

Low

Documentation Update

We should update the merge into doc about this new restriction.

Contributor's checklist

hudi-bot commented 1 week ago

CI report:

Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build

nsivabalan commented 2 days ago

When was this lock provider added? If it was part of 0.15.0, we need to add a doc update (website update) to the runbook. Can you file a tracking ticket if applicable?