apache / accumulo

Apache Accumulo
https://accumulo.apache.org
Apache License 2.0

Implement Table Range locks for FaTE transactions #2483

Open dlmarion opened 2 years ago

dlmarion commented 2 years ago

Is your feature request related to a problem? The current table locks do not allow FaTE transactions operating on different parts of a table to run concurrently.

Describe the solution you'd like Allow different FaTE transactions to operate on a table concurrently when they affect different ranges.

User feedback for this issue captured at https://github.com/apache/accumulo/pull/2467#discussion_r801868052

dlmarion commented 3 weeks ago

@keith-turner @cshannon - Is this still an issue after the recent Fate changes? Can multiple Fate transactions run concurrently on different parts of the table? I'm thinking the answer is yes with the new operation_id column in the tablet metadata in v4.0.

cshannon commented 3 weeks ago

Yeah, I would say that the opid column is the table range lock, as it allows locking tablets for fate operations. Some fate operations still require locking the entire table (like delete table, clone, etc.), but there is no getting around that. I can let @keith-turner comment too, but I believe operations like split, merge, etc. that set that operation id should be able to work concurrently.

keith-turner commented 2 weeks ago

I looked into removing the zookeeper table locks and ran into two problems.

  1. Multi-tablet operations like merge set operation ids on each tablet independently. Removing table locks for these would require more complex locking operations on the tablets, something resembling two-phase commit.
  2. Some table data is stored in zookeeper, like table state. So coordinating this type of zookeeper state w/ tablet state is much easier with the current zookeeper table locks.
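
To illustrate the first problem, here is a minimal sketch of what reserving several tablets without a table lock might involve. It is a toy model, not Accumulo code: `TabletStore`, `conditional_set`, and `reserve_all` are hypothetical names, and an in-memory dict stands in for the operation id column in the tablet metadata. It shows only the all-or-nothing reservation with rollback, which is simpler than a real two-phase commit.

```python
from typing import Dict, List, Optional


class TabletStore:
    """Toy stand-in for tablet metadata: tablet name -> operation id (or None)."""

    def __init__(self, tablets: List[str]) -> None:
        self.opid: Dict[str, Optional[str]] = {t: None for t in tablets}

    def conditional_set(self, tablet: str, op: str) -> bool:
        # Succeeds only when no operation id is currently set on the tablet.
        if self.opid[tablet] is None:
            self.opid[tablet] = op
            return True
        return False

    def clear(self, tablet: str, op: str) -> None:
        # Only the operation that set the id may clear it.
        if self.opid[tablet] == op:
            self.opid[tablet] = None


def reserve_all(store: TabletStore, tablets: List[str], op: str) -> bool:
    """Reserve every tablet for op, or leave all of them untouched."""
    acquired: List[str] = []
    for tablet in sorted(tablets):  # fixed order keeps competing operations consistent
        if store.conditional_set(tablet, op):
            acquired.append(tablet)
        else:
            # Another operation owns this tablet: undo the partial reservation.
            for done in acquired:
                store.clear(done, op)
            return False
    return True


store = TabletStore(["a", "b", "c"])
reserve_all(store, ["b"], "split-1")        # succeeds
reserve_all(store, ["a", "b"], "merge-2")   # fails on "b" and rolls back "a"
```

With a table-wide write lock, none of the rollback logic is needed, which is part of why the table locks are simpler to reason about.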

The split code in Accumulo 4 avoids using table locks, but that required adding complex code to coordinate with the table state stored in zookeeper.

The biggest advantage of table locks having a range would be allowing merge to run on one part of a table while bulk imports and compactions continue to run on another part of the table. But now that merge operations are much faster, this use case is not as compelling. Compactions and bulk imports get read locks on tables, so those can run concurrently.
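
For reference, a ranged table lock would come down to a compatibility check like the sketch below. This is a hypothetical model, not an existing Accumulo API: `RangeLock`, `overlaps`, and `conflicts` are made-up names, and ranges are assumed to be `(start, end]` over row keys, with `None` meaning unbounded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RangeLock:
    start: Optional[str]  # exclusive lower bound, None means -infinity
    end: Optional[str]    # inclusive upper bound, None means +infinity
    write: bool


def overlaps(a: RangeLock, b: RangeLock) -> bool:
    # (a.start, a.end] and (b.start, b.end] overlap iff
    # a.start < b.end and b.start < a.end, treating None as unbounded.
    a_starts_before_b_ends = a.start is None or b.end is None or a.start < b.end
    b_starts_before_a_ends = b.start is None or a.end is None or b.start < a.end
    return a_starts_before_b_ends and b_starts_before_a_ends


def conflicts(a: RangeLock, b: RangeLock) -> bool:
    # Two read locks never conflict; otherwise the ranges must be disjoint.
    return (a.write or b.write) and overlaps(a, b)


merge = RangeLock("a", "m", write=True)         # merge on rows (a, m]
bulk = RangeLock("m", None, write=False)        # bulk import on rows (m, +inf)
delete_table = RangeLock(None, None, write=True)

conflicts(merge, bulk)          # False: disjoint ranges
conflicts(delete_table, bulk)   # True: full-range write lock
```

Under this model, operations such as delete table always take the full range, so only merge ever benefits from the extra machinery.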

Looking through all of the table operations that get write locks, found the following.

Merge seems to be the only operation that would benefit from a range on the table lock. Create, delete, clone, etc. would all lock the full range of the table if locks had a range. Since merge is faster now and does not need the ranges, maybe there is no good use case for this feature.

There is a caveat though. Even though merge is faster, once a merge is initiated it will wait for running bulk imports and table compactions to complete. While it's waiting, it will prevent new ones from starting. So it could cause a disturbance in throughput.
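
The stall described above can be sketched with a toy fair read/write lock. This assumes the table lock grants requests strictly in arrival order, which matches the behavior described here but is a simplification of the real ZooKeeper-based locks. `FairTableLock` is a hypothetical class, not Accumulo code.

```python
from collections import deque
from typing import Deque, List, Tuple


class FairTableLock:
    """Toy FIFO read/write lock: requests are granted strictly in arrival order."""

    def __init__(self) -> None:
        self.holders: List[Tuple[str, bool]] = []    # (name, is_write)
        self.waiting: Deque[Tuple[str, bool]] = deque()

    def request(self, name: str, is_write: bool) -> None:
        self.waiting.append((name, is_write))
        self._grant()

    def release(self, name: str) -> None:
        self.holders = [h for h in self.holders if h[0] != name]
        self._grant()

    def _grant(self) -> None:
        # Only the head of the queue may be granted, so readers that arrive
        # after a waiting writer cannot overtake it.
        while self.waiting:
            _, is_write = self.waiting[0]
            if is_write and self.holders:
                break  # a writer needs the lock to be completely free
            if not is_write and any(w for _, w in self.holders):
                break  # a reader cannot share with an active writer
            self.holders.append(self.waiting.popleft())


lock = FairTableLock()
lock.request("bulk-1", is_write=False)  # granted immediately
lock.request("merge", is_write=True)    # waits for bulk-1 to finish
lock.request("bulk-2", is_write=False)  # queued behind merge, even though bulk-1 is only a reader
```

In this model `bulk-2` stays queued until `merge` both acquires and releases the lock, which is the throughput disturbance in question.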