
Improve locking granularity #1324

Open matthpeterson opened 5 years ago

matthpeterson commented 5 years ago

FATE transactions like merges and imports lock the entire table, which makes it difficult to run several such operations concurrently. If the locking mechanism were more granular, perhaps over ranges of tablets, more operations could make progress at once and overall performance would improve.
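For example (a minimal sketch assuming the 2.0 client API; the table name and split rows are made up), two merges over disjoint ranges of the same table still run one at a time, because each FATE transaction holds a table-wide write lock:

```java
import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.hadoop.io.Text;

public class ConcurrentMerges {
  public static void main(String[] args) throws Exception {
    try (AccumuloClient client = Accumulo.newClient()
        .from("/path/to/accumulo-client.properties").build()) {
      // Two merges over disjoint tablet ranges of the same table.
      // Each merge is a FATE transaction that takes a table-wide write
      // lock, so these execute one after the other, not in parallel.
      Thread t1 = new Thread(() -> merge(client, "mytable", "a", "m"));
      Thread t2 = new Thread(() -> merge(client, "mytable", "n", "z"));
      t1.start();
      t2.start();
      t1.join();
      t2.join();
    }
  }

  static void merge(AccumuloClient client, String table, String start, String end) {
    try {
      client.tableOperations().merge(table, new Text(start), new Text(end));
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}
```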

milleruntime commented 5 years ago

The new Bulk Import for 2.0 is more granular. If you use something like MapReduce to inspect the files and create the load plan before performing the bulk import, that should greatly reduce the time spent holding a table lock. You can also bulk import to an offline table.
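A sketch of that pattern, assuming the 2.0 `importDirectory`/`LoadPlan` API (the file names, rows, directory, and table name are placeholders):

```java
import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.accumulo.core.data.LoadPlan;
import org.apache.accumulo.core.data.LoadPlan.RangeType;
import org.apache.hadoop.io.Text;

public class PlannedBulkImport {
  public static void main(String[] args) throws Exception {
    // Compute the file-to-tablet mapping ahead of time (e.g. in a
    // MapReduce job that inspects each RFile's first and last keys).
    // RangeType.TABLE rows must be existing split points; RangeType.FILE
    // rows are the file's own first and last rows.
    LoadPlan plan = LoadPlan.builder()
        .loadFileTo("f1.rf", RangeType.FILE, new Text("row_100"), new Text("row_250"))
        .loadFileTo("f2.rf", RangeType.FILE, new Text("row_300"), new Text("row_499"))
        .build();

    try (AccumuloClient client = Accumulo.newClient()
        .from("/path/to/accumulo-client.properties").build()) {
      // Because the mapping is precomputed, the import does little work
      // while it holds the table lock.
      client.tableOperations().importDirectory("hdfs:///bulk/dir")
          .to("mytable").plan(plan).load();
    }
  }
}
```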

ivakegg commented 5 years ago

That is perhaps true for the imports. However, the merge operation could still use some work in this regard, at least in the 1.9.x codebase.

milleruntime commented 5 years ago

> That is perhaps true for the imports. However, the merge operation could still use some work in this regard, at least in the 1.9.x codebase.

Agreed. I don't know anything about merges so I couldn't even comment on them haha.

ivakegg commented 5 years ago

Premise: full-table locks would not be an issue as long as they are very short (sub-second). Hence, if we were to solve https://issues.apache.org/jira/browse/ACCUMULO-3235, merging would not require long locks. Basically, if we keep track of the valid range within an RFile for every file entry in the metadata table, we can avoid having to pre-chop the files when merging, and thus avoid long table locks.
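To make that concrete, here is a purely hypothetical sketch of such a "ranged" file reference; this is not Accumulo's actual metadata schema, and the class and method names are invented:

```java
import org.apache.hadoop.io.Text;

// Hypothetical illustration of the ACCUMULO-3235 idea: each metadata file
// entry carries the range of rows that is valid within the RFile. A merge
// could then reassign files to the merged tablet by clipping ranges in
// metadata only, instead of rewriting (chopping) the files themselves.
class RangedFileRef {
  final String rfilePath;   // e.g. "hdfs://.../t-0001/F0000a.rf"
  final Text validStartRow; // inclusive; null means start of file
  final Text validEndRow;   // inclusive; null means end of file

  RangedFileRef(String rfilePath, Text validStartRow, Text validEndRow) {
    this.rfilePath = rfilePath;
    this.validStartRow = validStartRow;
    this.validEndRow = validEndRow;
  }

  // Clip this reference to a tablet's row range (simplified: inclusive
  // bounds; real tablet extents use an exclusive prevEndRow). This is a
  // metadata-only operation: the underlying RFile is untouched, so it
  // could be done under a very short lock.
  RangedFileRef clipTo(Text tabletStartRow, Text tabletEndRow) {
    return new RangedFileRef(rfilePath,
        max(validStartRow, tabletStartRow),
        min(validEndRow, tabletEndRow));
  }

  private static Text max(Text a, Text b) {
    if (a == null) return b; // null start = unbounded below
    if (b == null) return a;
    return a.compareTo(b) >= 0 ? a : b;
  }

  private static Text min(Text a, Text b) {
    if (a == null) return b; // null end = unbounded above
    if (b == null) return a;
    return a.compareTo(b) <= 0 ? a : b;
  }
}
```

A merge would then only rewrite these metadata entries, which should be fast enough to do under a sub-second table lock.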

ivakegg commented 5 years ago

More discussion in a related ticket here: https://github.com/apache/accumulo/issues/1050