matthpeterson opened this issue 5 years ago
The new Bulk Import for 2.0 is more granular. If you use something like MapReduce to inspect the files and create the load plan before performing the bulk import, that should greatly reduce the time spent holding a table lock. You can also bulk import to an offline table.
That is perhaps true for the imports. However, the merge operation could still use some work in this regard, at least in the 1.9.x codebase.
Agreed. I don't know anything about merges, so I couldn't even comment on them haha.
Premise: full table locks would not be an issue as long as they are very short (sub-second). Hence, if we were to solve https://issues.apache.org/jira/browse/ACCUMULO-3235, merging would not require long locks. Basically, if we keep track of the valid range within an rfile for every entry in the metadata, then we can avoid having to pre-chop the files when merging, and therefore avoid long table locks.
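To make the ACCUMULO-3235 idea concrete, here is a small sketch (in Python rather than Accumulo's Java, with hypothetical names like `FileRef` and `merge_tablets` that are not Accumulo API): if each file reference in the metadata carries the key range that is valid for its tablet, then merging tablets is a metadata-only operation, with no chop compaction to rewrite files up front.

```python
# Hypothetical model of per-file valid ranges in tablet metadata.
# Names (Range, FileRef, merge_tablets) are illustrative, not Accumulo's API.

from dataclasses import dataclass

@dataclass(frozen=True)
class Range:
    start: str  # inclusive start row ('' means -infinity)
    end: str    # exclusive end row ('' means +infinity)

@dataclass(frozen=True)
class FileRef:
    path: str
    valid: Range  # the portion of the file's keys this tablet may read

def merge_tablets(a_files, b_files):
    """Merge two adjacent tablets' file lists without rewriting any rfile.

    Because each FileRef already records its valid range, the merged tablet
    simply carries both lists; readers ignore keys outside each file's valid
    range, so no pre-merge "chop" compaction is needed.
    """
    return list(a_files) + list(b_files)

# One rfile spans both tablets; today that file would have to be chopped.
t1 = [FileRef("hdfs://ns/t/f1.rf", Range("", "m"))]
t2 = [FileRef("hdfs://ns/t/f1.rf", Range("m", "")),
      FileRef("hdfs://ns/t/f2.rf", Range("m", ""))]

merged = merge_tablets(t1, t2)
print(len(merged))  # three file references kept, zero bytes rewritten
```

The point of the sketch is that the expensive part of today's merge (rewriting files so each file's data falls entirely inside one tablet) disappears; only metadata entries change, which is fast enough to hold a short lock.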
more discussion / related ticket here: https://github.com/apache/accumulo/issues/1050
FATE transactions like merges and imports lock a particular table. This makes it difficult to run several such operations concurrently. If the locking mechanism were more granular, perhaps over ranges of tablets, it would be possible to make progress on more operations at once and improve performance.
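A minimal sketch of what range-granular locking could look like (Python, with an illustrative `RangeLockManager` that is not Accumulo code): two operations conflict only when their row ranges overlap, so, for example, a merge over rows [a, m) and a bulk import into rows [n, +inf) can hold locks at the same time.

```python
# Hypothetical range-lock manager: locks row ranges instead of whole tables.
# Names and semantics are illustrative assumptions, not Accumulo's FATE API.

import threading

class RangeLockManager:
    def __init__(self):
        self._held = []                     # (start, end) ranges currently locked
        self._cond = threading.Condition()

    @staticmethod
    def _overlaps(a, b):
        # Ranges are [start, end) over row strings; '' as end means +infinity.
        a_start, a_end = a
        b_start, b_end = b
        return (a_end == "" or b_start < a_end) and (b_end == "" or a_start < b_end)

    def acquire(self, start, end):
        """Block until no held range overlaps [start, end), then hold it."""
        with self._cond:
            while any(self._overlaps((start, end), h) for h in self._held):
                self._cond.wait()
            self._held.append((start, end))

    def release(self, start, end):
        with self._cond:
            self._held.remove((start, end))
            self._cond.notify_all()

locks = RangeLockManager()
locks.acquire("a", "m")   # e.g. a merge over rows [a, m)
locks.acquire("n", "")    # e.g. a bulk import into [n, +inf): no conflict
locks.release("a", "m")
locks.release("n", "")
```

With a single table lock, the second `acquire` above would have to wait for the first operation to finish; with range locks it proceeds immediately, which is exactly the concurrency win being discussed.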