Closed: wernst closed this 1 year ago
One other bottleneck for larger result sets was the prefix removal step in `TupleDatabaseClient`. Removing that processing on scans without a prefix actually takes that 4ms average down to 0.08ms (in benchmarks with the same params as above).
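For context, here is a hypothetical sketch of what skipping that step can look like; `postProcess`, `removePrefix`, and the plain-array tuple shape are assumptions for the example, not the client's real API.

```ts
type Tuple = unknown[]

// Hypothetical helper: drop the leading prefix elements from a tuple.
function removePrefix(prefix: Tuple, tuple: Tuple): Tuple {
	return tuple.slice(prefix.length)
}

// Only map over the results when a prefix was actually supplied;
// scans without a prefix skip the O(n) per-row processing entirely.
function postProcess(results: Tuple[], prefix?: Tuple): Tuple[] {
	if (!prefix || prefix.length === 0) return results
	return results.map((tuple) => removePrefix(prefix, tuple))
}
```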
Very nice! That's a great idea. Would you mind adding those test cases to this PR?
Sure, the cases in those benchmarks were based on the scan API and your tests, and the test suite continues to pass after that update. To clarify, would you like me to add those cases to your benchmarks file, or are there some new test cases you'd like to see?
Yeah, the benchmarks and the "complex" examples that demonstrate the usefulness of this slice approach.
Added benchmarks for three different types of tuple reads: (1) tuples of numbers `[1, 2, 3]`, (2) tuples of objects `[{val: 1}, {val: 2}]`, and (3) tuples of arrays `[[1, 2, 3], [4, 5, 6]]`. It's tough to compare benchmarks from commit to commit, but I saw improved benchmark times with this commit and a much more consistent read speed across the different tuple types.
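For reference, hypothetical fixtures along the lines of those three shapes (the sizes and construction are illustrative, not copied from the benchmark file):

```ts
// ~10,000 tuples per shape, matching the three read types above.
const numberTuples = Array.from({ length: 10_000 }, (_, i) => [i, i + 1, i + 2])
const objectTuples = Array.from({ length: 10_000 }, (_, i) => [{ val: i }, { val: i + 1 }])
const arrayTuples = Array.from({ length: 10_000 }, (_, i) => [
	[i, i + 1, i + 2],
	[i + 3, i + 4, i + 5],
])
```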
Releasing Version 2.2.0 right now with this change.
Hi, I've had this in a patch and thought I'd make a PR.
I found that index scans on more complex keys (nested objects) that return larger datasets were a minor bottleneck. It looks like the scan algorithm finds a start index with binary search and then searches linearly for the end from there. In cases with complex keys returning a larger number of rows, the potentially `n` comparisons add up.

This PR searches for both the start and end index with binary search and properly slices the data based on the scan args. Aside from fewer comparisons, a small win may also be that it could avoid array resizing under the hood.
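For illustration, here's a minimal sketch of the idea in TypeScript. It assumes the data is a sorted array with a comparator and that the bounds are expressed as `gte`/`lt` tuples; the names (`lowerBound`, `scanSorted`) are hypothetical, not the library's actual internals.

```ts
type Compare<T> = (a: T, b: T) => number

// First index i such that compare(data[i], target) >= 0, found in O(log n).
function lowerBound<T>(data: T[], target: T, compare: Compare<T>): number {
	let lo = 0
	let hi = data.length
	while (lo < hi) {
		const mid = (lo + hi) >>> 1
		if (compare(data[mid], target) < 0) lo = mid + 1
		else hi = mid
	}
	return lo
}

// Both bounds located with binary search, then a single slice:
// no linear walk from the start index to find the end.
function scanSorted<T>(data: T[], gte: T, lt: T, compare: Compare<T>): T[] {
	const start = lowerBound(data, gte, compare)
	const end = lowerBound(data, lt, compare)
	return data.slice(start, end)
}
```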
Here are some benchmarks I wrote locally (all returning ~10,000 tuples, where `simple` keys are just an array of value types and `complex` keys are arrays of objects):

[Benchmark results: Current]

[Benchmark results: With PR]
In those benchmarks you'll see average scan times for `complex` keys go from ~34ms to ~4ms.