Closed ctb closed 3 years ago
:exclamation: No coverage uploaded for pull request base (master@9fc332b). The diff coverage is n/a.
@@ Coverage Diff @@
## master #6 +/- ##
=========================================
Coverage ? 72.38%
=========================================
Files ? 5
Lines ? 134
Branches ? 0
=========================================
Hits ? 97
Misses ? 37
Partials ? 0
Last update 9fc332b...8719bcf.
I'm going to declare victory and merge this, for now. Real-world performance is sufficiently good that further updates can happen in new versions!
Implement a bbhash lookup table in Cython.
This permits queries by hash to retrieve values, with hashes that may not be in the MPHF index.
Full implementation here, optimized lookup function is here, and some simple test code is here.
Simple benchmark here.
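To illustrate why such a table can answer queries for hashes outside the MPHF index, here is a minimal pure-Python sketch. It is not the Cython implementation from this PR: `ToyMPHF` and `ToyHashTable` are hypothetical stand-ins, and the sketch assumes that an MPHF may map an unknown key to an arbitrary in-range slot, so the table keeps the original hash next to each value and checks it on lookup.

```python
class ToyMPHF:
    """Hypothetical stand-in for a minimal perfect hash function (not bbhash)."""

    def __init__(self, keys):
        # Assumes keys are unique, non-negative integers and non-empty.
        self._index = {k: i for i, k in enumerate(keys)}
        self._n = len(self._index)

    def lookup(self, key):
        # Known keys get their unique slot. Unknown keys get an arbitrary
        # in-range slot, imitating a key that was never indexed.
        return self._index.get(key, key % self._n)


class ToyHashTable:
    """Maps hash -> value, rejecting hashes that were never inserted."""

    def __init__(self, hashes, values):
        hashes = list(hashes)
        values = list(values)
        self.mphf = ToyMPHF(hashes)
        self.stored_hashes = [0] * len(hashes)
        self.stored_values = [None] * len(hashes)
        for h, v in zip(hashes, values):
            slot = self.mphf.lookup(h)
            self.stored_hashes[slot] = h
            self.stored_values[slot] = v

    def get(self, h):
        slot = self.mphf.lookup(h)
        # The slot alone proves nothing: compare against the stored hash.
        if self.stored_hashes[slot] != h:
            return None
        return self.stored_values[slot]


table = ToyHashTable([10, 25, 99], ["a", "b", "c"])
```

With this table, `table.get(25)` returns the stored value, while `table.get(7)` lands on an occupied slot, fails the hash comparison, and returns `None`. The cost of this design is one extra stored hash per entry.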
A few thoughts on further optimizations to `BBHashTable.get_unique_values`:
- The `hashes` argument is of unknown length, so it can't easily be required to be a numpy array. I think the current approach works pretty well if it is a numpy array, although it probably copies it unnecessarily.
- `self.mphf.lookup` could be done entirely in C/Cython land quite easily, I think. Just need to produce headers that let `bbhash_table.pyx` know details about `bbhash.pyx`.
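One possible way to handle the `hashes` argument is sketched below. It is not the code in this PR: `as_hash_array` is a hypothetical helper, and the `uint64` dtype is an assumption about how the hashes are stored. The sketch accepts any iterable and avoids a copy when the input is already a matching numpy array.

```python
import numpy as np


def as_hash_array(hashes):
    """Return hashes as a 1-D uint64 numpy array (hypothetical helper)."""
    if isinstance(hashes, (np.ndarray, list, tuple)):
        # np.asarray returns the input itself, without copying, when it is
        # already an ndarray of the requested dtype.
        return np.asarray(hashes, dtype=np.uint64)
    # Generators, sets and other iterables of unknown length are consumed
    # element by element.
    return np.fromiter(hashes, dtype=np.uint64)


existing = np.array([1, 2, 3], dtype=np.uint64)
from_generator = as_hash_array(h for h in [5, 6])
```

Here `as_hash_array(existing)` hands back the same array object, so a caller that already holds a `uint64` array pays nothing extra, while a generator is materialized once.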
Other thoughts --