dib-lab / pybbhash

A Python wrapper for the bbhash library for Minimal Perfect Hashing

Implement BBHashTable in Cython and optimize lookup loops. #6

Closed ctb closed 3 years ago

ctb commented 3 years ago

Implement a bbhash lookup table in Cython.

This allows values to be retrieved by hash, even when the query hash may not be in the MPHF index.
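To make the idea concrete, here is a minimal pure-Python sketch of one way such a table can reject hashes outside the index: store the original hash in each MPHF slot and compare on lookup. This is an illustration under assumptions, not the code in this PR. `ToyMPHF`, `VerifiedTable`, `slot_to_hash`, and `slot_to_value` are hypothetical names, and `ToyMPHF` is a dict-backed stand-in rather than the bbhash API.

```python
from typing import Dict, Iterable, List, Optional


class ToyMPHF:
    """Hypothetical stand-in for a minimal perfect hash function.

    Keys given at build time map to distinct slots 0..n-1. Like a real MPHF,
    it does not remember its keys from the caller's point of view: an unknown
    key may come back with an arbitrary valid slot.
    """

    def __init__(self, keys: Iterable[int]) -> None:
        unique = list(dict.fromkeys(keys))  # drop duplicates, keep order
        if not unique:
            raise ValueError("need at least one key")
        self._slots: Dict[int, int] = {k: i for i, k in enumerate(unique)}
        self._n = len(unique)

    def __len__(self) -> int:
        return self._n

    def lookup(self, key: int) -> int:
        # Unknown keys get a plausible but meaningless slot.
        return self._slots.get(key, key % self._n)


class VerifiedTable:
    """Hash -> value table that tolerates queries for hashes not in the index."""

    def __init__(self, hashes: Iterable[int], fill: int = 0) -> None:
        hashes = list(hashes)
        self.mphf = ToyMPHF(hashes)
        n = len(self.mphf)
        self.slot_to_hash: List[int] = [0] * n
        self.slot_to_value: List[int] = [fill] * n
        for h in hashes:
            # Remember which hash owns each slot so lookups can be verified.
            self.slot_to_hash[self.mphf.lookup(h)] = h

    def __setitem__(self, hashval: int, value: int) -> None:
        slot = self.mphf.lookup(hashval)
        if self.slot_to_hash[slot] != hashval:
            raise KeyError(hashval)
        self.slot_to_value[slot] = value

    def get(self, hashval: int) -> Optional[int]:
        slot = self.mphf.lookup(hashval)
        if self.slot_to_hash[slot] != hashval:
            return None  # slot belongs to a different hash: not in the index
        return self.slot_to_value[slot]


table = VerifiedTable([10, 25, 37])
table[25] = 7
known = table.get(25)    # hash in the index
missing = table.get(11)  # hash never indexed
```

The cost of this design is one extra array holding a stored hash per entry, in exchange for lookups that report a miss rather than returning another key's value.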

The full implementation, the optimized lookup function, and some simple test code are linked from this PR, along with a simple benchmark.

A few thoughts on further optimizations to BBHashTable.get_unique_values:

Other thoughts --

codecov[bot] commented 3 years ago

Codecov Report

No coverage uploaded for pull request base (master@9fc332b). The diff coverage is n/a.


@@            Coverage Diff            @@
##             master       #6   +/-   ##
=========================================
  Coverage          ?   72.38%           
=========================================
  Files             ?        5           
  Lines             ?      134           
  Branches          ?        0           
=========================================
  Hits              ?       97           
  Misses            ?       37           
  Partials          ?        0           


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 9fc332b...8719bcf.

ctb commented 3 years ago

I'm going to declare victory and merge this, for now. Real-world performance is sufficiently good that further updates can happen in new versions!