dib-lab / pybbhash

A Python wrapper for the bbhash library for Minimal Perfect Hashing
Other
18 stars 4 forks source link

Oddity in value returned by `lookup` on master branch #2

Closed ctb closed 6 years ago

ctb commented 6 years ago

the lookup function returns a bad value on the master branch, but not in the spacegraphcats branch. Not sure what's going on. @luizirber I'll pass you the test info when I get a chance to write it up.

(18446744073709551615, <class 'int'>, <class 'numpy.ndarray'>)
luizirber commented 6 years ago

the BooPHF lookup returns ULLONG_MAX if the element is not found. The previous version of the Cython wrapper declared the lookup method as

int lookup(int)

when the correct signature is

uint64_t lookup(uint64_t)

Turns out that 18446744073709551615 == 2 ** 64 - 1 == ULLONG_MAX, so the previous version ended up returning -1 because it was converting an unsigned value to a signed value (using two-complement).

On the spacegraphcats code it was working before because -1 could be indexed in the mphf_to_kmer numpy array, but 18446744073709551615 can't.

All of this can be fixed in a couple of ways:

Personally I prefer the last option, returning None, but the first solution fixes the problem in spacegraphcats.

(and, please, an oddity in a SPACEgraphcats project should be a space oddity, which makes me Ground Control and I guess you're Major Tom?)

ctb commented 6 years ago

I prefer 'None' too; thanks for the troubleshooting!