Closed ctb closed 6 years ago
the BooPHF lookup returns ULLONG_MAX
if the element is not found.
The previous version of the Cython wrapper declared the lookup method as
int lookup(int)
when the correct signature is
uint64_t lookup(uint64_t)
Turns out that 18446744073709551615 == 2 ** 64 - 1 == ULLONG_MAX
,
so the previous version ended up returning -1
because it was converting an unsigned value to a signed value (using two-complement).
On the spacegraphcats code it was working before because -1
could be indexed in the mphf_to_kmer
numpy array, but 18446744073709551615
can't.
All of this can be fixed in a couple of ways:
def lookup(self, uint64_t kmer):
cdef uint64_t value = deref(self.c_mphf).lookup(kmer)
if value != ULLONG_MAX:
return value
return -1
BooPHF
and return ULLONG_MAX
(what is happening now)None
if it is not found.Personally I prefer the last option, returning None
, but the first solution fixes the problem in spacegraphcats.
(and, please, an oddity in a SPACEgraphcats project should be a space oddity, which makes me Ground Control and I guess you're Major Tom?)
I prefer 'None' too; thanks for the troubleshooting!
the
lookup
function returns a bad value on the master branch, but not in the spacegraphcats branch. Not sure what's going on. @luizirber I'll pass you the test info when I get a chance to write it up.