bluenote-1577 / skani

Fast, robust ANI and aligned fraction for (metagenomic) genomes and contigs.
MIT License
159 stars 10 forks source link

Thomas wang's hash function #35

Closed jianshu93 closed 1 month ago

jianshu93 commented 1 month ago

Hello Jim,

I noticed that the integer hash function is not from Heng Li's C implementation minimap2, but from this website (https://aebou.rbind.io/post/a-rust-glimpse-at-thomas-wang-integer-hash-function/), which is just copy the code from probminhash crate (https://github.com/jean-pierreBoth/probminhash/blob/master/src/invhash.rs), the first Rust implementation I know (also as the author of the blog mentioned). The code is exactly the same but just change from key_arg to kmer in the input string. Did you develop the Thomas Wang's hash function in Rust independently or the other way, since Rust, the wrap add to avoid over flow is very different from that of C. I think the idea was originally by Heng Li, but the actual rust implementation should also be acknowledged since ProbminHash was under the two licenses:

Licensed under either of

Apache License, Version 2.0, LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0 MIT license LICENSE-MIT or http://opensource.org/licenses/MIT

Thanks,

Jianshu

bluenote-1577 commented 1 month ago

Hi jianshu,

I looked at hengs implementation saw the blog post a while back. I used hengs as a reference.

There's actually a bug in my implementation of the hash function as pointed out to me by Donovan parks in a previous issue #20 .. so I did not copy anybodys code verbatim.

If you feel strongly I can add a reference but I did not know about the existence of the probminhash crate