rsharris closed this issue 4 years ago.
Thanks for the suggestion. I don't think it should cause downstream problems, because hits with very low Jaccard values (e.g., 1e-5) are not considered valid matches. The anomaly arises from our Poisson approximation of the binomial distribution of mutations.
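Here is a short sketch of why the Poisson approximation allows values above 1. It assumes the usual Mash model: two equal-sized k-mer sets, with each base mutating independently with probability $d$. Let $w$ be the fraction of k-mers that are shared. Then $j = w/(2-w)$, so $w = 2j/(1+j)$. The number of mutations in a $k$-mer is $\mathrm{Binomial}(k, d)$, so a $k$-mer survives intact with probability $(1-d)^k$. The Poisson approximation replaces this with $e^{-kd}$:

$$
\begin{aligned}
w &= \frac{2j}{1+j}, \\
\text{binomial: } (1-d)^k = w \;&\Rightarrow\; d = 1 - w^{1/k} \in [0, 1], \\
\text{Poisson: } e^{-kd} = w \;&\Rightarrow\; d = -\frac{1}{k}\ln w \;\to\; \infty \text{ as } w \to 0.
\end{aligned}
$$

The exact binomial form is bounded by 1. The Poisson form is unbounded as $j \to 0$.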
If the Jaccard index is very small but non-zero, j2md() can return a value greater than 1.0. For example, with k=10 and j=1e-5, the returned Mash distance is ≈ 1.08.
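The example can be checked by hand. The calculation assumes j2md() computes $-\frac{1}{k}\ln\frac{2j}{1+j}$, the formula quoted in the suggested fix:

$$
\frac{2j}{1+j} = \frac{2 \times 10^{-5}}{1.00001} \approx 1.99998 \times 10^{-5},
\qquad
-\frac{1}{10}\ln\left(1.99998 \times 10^{-5}\right) \approx -\frac{1}{10}(-10.8198) \approx 1.082.
$$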
I don't know whether that causes any downstream problems. Mathematically, though, a returned value greater than 1 makes no sense, since the Mash distance estimates a per-base mutation rate.
The simplest solution would be to return `min(1, (-1.0 / k) * log(2.0 * j / (1 + j)))`.
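Below is a minimal Python sketch of the suggested clamp. It is not the project's actual j2md() code. The function name mirrors j2md() only for illustration. Returning 1.0 for `j <= 0` is an assumption, since the thread doesn't say how a Jaccard of exactly zero is handled.

```python
import math


def j2md(j: float, k: int) -> float:
    """Hypothetical re-implementation: Mash distance from a Jaccard estimate, clamped to 1."""
    if j <= 0.0:
        # Assumption: no shared k-mers maps to the maximum distance (log(0) is undefined).
        return 1.0
    # Poisson-approximated Mash distance: -(1/k) * ln(2j / (1 + j)).
    raw = (-1.0 / k) * math.log(2.0 * j / (1.0 + j))
    # Clamp so that tiny but non-zero Jaccard values cannot produce a distance above 1.
    return min(1.0, raw)


if __name__ == "__main__":
    print(j2md(1e-5, 10))  # clamped; the unclamped value would be about 1.08
    print(j2md(0.5, 21))
```

Clamping leaves every value below 1 unchanged, so ordinary hits are unaffected. The alternative is the exact binomial form `1 - w ** (1 / k)` with `w = 2j / (1 + j)`. That form is bounded by construction, but it would shift all reported distances slightly.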