ParBLiSS / FastANI

Fast Whole-Genome Similarity (ANI) Estimation
Apache License 2.0

j2md can return values greater than 1 #69

Closed (rsharris closed this issue 4 years ago)

rsharris commented 4 years ago

If the Jaccard index is very small but non-zero, j2md() can return a value greater than 1.0. For example, with k=10 and j=1e-5, the returned Mash distance is ≈ 1.08.
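To make the example concrete, here is a minimal Python sketch of the distance formula used in this issue, D = (-1/k) ln(2j/(1+j)). The names mash_distance and j_threshold are hypothetical, and this restates the formula rather than reproducing FastANI's C++ j2md(). Solving D > 1 for j shows exactly when the anomaly appears: ln(2j/(1+j)) < -k, i.e. 2j/(1+j) < e^(-k), i.e. j < e^(-k) / (2 - e^(-k)).

```python
import math

def mash_distance(j: float, k: int) -> float:
    """Unclamped Mash distance for a Jaccard estimate j (0 < j <= 1) and k-mer size k.

    Hypothetical Python restatement of the formula, not FastANI's C++ code.
    """
    return (-1.0 / k) * math.log(2.0 * j / (1.0 + j))

def j_threshold(k: int) -> float:
    """Jaccard value at which the unclamped distance equals exactly 1.

    For any smaller j > 0 the distance exceeds 1:
    -ln(2j/(1+j))/k > 1  <=>  2j/(1+j) < e^-k  <=>  j < e^-k / (2 - e^-k).
    """
    e = math.exp(-k)
    return e / (2.0 - e)

print(round(mash_distance(1e-5, 10), 3))  # 1.082
print(f"{j_threshold(10):.3g}")           # 2.27e-05
```

So for k=10, any Jaccard value below roughly 2.27e-5 yields a distance above 1, which is why j=1e-5 triggers it.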

I don't know whether that causes any downstream problems, but mathematically a distance greater than 1 doesn't make sense, since the Mash distance is meant to estimate a per-base mutation rate.

The simplest solution would be to return min(1.0, (-1.0 / k) * log(2.0 * j / (1 + j))).
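A minimal sketch of the proposed clamp, written in Python rather than FastANI's C++. The function name is hypothetical, and returning 1.0 for j <= 0 (where the logarithm is undefined) is an added assumption that the issue does not discuss.

```python
import math

def mash_distance_clamped(j: float, k: int) -> float:
    """Mash distance capped at 1.0, as suggested in the issue (hypothetical name)."""
    if j <= 0.0:
        # Assumption: no shared k-mers is treated as maximal distance
        # instead of evaluating log(0).
        return 1.0
    # Same formula as j2md, but never larger than 1.0.
    return min(1.0, (-1.0 / k) * math.log(2.0 * j / (1.0 + j)))

print(mash_distance_clamped(1e-5, 10))  # 1.0
```

Because the unclamped distance decreases monotonically as j grows, taking the minimum with 1.0 leaves every distance below 1 unchanged and only collapses the very-low-Jaccard tail to 1.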

cjain7 commented 4 years ago

Thanks for the suggestion. I don't think it should create downstream problems, because hits with very low Jaccard values (e.g., 1e-5) are not considered valid matches. This anomaly arises from our Poisson approximation of the binomial distribution of mutations.
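One way to see where the anomaly comes from, assuming the usual Mash-style derivation: if each base mutates independently with rate d, a k-mer stays unmutated with probability (1 - d)^k (binomial), and the Poisson approximation replaces this with e^(-kd). With s = 2j/(1+j) as the fraction of shared k-mers (assuming equal-size k-mer sets), inverting the exact form gives d = 1 - s^(1/k), which always lies in [0, 1], while inverting the approximation gives d = -ln(s)/k, which is unbounded as s approaches 0. The sketch below compares the two; all function names are hypothetical, and the binomial variant is only an illustration, not something FastANI implements.

```python
import math

def shared_fraction(j: float) -> float:
    # Shared k-mer fraction x/n, assuming equal-size sets: j = x / (2n - x).
    return 2.0 * j / (1.0 + j)

def dist_poisson(j: float, k: int) -> float:
    # P(k-mer unmutated) ~ exp(-k*d)  =>  d = -ln(s) / k   (can exceed 1)
    return -math.log(shared_fraction(j)) / k

def dist_binomial(j: float, k: int) -> float:
    # P(k-mer unmutated) = (1 - d)**k  =>  d = 1 - s**(1/k)  (always in [0, 1])
    return 1.0 - shared_fraction(j) ** (1.0 / k)

for j in (0.9, 0.1, 1e-5):
    print(j, round(dist_poisson(j, 10), 4), round(dist_binomial(j, 10), 4))
```

The two estimates agree closely for high Jaccard values (closely related genomes, the range that matters for ANI) and diverge only as j approaches 0, which matches the point above that such low-Jaccard hits are not treated as valid matches anyway.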