ParBLiSS / FastANI

Fast Whole-Genome Similarity (ANI) Estimation
Apache License 2.0

fastANI and completeness #51

Open YiJessePi opened 5 years ago

YiJessePi commented 5 years ago

Hi, thanks for the FAST tool! I've calculated ANI between my reconstructed metagenomes and several databases (such as NCBI). I found that genomes with a maximal ANI of 76-83 had higher completeness than genomes with higher ANI (83-100) or no reported ANI. I couldn't find any reasonable explanation for this observation. Do you have any idea?

Additionally, can you please explain why there are almost no pairs with ANI lower than 76?

Thanks in advance!

cjain7 commented 5 years ago

Hi, FastANI results are only reliable for pairs with 80% ANI or more. For more divergent genomes, you should switch to protein sequences. From an algorithmic perspective, k-mer based methods like FastANI start losing accuracy well below 80% identity.
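As a rough illustration of the 80% guideline, the sketch below filters FastANI's tab-separated output. It assumes the five-column layout described in the README (query, reference, ANI, matched fragments, total query fragments). The function name `filter_reliable` and the sample rows are made up for illustration and are not real results.

```python
import csv
import io

def filter_reliable(lines, min_ani=80.0):
    """Keep only rows whose ANI is at or above min_ani.

    Assumes tab-separated columns:
    query, reference, ANI, matched fragments, total query fragments.
    """
    kept = []
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        query, ref = row[0], row[1]
        ani, matched, total = float(row[2]), int(row[3]), int(row[4])
        if ani >= min_ani:
            # Also keep the share of query fragments that matched.
            kept.append((query, ref, ani, matched / total))
    return kept

# Hypothetical example rows standing in for a FastANI output file.
sample = io.StringIO(
    "binA.fa\tref1.fa\t97.5\t900\t1000\n"
    "binB.fa\tref2.fa\t77.2\t60\t1200\n"
)
reliable = filter_reliable(sample)
print(reliable)
```

Pairs that fall below the threshold are dropped here rather than corrected; for those, a protein-level comparison is the suggested route.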

For divergent genomes, FastANI might be biased toward reporting results for more complete genomes, because it has an internal threshold on the minimum number of bidirectional hits, which as of now is an absolute cutoff (50, I think). Going forward, we will replace that with a fraction of the genome rather than an absolute cutoff. That would resolve the bias.
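A toy calculation shows where the bias comes from. This is only a sketch, not FastANI's actual code: the fragment length, the share of fragments that map, the 5% fractional threshold, and the function names are all assumptions chosen for illustration, and the cutoff of 50 is the guess mentioned above.

```python
def passes_absolute(mapped_hits, min_hits=50):
    """Report a pair only if the number of mapped fragments reaches min_hits."""
    return mapped_hits >= min_hits

def passes_fractional(mapped_hits, total_fragments, min_fraction=0.05):
    """Report a pair only if the mapped share of fragments reaches min_fraction."""
    return mapped_hits / total_fragments >= min_fraction

FRAG_LEN = 3000        # assumed fragment length in bp
MAPPED_PERCENT = 6     # assumed share of fragments that map for a divergent pair

genomes = {
    "complete": 3_000_000,       # hypothetical 3 Mbp genome
    "50% complete": 1_500_000,   # the same genome recovered at half completeness
}

results = {}
for name, length in genomes.items():
    total = length // FRAG_LEN
    mapped = total * MAPPED_PERCENT // 100
    results[name] = (passes_absolute(mapped), passes_fractional(mapped, total))
    print(name, total, mapped, results[name])
```

In this toy setup both genomes are equally divergent from the reference, but only the complete one collects enough hits to clear the absolute cutoff. Under the fractional rule both are treated the same.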

Hope this helps.