ParBLiSS / FastANI

Fast Whole-Genome Similarity (ANI) Estimation
Apache License 2.0

Error in number of mapped fragments #105

Closed eric-kernfeld-dzd closed 2 years ago

eric-kernfeld-dzd commented 2 years ago

This is a quick interpretation question, which may reveal problems with the output depending on what you intend to estimate. In your paper describing fastANI, you write "At the end of the Mashmap run, all the query fragments are mapped to [the reference] B. The results are saved in a set M containing triplets of the form ⟨f, i, p⟩, where f is the fragment id, i is the identity estimate, and p is the starting position where f is mapped to B."

In the example output, however, only 1303 of the 1608 fragments are reported as mapped, not all of them. I get similar results when running the tool on simple test sequences. So:

Thank you!

cjain7 commented 2 years ago

We look for reciprocal best matches, computed through forward and reverse searches. In the above example, there are 1303 reciprocal (bi-directional) best matches. The rationale for this bi-directional approach is to restrict the ANI computation to orthologous genes and discard paralogs.

1608 is the count of query fragments (|A|/l in the paper's notation). Only a subset of these contribute to bi-directional best matches.
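To make the difference between the two counts concrete, here is a minimal Python sketch of a generic reciprocal-best-match filter. It is an illustration only, not FastANI's actual implementation: the function names, the dictionary layout, the toy data, and the use of floor division for the fragment count are all assumptions made for this example.

```python
# Illustrative sketch only: a generic reciprocal-best-match filter,
# NOT FastANI's code. All names and the toy data are hypothetical.

def count_fragments(genome_length, fragment_length):
    """Number of non-overlapping query fragments, |A| / l.

    Assumption: a trailing partial fragment is dropped (floor division).
    """
    return genome_length // fragment_length


def reciprocal_best_matches(forward, reverse):
    """Keep only query fragments whose best hit is confirmed in reverse.

    forward: dict query_fragment -> (ref_region, identity), best forward hit
    reverse: dict ref_region -> query_fragment, best reverse hit
    """
    return {
        q: (r, identity)
        for q, (r, identity) in forward.items()
        if reverse.get(r) == q
    }


def summarize(forward, reverse, total_query_fragments):
    """Return (mean identity, matched count, total query fragments)."""
    matches = reciprocal_best_matches(forward, reverse)
    if not matches:
        return None, 0, total_query_fragments
    mean_identity = sum(i for _, i in matches.values()) / len(matches)
    return mean_identity, len(matches), total_query_fragments


# Toy example: four query fragments; f3 has no forward hit at all.
forward = {
    "f0": ("r0", 99.0),
    "f1": ("r0", 95.0),  # competes with f0 for r0 (paralog-like hit)
    "f2": ("r2", 97.0),
}
reverse = {"r0": "f0", "r2": "f2"}

ani, matched, total = summarize(forward, reverse, total_query_fragments=4)
print(ani, matched, total)
```

In this toy case only f0 and f2 survive the filter, so the matched count (2) is smaller than the total number of query fragments (4), which mirrors the relationship between 1303 and 1608 described above.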