ParBLiSS / FastANI

Fast Whole-Genome Similarity (ANI) Estimation
Apache License 2.0
How to interpret n mapped fragments? #43

Open SilasK opened 5 years ago

SilasK commented 5 years ago

Hello

I have a question about the FastANI output, e.g.: Genome1 genome2 0.9 60 100

Is 0.9 the estimated ANI over the whole genome, or only over the aligned fragments?

How should we interpret the ratio of mapped to total fragments? Does 60/100 mean the genomes overlap by 60%?

If I have, e.g., only 5 mapped fragments out of 100, can I trust the ANI calculation?

I work with MAGs, so maybe I need to be more cautious. Thanks for the clarifications.

cjain7 commented 5 years ago

ANI is computed over the aligned (or conserved) fraction of genomes. That's how it's been defined in the early papers.

You're right, 60 out of 100 fragments in the query genome have been mapped to the reference genome. FastANI has an internal threshold of minimum 50 fragments to avoid incorrect ANI estimation from just a few matching fragments.
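To make the columns concrete, here is a minimal Python sketch that parses one output line and computes the mapped ratio. It assumes the column order given in the FastANI README (query, reference, ANI, mapped fragments, total query fragments) and whitespace-separated fields with no spaces in the genome names; `AniRecord` and `parse_ani_line` are hypothetical helper names, not part of FastANI.

```python
from typing import NamedTuple


class AniRecord(NamedTuple):
    """One row of FastANI output (column order assumed from the README)."""
    query: str
    reference: str
    ani: float
    mapped_fragments: int
    total_fragments: int

    @property
    def mapped_fraction(self) -> float:
        # Share of query fragments that were mapped to the reference.
        return self.mapped_fragments / self.total_fragments


def parse_ani_line(line: str) -> AniRecord:
    # Assumes exactly five whitespace-separated fields per line.
    query, reference, ani, mapped, total = line.split()
    return AniRecord(query, reference, float(ani), int(mapped), int(total))


record = parse_ani_line("Genome1 genome2 0.9 60 100")
print(record.ani, record.mapped_fraction)
```

The ANI value describes only the mapped fragments, so `mapped_fraction` is worth reading alongside it, not in place of it.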

limin321 commented 4 years ago

Sorry, I have a very basic question about how FastANI works. "FastANI has an internal threshold of minimum 50 fragments to avoid incorrect ANI estimation from just a few matching fragments." Since --fragLen defaults to 3000, does this mean the ANI is considered reliable only when at least 50 fragments of length >= 3000 bp are mapped?

cjain7 commented 4 years ago

For earlier versions of FastANI, what you said is true. I hope you are using the latest available FastANI version now. Since v1.3 (see https://github.com/ParBLiSS/FastANI/releases), we have revised this criterion. With the new version, the help page (fastANI -h) shows a --minFraction parameter, which sets the minimum fraction of the two genomes that must be shared between them for the ANI score to be considered reliable.
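As a rough illustration of a fraction-based criterion, the sketch below estimates the shared sequence as mapped fragments times fragment length and compares it with a fraction of the smaller genome. This is one plausible reading, not FastANI's actual code: the function name, the choice of the smaller genome as the denominator, the 0.2 threshold, and the genome lengths are all assumptions made for the example (check fastANI -h for the real default).

```python
def passes_min_fraction(mapped_fragments: int,
                        frag_len: int,
                        query_len: int,
                        reference_len: int,
                        min_fraction: float) -> bool:
    """Hypothetical check: is enough sequence shared to trust the ANI?"""
    # Approximate shared sequence as mapped fragments times fragment length.
    shared_bp = mapped_fragments * frag_len
    # Assumption: the smaller of the two genomes is the denominator.
    smaller_genome = min(query_len, reference_len)
    return shared_bp >= min_fraction * smaller_genome


# Illustrative numbers: a 300 kb query cut into 100 fragments of 3000 bp,
# compared against a 400 kb reference, with an assumed threshold of 0.2.
few_hits = passes_min_fraction(5, 3000, 300_000, 400_000, 0.2)
many_hits = passes_min_fraction(60, 3000, 300_000, 400_000, 0.2)
print(few_hits, many_hits)
```

Under these assumptions, 5 mapped fragments cover 15,000 bp, which falls below the 60,000 bp required, while 60 mapped fragments cover 180,000 bp and pass.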

ZhangDengwei commented 4 years ago

Hi, I have a question about how to weigh identity and coverage together when assigning my assembled genome to a reference database. Say the results are as follows:

genome1   genome2   0.9   60   100
genome1   genome3   0.8   90   100

Which one is closer to my query, genome1?

cjain7 commented 4 years ago

As far as I'm aware, candidates are typically ranked by just identity.
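A minimal Python sketch of that ranking, using the two rows from the question. The optional `min_mapped_fraction` filter is an assumption added for illustration (it is not a FastANI feature), for cases where a hit with very low coverage should be dropped before ranking.

```python
# Rows from the example: query, reference, ANI, mapped fragments, total fragments.
hits = [
    ("genome1", "genome2", 0.9, 60, 100),
    ("genome1", "genome3", 0.8, 90, 100),
]


def rank_hits(rows, min_mapped_fraction=0.0):
    """Sort hits by ANI, highest first, after an optional coverage filter."""
    kept = [row for row in rows if row[3] / row[4] >= min_mapped_fraction]
    return sorted(kept, key=lambda row: row[2], reverse=True)


best_by_identity = rank_hits(hits)[0]
best_with_filter = rank_hits(hits, min_mapped_fraction=0.7)[0]
print(best_by_identity[1], best_with_filter[1])
```

Ranked by identity alone, genome2 comes first. With the hypothetical 0.7 coverage filter, genome2 is dropped and genome3 is the only remaining hit.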

chen1i6c04 commented 4 years ago

Can I get the ratio of mapped to total fragments for the reference genome? I would like to exclude genomes that are much smaller than the reference genome.

cjain7 commented 4 years ago

You can perhaps exchange values given to -q and -r.
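For example, the two runs could be set up as in the Python sketch below, which only builds the command lines. The -q, -r, and -o options are FastANI's query, reference, and output options; the genome and output file names are hypothetical placeholders.

```python
def fastani_command(query: str, reference: str, output: str) -> list:
    """Build a fastANI command line for one direction of the comparison."""
    return ["fastANI", "-q", query, "-r", reference, "-o", output]


def reciprocal_commands(genome_a: str, genome_b: str) -> list:
    """Return both directions, so each genome is the query once."""
    return [
        fastani_command(genome_a, genome_b, "a_vs_b.txt"),
        fastani_command(genome_b, genome_a, "b_vs_a.txt"),
    ]


commands = reciprocal_commands("genomeA.fna", "genomeB.fna")
for command in commands:
    print(" ".join(command))
```

Each command list can be passed to `subprocess.run` on a machine where fastANI is installed. The fragment counts in the second output file then describe the genome that was the reference in the first run.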