ParBLiSS / FastANI

Fast Whole-Genome Similarity (ANI) Estimation
Apache License 2.0

Values chosen in the --matrix output? #36

Closed nigiord closed 5 years ago

nigiord commented 5 years ago

Hi there,

While doing an all-to-all ANI comparison on a set of genomes, I noticed that the regular output displays different values when the genomes are switched:

1509405_PRJNA252589.fasta.gz  246200_PRJNA281.fasta.gz      76.3461  98    1679
246200_PRJNA281.fasta.gz      1509405_PRJNA252589.fasta.gz  76.9103  84    1369

When using the --matrix option there is only a single value for this pair, which is 76.628181 (looks like the mean).

I thus have two questions:

Cheers, Nils

vinisalazar commented 5 years ago

Following 👁

cjain7 commented 5 years ago

@nigiord, the basic pipeline that we follow to estimate ANI is not symmetric (the same issue occurs with BLAST-based ANI, for example). This is mainly due to the heuristics involved (see the Methods section of the FastANI paper for more details). That said, we expect the difference to be almost negligible if you swap the order of the two genomes. You are right: the --matrix option reports the mean, since only one value can be displayed per pair.
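For anyone who wants to reproduce the single per-pair value from the regular output, here is a minimal sketch in Python. It is not FastANI's own code. It assumes the regular output is whitespace-separated with the query, the reference, and the ANI value in the first three columns (as in the two lines quoted above) and that file names contain no spaces. The function name `symmetric_ani` is hypothetical.

```python
from collections import defaultdict


def symmetric_ani(lines):
    """Average the ANI values reported for both directions of each genome pair.

    Assumes each line looks like: query  reference  ANI  [other columns...]
    """
    values = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        query, ref, ani = fields[0], fields[1], float(fields[2])
        if query == ref:
            continue  # ignore self-comparisons
        # Sort the two names so that (A, B) and (B, A) share one key.
        key = tuple(sorted((query, ref)))
        values[key].append(ani)
    # One value per unordered pair: the mean of whatever directions were reported.
    return {pair: sum(v) / len(v) for pair, v in values.items()}


rows = [
    "1509405_PRJNA252589.fasta.gz  246200_PRJNA281.fasta.gz      76.3461  98    1679",
    "246200_PRJNA281.fasta.gz      1509405_PRJNA252589.fasta.gz  76.9103  84    1369",
]

for pair, ani in symmetric_ani(rows).items():
    print(pair, ani)
```

For the pair quoted in this issue, the mean of 76.3461 and 76.9103 agrees with the 76.628181 shown by --matrix to about three decimal places. The small remaining difference is presumably because the regular output prints rounded values.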

nigiord commented 5 years ago

Makes sense, thank you for your answer!