ParBLiSS / FastANI

Fast Whole-Genome Similarity (ANI) Estimation
Apache License 2.0

Different results based on genomes in reference list #57

Closed aaronmussig closed 4 years ago

aaronmussig commented 4 years ago

Hello,

I was testing fastANI 1.2 and came across the same issue presented in #46, where list comparisons give different results from individual comparisons. I've included the genomes used to generate the results below here: fastani.zip.

The query list option gives a slightly different ANI and AF for U_77353_genomic and GCA_002499525 in the A vs. B direction compared to running it individually. This was tested against a fresh download of the 1.2 binary. This is the minimum subset of reference genomes which results in a different ANI/AF. Thanks for your work!

Running --ql vs --rl:

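| List A | List B | A vs B ANI | A vs B AF | B vs A ANI | B vs A AF |
|---|---|---|---|---|---|
| U_77353_genomic | GCA_002499525 | 99.2949 | 216/259 | 99.1235 | 221/288 |
| | GCF_002787055 | 79.0849 | 149/259 | 79.2057 | 144/565 |
| | GCF_000723185 | 76.748 | 52/259 | 76.4391 | 60/539 |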

Running -q vs -r:

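| Genome A | Genome B | A vs B ANI | A vs B AF | B vs A ANI | B vs A AF |
|---|---|---|---|---|---|
| U_77353_genomic | GCA_002499525 | 99.1834 | 217/259 | 99.1235 | 221/288 |
| U_77353_genomic | GCF_002787055 | 79.0849 | 149/259 | 79.2057 | 144/565 |
| U_77353_genomic | GCF_000723185 | 76.748 | 52/259 | 76.4391 | 60/539 |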
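
To make the discrepancy concrete, here is a minimal C++ sketch that compares one result row from the list run against the same row from the individual run. The `AniResult` struct, the helper names, and the tolerance are hypothetical and chosen for illustration only; they are not part of fastANI. AF is taken here as matched fragments divided by total query fragments.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical record for one reported comparison (not a fastANI type).
struct AniResult {
    double ani;   // reported ANI value
    int matched;  // fragments that mapped
    int total;    // total query fragments
};

// Alignment fraction as matched / total fragments.
double alignment_fraction(int matched, int total) {
    assert(total > 0);  // guard against division by zero
    return static_cast<double>(matched) / total;
}

// True when two runs of the same comparison agree on ANI (within tol)
// and on both fragment counts. The tolerance is an assumed value.
bool same_result(const AniResult& a, const AniResult& b, double tol = 1e-4) {
    return std::fabs(a.ani - b.ani) <= tol
        && a.matched == b.matched
        && a.total == b.total;
}
```

With the values reported above, `same_result({99.2949, 216, 259}, {99.1834, 217, 259})` is false for U_77353_genomic vs GCA_002499525, while the other two rows compare equal.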

cjain7 commented 4 years ago

I thought this was fixed in #46, but maybe not. I'll check this and get back to you.

cjain7 commented 4 years ago

Hi @aaronmussig

The sorting of one vector in my code was not deterministic; I've fixed that bug now. I was able to reproduce the results above, and they now appear correct. Thanks for sharing detailed output and files!
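
The actual fix is not shown in this thread. As a general illustration of how this class of bug can arise, the sketch below assumes a hypothetical `Mapping` record (not fastANI's real data structure) sorted with `std::sort`. Because `std::sort` is not stable, a comparator that looks at only one field leaves the relative order of tied elements unspecified, so the outcome can depend on what else is in the vector. Comparing on all fields gives a total order and a reproducible result.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// Hypothetical record for illustration; not fastANI's actual type.
struct Mapping {
    int ref_id;       // which reference genome the fragment mapped to
    int query_pos;    // fragment position in the query
    double identity;  // identity estimate for the fragment
};

// Order-dependent: std::sort is not stable, and this comparator does not
// resolve ties, so elements with equal ref_id may end up in any order.
void sort_by_ref_only(std::vector<Mapping>& v) {
    std::sort(v.begin(), v.end(), [](const Mapping& a, const Mapping& b) {
        return a.ref_id < b.ref_id;
    });
}

// Deterministic: compare on every field so that no two distinct elements
// are tied, which makes the sorted order independent of the input order.
void sort_total_order(std::vector<Mapping>& v) {
    std::sort(v.begin(), v.end(), [](const Mapping& a, const Mapping& b) {
        return std::tie(a.ref_id, a.query_pos, a.identity)
             < std::tie(b.ref_id, b.query_pos, b.identity);
    });
}
```

An alternative with the single-field comparator is `std::stable_sort`, which preserves the input order of ties; that only helps if the input order itself is reproducible.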

If you want to test at your end, please use the latest code from master branch.

aaronmussig commented 4 years ago

Hi @cjain7 thanks for looking into this, it's much appreciated! It looks fine on my end now.