Closed by donovan-h-parks 5 years ago
Let me try to reproduce this to see why it is happening. May I know how many genomes you have in references.lst?
Probably around 50. I'm away from the office for a week, but can send you the genomes if you aren't able to reproduce the issue.
Hello. Any luck in reproducing this issue? It is a bit of a concern on our end since we process large volumes of genomes and use strict cutoffs to make decisions. As such, we do run into cases where the small differences between these two modes of running FastANI lead to different results.
Hey, yes, I'm able to reproduce it... looking into it.
Hi, I've made a couple of fixes for this issue. Could you retry FastANI with the latest code on the master branch? I'll tag a new version if it works out fine.
Hello. I don't have an easy way to compile this code. The system I am on is still running gcc 4.6.3. Can you provide me with a Linux binary to test?
Here you go. fastANI.zip
I can confirm I am getting the same ANI and AF when doing a single comparison or when doing multiple comparisons via a reference list. The new result does differ slightly from both the previous values I was getting though:
97.0536 1150 1325
Thanks for the update! Yes, it will differ, mainly because FastANI v1.1 was dropping high-frequency k-mers (top 0.001%) in the reference database to optimize for speed. I removed this optimization as it would also contribute to inconsistent results when comparing against one vs. a thousand genomes.
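For anyone wondering why a database-wide frequency filter can change a single pairwise result, here is a toy sketch in Python. It is not FastANI's code: FastANI works on minimizers and aligned fragments, whereas this sketch uses a plain Jaccard similarity over k-mer sets, and the `max_refs` count cutoff is a hypothetical stand-in for the "top 0.001%" rule. The only point is that the set of dropped k-mers depends on which other references are loaded.

```python
from collections import Counter


def kmers(seq, k=4):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def filtered_similarity(query, target, references, k=4, max_refs=1):
    """Jaccard similarity of query vs. target after dropping 'frequent' k-mers.

    A k-mer counts as frequent when it occurs in more than `max_refs`
    sequences of `references` (hypothetical cutoff, for illustration only).
    """
    counts = Counter()
    for ref in references:
        counts.update(kmers(ref, k))  # each reference contributes a k-mer once
    frequent = {km for km, c in counts.items() if c > max_refs}

    q = kmers(query, k) - frequent
    t = kmers(target, k) - frequent
    union = q | t
    return len(q & t) / len(union) if union else 0.0


if __name__ == "__main__":
    query = "ACGTACGTTGCA"
    target = "ACGTACGATGCA"
    other = "TTTTACGTAAAA"  # unrelated reference that shares a few k-mers

    # Same query/target pair, different reference sets.
    print(filtered_similarity(query, target, [target]))
    print(filtered_similarity(query, target, [target, other]))
```

In the second call, the k-mers that `other` shares with `target` cross the cutoff and are removed from the comparison, so the score for the same query/target pair changes. Removing the filter, as described above, removes this dependence on the rest of the reference list.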
Hello,
Thank you for FastANI. We are using it regularly in our work. I have run into some unexpected behaviour where FastANI does not appear to give consistent results. I have a query genome Q and the reported ANI to a given reference genome R changes depending on what genomes I have in the reference list.
That is,
fastANI -q Q.fna -r R.fna -o single.tsv
Gives a different result to:
fastANI -q Q.fna --rl references.lst -o multiple.tsv
single.tsv gives: Q.fna R.fna 97.0547 1150 1325
The relevant line in multiple.tsv gives: Q.fna R.fna 97.0342 1152 1325
Why are the reported ANI and number of alignment fragments different? The results change slightly as I modify the genomes in references.lst. Is this the expected behaviour? If so, it would be helpful to note this heuristic quality of FastANI in the README, since these small differences do change assignments in a small number of cases when processing large genome databases, which leads to confusion.
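In case it helps with reproducing this, below is a minimal Python sketch for checking the two outputs against each other. It assumes the output files are tab-separated with the columns query, reference, ANI, matched fragments, total fragments, as in the lines quoted above. The function names `parse_fastani` and `compare_runs` and the `ani_tol` parameter are my own, not part of FastANI.

```python
import csv


def parse_fastani(lines):
    """Parse FastANI-style output lines into {(query, ref): (ani, matched, total)}.

    Assumes tab-separated columns: query, reference, ANI,
    matched fragments, total fragments.
    """
    results = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        query, ref, ani, matched, total = row[:5]
        results[(query, ref)] = (float(ani), int(matched), int(total))
    return results


def compare_runs(single, multiple, ani_tol=0.0):
    """List pairs whose ANI or matched-fragment count differs between two runs."""
    diffs = []
    for pair, (ani_s, frag_s, _total_s) in single.items():
        if pair not in multiple:
            continue  # pair not reported in the second run
        ani_m, frag_m, _total_m = multiple[pair]
        if abs(ani_s - ani_m) > ani_tol or frag_s != frag_m:
            diffs.append((pair, ani_s - ani_m, frag_s - frag_m))
    return diffs


if __name__ == "__main__":
    # Values copied from the two result lines reported above.
    single = parse_fastani(["Q.fna\tR.fna\t97.0547\t1150\t1325"])
    multiple = parse_fastani(["Q.fna\tR.fna\t97.0342\t1152\t1325"])
    for pair, d_ani, d_frag in compare_runs(single, multiple):
        print(pair, f"ANI diff={d_ani:.4f}", f"fragment diff={d_frag}")
```

To run it on real files, pass open file handles instead of lists, for example `parse_fastani(open("single.tsv", newline=""))`.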