Open fluhus opened 5 months ago
Hello, @fluhus,
This topic interests me too.
I see nearly the same behavior with bacterial genomes, especially when fraglen is changed from the default (3000) to 1020.
For fraglen=1020:
ANC_3681.fasta ANC_3681.fasta 99.9992 3432 3467
For fraglen=3000:
ANC_3681.fasta ANC_3681.fasta 100 1169 1177
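To compare the two result lines above, here is a minimal sketch that computes the share of fragments that mapped. It assumes the columns are query, reference, ANI, matched fragments, and total query fragments, which is my reading of the fastANI output format. The names `AniRow`, `parse_row`, and `aligned_fraction` are made up for illustration and are not part of fastANI.

```python
from typing import NamedTuple


class AniRow(NamedTuple):
    """One fastANI result line (column meaning is an assumption, see above)."""
    query: str
    reference: str
    ani: float      # identity over the fragments that mapped
    matched: int    # fragments that mapped
    total: int      # fragments the query was split into

    @property
    def aligned_fraction(self) -> float:
        # Share of query fragments that mapped; reported separately from ANI.
        return self.matched / self.total


def parse_row(line: str) -> AniRow:
    # Split on any whitespace so both tab- and space-separated lines work.
    query, reference, ani, matched, total = line.split()
    return AniRow(query, reference, float(ani), int(matched), int(total))


# The two self-comparison lines quoted in this comment, keyed by fraglen.
rows = {
    1020: parse_row("ANC_3681.fasta ANC_3681.fasta 99.9992 3432 3467"),
    3000: parse_row("ANC_3681.fasta ANC_3681.fasta 100 1169 1177"),
}

for fraglen, row in rows.items():
    print(f"fraglen={fraglen}: ANI={row.ani}, "
          f"aligned fraction={row.aligned_fraction:.4f}")
```

Keeping ANI and the aligned fraction as two separate numbers avoids folding them into a single score. Whether multiplying them is a meaningful measure is exactly the open question in this thread.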
Hello, thanks for making this tool!
As a sanity test before incorporating it into my pipeline, I aligned a collection of viral genomes (~10K+ bases each) against themselves. To my surprise, 35% of the sequences did not have a perfect match.
For example, with the attached file below, running
fastANI -q vir.fa -r vir.fa -o /dev/stdout
gave:
I am seeing 100% base identity, but only 3 out of 4 chunks matched. Is that correct? Does that mean the match is 100% * 3 / 4 = 75%? How can I distinguish this case from a genome that is actually 25% shorter but matches 100%? Or am I misinterpreting the results?
I hope my question is clear :)
vir.fa.gz