ParBLiSS / FastANI

Fast Whole-Genome Similarity (ANI) Estimation
Apache License 2.0

Missing results in output file #22

Open: donovan-h-parks opened this issue 6 years ago

donovan-h-parks commented 6 years ago

Hello,

It seems that when a reference genome is divergent enough from the query genome, the resulting comparison may not be reported in the output file. I appreciate that a minimum number of fragments are required for a reliable ANI estimation. However, it is often very confusing when a comparison is simply missing from the output file. Perhaps it would be better to report these as "N/A" instead of just leaving out the comparison completely.

There also appears to be an actual bug, since the following command may result in no comparison being reported: fastANI --minFrag -1 -q input.fna -r reference.fna -o test.tsv

Both the input and reference genomes here are "normal" genomes with plenty of >10 kb contigs. This is particularly problematic when using a list of reference genomes and then having to manually establish which comparisons weren't performed.
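Until missing pairs are reported directly, a post-processing step can recover which comparisons were skipped and mark them "NA". The sketch below is a minimal illustration, not part of fastANI. It assumes the default tab-delimited output (query, reference, ANI, count of bidirectional fragment mappings, total query fragments) and that the query and reference names passed in match the paths fastANI writes, for example the lines of the --ql/--rl list files. `fill_missing_pairs` is a hypothetical helper name.

```python
import csv
import io
from itertools import product


def fill_missing_pairs(queries, references, fastani_tsv):
    """Return one row per (query, reference) pair, with "NA" for unreported pairs.

    Assumes fastANI's default five-column, tab-delimited output. Names in
    `queries`/`references` must match the paths fastANI wrote exactly.
    """
    reported = {}
    for row in csv.reader(io.StringIO(fastani_tsv), delimiter="\t"):
        if row:  # skip blank lines
            reported[(row[0], row[1])] = row[2:]

    filled = []
    for query, ref in product(queries, references):
        # Pairs that fastANI did not report are filled with NA placeholders.
        filled.append([query, ref] + reported.get((query, ref), ["NA", "NA", "NA"]))
    return filled


if __name__ == "__main__":
    # Hypothetical example: fastANI reported only one of two comparisons.
    output = "q1.fna\tr1.fna\t97.5\t800\t900\n"
    for row in fill_missing_pairs(["q1.fna"], ["r1.fna", "r2.fna"], output):
        print("\t".join(row))
```

Rows whose third column is "NA" are exactly the comparisons that would otherwise have to be found by hand.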

Thanks for any assistance you can provide.

cjain7 commented 6 years ago

Hi

I agree with you; I will edit the code soon to put that feature in place. The only minor shortcoming I see is that the output file can become very large for a big set of input genomes when using the default line-by-line format.

Also, the "N/A" issue is handled well if you request the output in matrix format (a feature available in the latest release). In that case, all comparisons are reported. Could you try it out and see whether it works and resolves your issue?

donovan-h-parks commented 6 years ago

The matrix format does work. Thanks for pointing this out.

donovan-h-parks commented 5 years ago

It would still be nice to have the standard output indicate NA for combinations that can't be computed. The matrix format doesn't include the alignment fraction information, which is often useful.
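The line-by-line output keeps the two fragment counts that the matrix drops, so an alignment-fraction-style value can be derived from it. The sketch below is an assumption-laden illustration, not fastANI's own calculation: it treats "alignment fraction" as the share of query fragments with a bidirectional mapping (column 4 divided by column 5 of the default output), and it tolerates "NA" placeholders in case missing pairs were filled in. `ani_and_af` is a hypothetical helper name.

```python
import csv
import io


def ani_and_af(fastani_tsv):
    """Map (query, reference) -> (ANI, fraction of query fragments mapped).

    Assumes fastANI's default five-column output: query, reference, ANI,
    count of bidirectional fragment mappings, total query fragments.
    Rows carrying "NA" placeholders yield (None, None).
    """
    results = {}
    for row in csv.reader(io.StringIO(fastani_tsv), delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        query, ref, ani, mapped, total = row[:5]
        if "NA" in (ani, mapped, total):
            results[(query, ref)] = (None, None)  # comparison not computed
            continue
        # Approximate AF: mapped query fragments / total query fragments.
        af = int(mapped) / int(total) if int(total) > 0 else None
        results[(query, ref)] = (float(ani), af)
    return results


if __name__ == "__main__":
    sample = "q.fna\tr1.fna\t97.5\t800\t1000\nq.fna\tr2.fna\tNA\tNA\tNA\n"
    for (query, ref), (ani, af) in ani_and_af(sample).items():
        print(query, ref, ani, af)
```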

Reynababy commented 2 years ago

Excuse me, how was this problem solved? I seem to be running into the same problem of missing results in the output file, and I can't figure out what is wrong. Thanks for any assistance you can provide.