ParBLiSS / FastANI

Fast Whole-Genome Similarity (ANI) Estimation
Apache License 2.0

no output file #21

Closed: tiagofilipe12 closed this issue 4 years ago

tiagofilipe12 commented 6 years ago

I tried to run fastANI with the following command:

fastANI -q fasta1.fas -r fasta2.fas -o fastani.out

I get the following output in STDOUT:

>>>>>>>>>>>>>>>>>>
Reference = [fasta1.fas]
Query = [fasta2.fas]
Kmer size = 16
Fragment length = 3000
ANI output file = fastani.out
>>>>>>>>>>>>>>>>>>
INFO, skch::Sketch::build, minimizers picked from reference = 3493
INFO, skch::Sketch::index, unique minimizers = 3363
INFO, skch::Sketch::computeFreqHist, Frequency histogram of minimizers = (1, 3263) ... (3, 30)
INFO, skch::Sketch::computeFreqHist, With threshold 0.001%, consider all minimizers during lookup.
INFO, skch::main, Time spent sketching the reference : 0.0149605 sec
INFO, skch::main, Time spent mapping fragments in query #1 : 0.0218417 sec
INFO, skch::main, Time spent post mapping : 0.000101473 sec

However, fastani.out stays empty: du -h reports a size of 0.

I have tried release versions 1.0 and 1.1, and even a fresh git clone. fastANI runs, but the output file is never written. The two sequences I am testing are each roughly 70 kb long and should have a mash dist of <0.1, so they should be highly similar. Any ideas on what is going on?

Thanks

tiagofilipe12 commented 6 years ago

I have figured it out. With the default values, fastANI attempts to make 50 fragments of 3000 bp each, which requires a total sequence length of 150000 bp, whereas my sequences are only 70000 bp long. However, maybe the program should raise an exception when this happens.
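
The arithmetic behind this explanation can be checked with a short sketch. The function names and parameters below are illustrative only and are not part of FastANI; the defaults of 50 fragments and 3000 bp are the values quoted in this comment.

```python
def min_required_length(fragment_length=3000, min_fragments=50):
    """Shortest sequence that can be cut into min_fragments whole fragments."""
    return fragment_length * min_fragments


def is_long_enough(sequence_length, fragment_length=3000, min_fragments=50):
    """True if the sequence is at least as long as the required minimum."""
    return sequence_length >= min_required_length(fragment_length, min_fragments)


print(min_required_length())   # 150000
print(is_long_enough(70000))   # False
```

A 70000 bp sequence falls well short of the 150000 bp needed under these assumptions, which matches the empty output file.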

cjain7 commented 6 years ago

Yes, you are right; thanks for the feedback. So far we have assumed that inputs will be bacterial/archaeal genomes. Did you get the expected output after lowering the count from 50?

I think we can print a warning message when we see such inputs.

tiagofilipe12 commented 6 years ago

I tried a smaller fragLen, such as 200, and it produced results. I am attempting to use it for plasmid sequences, which vary much more in length than genomes.
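
For short sequences such as plasmids, a rough upper bound on the fragment length can be estimated from the sequence length. This is only a sketch: `max_fragment_length` is a hypothetical helper, it assumes the 50-fragment minimum discussed in this thread, and it does not reflect how FastANI itself selects or maps fragments.

```python
def max_fragment_length(sequence_length, min_fragments=50):
    """Largest fragment length that still yields min_fragments whole fragments."""
    if min_fragments <= 0:
        raise ValueError("min_fragments must be positive")
    return sequence_length // min_fragments


print(max_fragment_length(70000))  # 1400
```

For a 70000 bp sequence the bound is 1400 bp, which is consistent with a fragLen of 200 producing output while the default of 3000 did not.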

cjain7 commented 4 years ago

The latest version of FastANI resolves this by using a fraction of the input genome length rather than an absolute cutoff.