Closed wwood closed 4 years ago
I'm using the latest source code from master branch. At my end, I see the following behavior:
[jainc2@gry-compute050 small_seq]$ ../../fastANI -q seq1.fa -r seq2.fa -o /dev/stdout
>>>>>>>>>>>>>>>>>>
Reference = [seq2.fa]
Query = [seq1.fa]
Kmer size = 16
Fragment length = 3000
Threads = 1
ANI output file = /dev/stdout
>>>>>>>>>>>>>>>>>>
INFO [thread 0], skch::main, Count of threads executing parallel_for : 1
INFO [thread 0], skch::Sketch::build, window size for minimizer sampling = 24
INFO [thread 0], skch::Sketch::build, minimizers picked from reference = 1220
INFO [thread 0], skch::Sketch::index, unique minimizers = 1220
INFO [thread 0], skch::Sketch::computeFreqHist, Frequency histogram of minimizers = (1, 1220) ... (1, 1220)
INFO [thread 0], skch::Sketch::computeFreqHist, consider all minimizers during lookup.
INFO [thread 0], skch::main, Time spent sketching the reference : 0.00153976 sec
INFO [thread 0], skch::main, Time spent mapping fragments in query #1 : 0.00268185 sec
INFO [thread 0], skch::main, Time spent post mapping : 7.12e-06 sec
INFO [thread 0], skch::main, ready to exit the loop
INFO, skch::main, parallel_for execution finished
seq1.fa seq2.fa 99.9739 3 4
Reversed:
[jainc2@gry-compute050 small_seq]$ ../../fastANI -r seq1.fa -q seq2.fa -o /dev/stdout
>>>>>>>>>>>>>>>>>>
Reference = [seq1.fa]
Query = [seq2.fa]
Kmer size = 16
Fragment length = 3000
Threads = 1
ANI output file = /dev/stdout
>>>>>>>>>>>>>>>>>>
INFO [thread 0], skch::main, Count of threads executing parallel_for : 1
INFO [thread 0], skch::Sketch::build, window size for minimizer sampling = 24
INFO [thread 0], skch::Sketch::build, minimizers picked from reference = 1122
INFO [thread 0], skch::Sketch::index, unique minimizers = 1122
INFO [thread 0], skch::Sketch::computeFreqHist, Frequency histogram of minimizers = (1, 1122) ... (1, 1122)
INFO [thread 0], skch::Sketch::computeFreqHist, consider all minimizers during lookup.
INFO [thread 0], skch::main, Time spent sketching the reference : 0.00141419 sec
INFO [thread 0], skch::main, Time spent mapping fragments in query #1 : 0.00310046 sec
INFO [thread 0], skch::main, Time spent post mapping : 7.211e-06 sec
INFO [thread 0], skch::main, ready to exit the loop
INFO, skch::main, parallel_for execution finished
seq2.fa seq1.fa 99.9734 4 5
Note that ANI values are slightly different. This is expected (see #54).
Oops I didn't see that you are turning up the default --minFraction
The difference comes from the fact that shared genome length (e.g., 3 vs. 4 fragments) differs slightly in both cases. Again, this is due to asymmetry in FastANI, order of two genomes can lead to small differences.
One solution could be to generate output in .matrix
format using --matrix
parameter. That would give you consistent results.
OK, thanks, makes sense. I can rescue it by turning the fragLength to 1000bp, for instance.
Hi,
I'm running fastANI 1.3 from bioconda between 2 short sequences, each in their own file - see https://gist.github.com/wwood/d47dfa28748edcbfcbede4453d76cf7f
These sequences are nearly identical, except one is ~14kb while the other is ~15kb.
Strangely, if I turn up the
--minFraction
, the alignment of the first against the second gives no result, while the reverse does give the alignment:Any ideas? Thanks in advance. ben