ParBLiSS / FastANI

Fast Whole-Genome Similarity (ANI) Estimation
Apache License 2.0

getting no output for genome files #1

Closed jotech closed 6 years ago

jotech commented 6 years ago

Hi, many thanks for this new ANI implementation! I'm looking forward to using it, but for these two genomes I'm getting no output at all. The output file is empty. (fasta.zip)

fastANI -q ./MYb12.fasta -r ./BT247.fasta -o test
>>>>>>>>>>>>>>>>>>
Reference = [./BT247.fasta]
Query = [./MYb12.fasta]
Kmer size = 16
Fragment length = 3000
ANI output file = test
>>>>>>>>>>>>>>>>>>
INFO, skch::Sketch::build, minimizers picked from reference = 448286
INFO, skch::Sketch::index, unique minimizers = 440286
INFO, skch::Sketch::computeFreqHist, Frequency histogram of minimizers = (1, 434525) ... (74, 1)
INFO, skch::Sketch::computeFreqHist, With threshold 0.001%, ignore minimizers occurring >= 74 times during lookup.
INFO, skch::main, Time spent sketching the reference : 0.5524 sec
INFO, skch::main, Time spent mapping fragments in query #1 : 0.507377 sec
INFO, skch::main, Time spent post mapping : 2.7297e-05 sec

From the output I could not guess what is going wrong. The input files are draft genomes, but from a rough quality check they should be fine, e.g. N50>10000.

kitchWWW commented 6 years ago

Also noticing a similar issue. Thanks in advance for any help!

cjain7 commented 6 years ago

Hi, FastANI reports ANI only for genomes related within the ~80-100% nucleotide identity range. Otherwise no output is given.

Can you try an alternative implementation and confirm whether this is the case? Alternatively, share the input genomes here.

For divergent genome sequences outside the 80-100% ANI range, computing identity at the amino acid level is recommended in the literature.

I will revise documentation to avoid this misunderstanding :)

cjain7 commented 6 years ago

@jotech I have now seen the data you attached. I confirmed at my end that the two genomes do not fall within the 80-100% ANI range.

I tried BLAST-based ANI and got the following output:

$ ruby enveomics/Scripts/ani.rb -1 BT247.fasta -2 MYb12.fasta
Insuffient hits to estimate one-way ANI: 9.
Insuffient hits to estimate one-way ANI: 15.
Insufficient hits to estimate two-way ANI: 6

Will make this clear in README.
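
Because FastANI writes nothing for pairs outside the ~80-100% range, a script that consumes its output may want to treat an empty file as "no ANI reported" rather than as a failure. A minimal Python sketch, assuming each output line starts with three whitespace-separated fields (query, reference, ANI) and that file paths contain no spaces; the function names `parse_fastani_output` and `describe` are hypothetical and not part of FastANI:

```python
def parse_fastani_output(text):
    """Parse FastANI output text into a list of result dicts.

    Assumption: each non-empty line begins with query, reference and ANI,
    separated by whitespace. Any further columns are ignored.
    Returns an empty list when the output is empty.
    """
    results = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        results.append(
            {"query": fields[0], "reference": fields[1], "ani": float(fields[2])}
        )
    return results


def describe(text):
    """Return a short human-readable summary of FastANI output text."""
    results = parse_fastani_output(text)
    if not results:
        # Per the thread: no output means the pair is outside ~80-100% ANI.
        return "no ANI reported; the genomes may be below the ~80% identity range"
    return "\n".join(
        f"{r['query']} vs {r['reference']}: ANI = {r['ani']:.2f}" for r in results
    )
```

For the run in this issue, one could read the file passed to `-o` (here `test`) and pass its contents to `describe`; an empty file would then give the "no ANI reported" message instead of silently producing nothing.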

jotech commented 6 years ago

Thank you so much, this makes sense! I'm now having a look at amino acid identity (aai from enveomics). Do you know whether the values from ani and aai are comparable? That is, could I use ani, and aai for lower identities, to get a consistent phylogenetic distance?

cjain7 commented 6 years ago

"Do you know whether the values from ani and aai are comparable?"

Yes, I believe they are.

Quoting from the paper "Bypassing Cultivation To Identify Bacterial Species": "The analysis shows that ANI offers robust resolution between genomes that share 80–100% ANI, i.e., within species or among closely related species, and that species that share less than 80% ANI and/or 30% of their gene content are too divergent to be compared based on the ANI measurement. For the latter genomes, AAI provides a much more robust resolution and should be used instead."

jotech commented 6 years ago

Just came back. Thanks for this clarification!

diaz13 commented 6 years ago

Hi, I ran this command:

./fastANI --ql ../1.fasta --rl ../2.fasta -o output

and I obtained:

Threads = 1
ANI output file = output

ERROR, skch::validateInputFiles, Could not open >Scaffold_1

I don't know why it could not open file 1.

cjain7 commented 6 years ago

@diaz13, it seems you want to do a one-to-one comparison. For that you should run ./fastANI -q ../1.fasta -r ../2.fasta -o output

diaz13 commented 6 years ago

@cjain7, thank you. I had not paid attention to that. I have a list of 400 genomes, but it is running fine now, no problems so far.
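
For the many-genome case, the `--ql` and `--rl` options expect a text file listing genome paths, one per line, rather than a FASTA file. That is consistent with the error above, where the first line of the FASTA file (`>Scaffold_1`) was treated as a path. A minimal Python sketch for writing such a list; the helper `write_genome_list`, the directory `genomes/` and the file name `genome_list.txt` are hypothetical:

```python
from pathlib import Path


def write_genome_list(genome_dir, list_path, pattern="*.fasta"):
    """Write one genome path per line to list_path and return the paths.

    Assumption: all genomes sit directly in genome_dir and match pattern.
    """
    # Sort so the list is reproducible across runs.
    paths = sorted(str(p) for p in Path(genome_dir).glob(pattern))
    Path(list_path).write_text("".join(p + "\n" for p in paths))
    return paths
```

After calling `write_genome_list("genomes", "genome_list.txt")`, the resulting file could be passed to both options, for example `./fastANI --ql genome_list.txt --rl genome_list.txt -o output`, to compare every listed genome against every other.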