Closed minjaekim45 closed 4 years ago
Hi Minjae, can you run some sanity checks on the two genomes, e.g., their total lengths, N50, etc.? I'm just wondering whether this has anything to do with the quality of the input genomes. I also recommend taking a look at the input parameters (available via the -h option) to see if any of them is useful in your context.
If that doesn't help, it would be best to share the two genomes with me.
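For readers who want to run the sanity check suggested above, here is a minimal Python sketch that reports total length, contig count, largest contig, and N50 for a FASTA file. It is not part of FastANI; the function names (read_contig_lengths, nx, assembly_stats) are made up for illustration, and it assumes an uncompressed, plain-text FASTA file.

```python
from typing import Dict, List


def read_contig_lengths(fasta_path: str) -> List[int]:
    """Return the length of every sequence in an uncompressed FASTA file."""
    lengths: List[int] = []
    current = 0
    in_record = False
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if in_record:
                    lengths.append(current)
                current = 0
                in_record = True
            elif in_record:
                current += len(line)
    if in_record:
        lengths.append(current)
    return lengths


def nx(lengths: List[int], fraction: float = 0.5) -> int:
    """Length of the contig at which the running sum of contigs, sorted from
    longest to shortest, first reaches `fraction` of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total * fraction:
            return length
    return 0


def assembly_stats(fasta_path: str) -> Dict[str, int]:
    """Collect the basic numbers worth comparing between two genomes."""
    lengths = read_contig_lengths(fasta_path)
    return {
        "total_length": sum(lengths),
        "n_contigs": len(lengths),
        "largest": max(lengths, default=0),
        "n50": nx(lengths, 0.5),
    }
```

Calling assembly_stats("genome.fna") on each input (the file name is a placeholder) gives a quick view of whether one assembly is unusually short or fragmented compared with the other.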
Thanks! I will try to reduce the fragsize
I also had the same problem with the two example files (E. coli and Shigella). I had tried to download the files from GitHub, but I don't think what I got was a FASTA file, and I suspect that's what caused the run to fail.
It worked OK when I pulled the files from NCIMB.
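A quick way to catch this kind of download problem (for example, saving an HTML page instead of the raw file) is to check that the file looks like nucleotide FASTA before running anything on it. The sketch below is only a heuristic, not something FastANI provides: looks_like_fasta is a hypothetical helper, it assumes an uncompressed file, and it only inspects the first max_lines lines.

```python
def looks_like_fasta(path: str, max_lines: int = 1000) -> bool:
    """Heuristic check that a file is uncompressed nucleotide FASTA.

    Returns True only if the first non-blank line is a '>' header and every
    sequence line inspected contains only IUPAC nucleotide codes or gaps.
    """
    allowed = set("ACGTUNRYKMSWBDHVacgtunrykmswbdhv-*")
    seen_header = False
    seen_sequence = False
    with open(path, "r", errors="replace") as handle:
        for index, raw in enumerate(handle):
            if index >= max_lines:
                break
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                seen_header = True
                continue
            if not seen_header:
                # Content before any header, e.g. '<!DOCTYPE html>'.
                return False
            if not set(line) <= allowed:
                return False
            seen_sequence = True
    return seen_header and seen_sequence
```

If this returns False for a file fetched through the browser, downloading the raw file again (or cloning the repository) is probably the first thing to try.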
I can replicate this bug with these fastas (they have already been published on GenBank): https://cloud.roder.casa/s/cC2jMzWaNwrJCkM
$fastANI -q FAM17927.fna -r FAM19036.fna -o out1.txt
...
$fastANI -r FAM17927.fna -q FAM19036.fna -o out2.txt
...
$cat out*
$
stats for FAM17927.fna
sum = 2757319, n = 49, ave = 56271.82, largest = 391390
N50 = 234031, n = 5
N60 = 230308, n = 6
N70 = 145185, n = 8
N80 = 76996, n = 10
N90 = 37143, n = 15
N100 = 305, n = 49
N_count = 7
Gaps = 7
Playing around with --fragLen didn't help. I also tried removing smaller scaffolds. Does it work on the computer FastANI was compiled on?
Recompiling didn't help either.
@cjain7 - Sorry to bother you with this question, but I have a strategic decision to make about my own software. I'd like to integrate your tool because it's better and faster than the alternatives, but it has to work consistently.
Can you give me an approximate idea how long it will take you to fix this issue? (Can you reproduce the error with the files I provided?)
@MrTomRod Apologies for the delay in responding.
It appears that the genomes you are trying to compare are too divergent, whereas FastANI is designed for genome comparisons at ~80% identity or more.
I also checked the AAI (identity at the amino-acid level), but it looks like these two genomes are not comparable at the protein level either.
[jainc2@gry-compute050 MrTomRod]$ ../../Utility/enveomics/Scripts/aai.rb -N -1 FAM17927.fna -2 FAM19036.fna
Temporal directory: /tmp/d20200529-27816-mid2x0.
Creating databases.
Reading FastA file: FAM17927.fna
File contains 49 sequences.
Reading FastA file: FAM19036.fna
File contains 1 sequences.
Running one-way comparisons.
Insuffient hits to estimate one-way AAI: 1.
Insuffient hits to estimate one-way AAI: 1.
Insufficient hits to estimate two-way AAI: 1
[jainc2@gry-compute050 MrTomRod]$ ../../Utility/enveomics/Scripts/ani.rb -1 FAM17927.fna -2 FAM19036.fna
Temporal directory: /tmp/d20200529-27839-no6j5s.
Creating databases.
Reading FastA file: FAM17927.fna
Created 13581 fragments from 49 sequences, discarded 40814 bp.
Reading FastA file: FAM19036.fna
Created 18167 fragments from 1 sequences, discarded 0 bp.
Running one-way comparisons.
Insuffient hits to estimate one-way ANI: 9.
Insuffient hits to estimate one-way ANI: 36.
Insufficient hits to estimate two-way ANI: 4
If you wish to run the above comparison at your end, you are welcome to download the code here: https://github.com/lmrodriguezr/enveomics/tree/master/Scripts
The above ANI / AAI scripts use BLAST (they are slower than FastANI, but may be more accurate).
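Since FastANI may write nothing at all for a pair that is too divergent, a pipeline that wraps it probably wants to treat an empty output file as "no ANI reported" rather than as a crash. Below is a minimal Python sketch of that idea. It assumes the tab-separated column order described in the FastANI README (query, reference, ANI, mapped fragments, total query fragments); AniHit, read_fastani_output, and summarize are hypothetical names, not part of FastANI.

```python
from typing import List, NamedTuple


class AniHit(NamedTuple):
    query: str
    reference: str
    ani: float
    mapped_fragments: int
    total_fragments: int


def read_fastani_output(path: str) -> List[AniHit]:
    """Parse a FastANI output file; an empty file yields an empty list.

    Assumption: five tab-separated columns in the order
    query, reference, ANI, mapped fragments, total query fragments.
    """
    hits: List[AniHit] = []
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # skip blank or malformed lines
            hits.append(
                AniHit(fields[0], fields[1], float(fields[2]),
                       int(fields[3]), int(fields[4]))
            )
    return hits


def summarize(path: str) -> str:
    """Turn an output file into a one-line, human-readable verdict."""
    hits = read_fastani_output(path)
    if not hits:
        return ("no ANI reported: the pair may be below the ~80% identity "
                "range, or an input may not be valid FASTA")
    best = max(hits, key=lambda hit: hit.ani)
    return f"{best.query} vs {best.reference}: ANI {best.ani:.2f}%"
```

With the commands from this thread, summarize("out1.txt") and summarize("out2.txt") would both return the "no ANI reported" message, which is easier to act on than a silently empty file.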
thanks a lot for the quick response!
Hi Chirag, I just tried to run FastANI (both v1.1 and v1.2) with the two genomes in the data folder (E. coli and Shigella); the output file is empty, and this is the log I got.