Closed SwapnilDoijad closed 6 years ago
std::bad_alloc implies FastANI could not allocate memory when it needed to. For 1000 microbial genomes, I expect the memory usage to be much below 66G. Please answer a few follow-up questions here:
What is the total size of all the genomes you have in 1000_genomes.list? I wonder if it is too big.
Can you provide the memory usage of the above run? It can be easily obtained by using the /usr/bin/time utility:
/usr/bin/time fastANI --ql 1000_genomes.list --rl 1000_genomes.list -o output.txt
Please make sure you are not running other memory intensive tasks on your system while doing this.
Alternatively, split the reference list (--rl) 1000_genomes.list into two lists of 500 genomes to reduce the memory use, and run them one by one as:
fastANI --ql 1000_genomes.list --rl 500_genomes_first.list -o output_1.txt
fastANI --ql 1000_genomes.list --rl 500_genomes_second.list -o output_2.txt
cat output_1.txt output_2.txt > output.txt
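The same split-and-run approach can be scripted for any chunk size. The following is only a minimal sketch: `ani_in_chunks` is a hypothetical helper name, not part of FastANI, and it assumes `fastANI` is on the PATH and that running one reference chunk at a time is acceptable for your workflow.

```shell
# ani_in_chunks is a hypothetical helper, not part of FastANI.
# Usage: ani_in_chunks <query_list> <ref_list> <genomes_per_chunk> <output_file>
ani_in_chunks() {
    query=$1; ref=$2; chunk=$3; out=$4
    tmp=$(mktemp -d) || return 1
    # split(1) writes $tmp/ref_aa, $tmp/ref_ab, ... with at most $chunk lines each
    split -l "$chunk" "$ref" "$tmp/ref_"
    : > "$out"    # start with an empty combined output file
    for part in "$tmp"/ref_*; do
        # One FastANI run per reference chunk; stop at the first failure
        # (the temporary directory is left in place for inspection in that case)
        fastANI --ql "$query" --rl "$part" -o "$tmp/part.txt" || return 1
        cat "$tmp/part.txt" >> "$out"
    done
    rm -rf "$tmp"
}
```

For example, `ani_in_chunks 1000_genomes.list 1000_genomes.list 500 output.txt` would perform the two runs shown above and concatenate their results.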
1000_genomes.list contains a list of 1000 genomes, each 3 Mb.
After closing all other programs
(A) For 1000 genomes
$ /usr/bin/time fastANI --ql 1000_genomes.list --rl 1000_genomes.list -o output.txt
Reference = [1.fasta, 2.fasta, ......... 1000.fasta]
Query = [1.fasta, 2.fasta, ......... 1000.fasta]
Kmer size = 16
Fragment length = 3000
ANI output file = output.txt
INFO, skch::Sketch::build, minimizers picked from reference = 305245397
INFO, skch::Sketch::index, unique minimizers = 7172419
INFO, skch::Sketch::computeFreqHist, Frequency histogram of minimizers = (1, 3440253) ... (529726, 1)
INFO, skch::Sketch::computeFreqHist, With threshold 0.001%, ignore minimizers occurring >= 2858 times during lookup.
INFO, skch::main, Time spent sketching the reference : 286.582 sec
INFO, skch::main, Time spent mapping fragments in query #1 : 412.583 sec
INFO, skch::main, Time spent post mapping : 20.5822 sec
Command terminated by signal 11
648.89user 7.07system 37:54.26elapsed 28%CPU (0avgtext+0avgdata 7770108maxresident)k
0inputs+0outputs (0major+979813minor)pagefaults 0swaps
(B) For 100 genomes, each 3 Mb, the run was successful and /usr/bin/time reported:
7700.57user 1.39system 2:08:32elapsed 99%CPU (0avgtext+0avgdata 1661348maxresident)k
0inputs+3360outputs (0major+438435minor)pagefaults 0swaps
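For reading these numbers: in the default GNU time output, the maxresident field is the peak resident memory in kilobytes. A small sketch for converting it to GiB follows; `maxrss_gib` is a hypothetical helper name, and it assumes the default GNU time output format shown in the runs above.

```shell
# maxrss_gib is a hypothetical helper: it reads GNU time's default summary
# line on stdin, extracts the "<N>maxresident" field (KiB), and prints GiB.
maxrss_gib() {
    sed -n 's/.*[^0-9]\([0-9][0-9]*\)maxresident.*/\1/p' |
        awk '{ printf "%.2f\n", $1 / 1048576 }'
}

# Peak memory of the two runs reported in this thread
echo "648.89user 7.07system 37:54.26elapsed 28%CPU (0avgtext+0avgdata 7770108maxresident)k" | maxrss_gib   # prints 7.41
echo "7700.57user 1.39system 2:08:32elapsed 99%CPU (0avgtext+0avgdata 1661348maxresident)k" | maxrss_gib   # prints 1.58
```

Both peaks are far below the 65858MB of memory reported for this machine.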
Thanks for sharing the info. I tried creating a custom dataset of 1000 E. coli genomes at my end but could not reproduce the above issue. Let me know if the data you are using is public.
FastANI ran properly with 100 genomes. However, increasing to 1000 genomes resulted in the following error.
Error details:
$ fastANI --ql 1000_genomes.list --rl 1000_genomes.list -o output.txt
Hardware details
Processor | 8x Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
Memory | 65858MB
Operating System | Ubuntu 16.04.3 LTS