ParBLiSS / FastANI

Fast Whole-Genome Similarity (ANI) Estimation
Apache License 2.0

fastANI process is aborted or killed. #39

Closed: microDM closed this issue 4 years ago

microDM commented 5 years ago

When I compare ~10,000 genomes on my server (24 cores, 128 GB RAM), the fastANI process is aborted with:

    terminate called after throwing an instance of 'std::bad_alloc'
      what():  std::bad_alloc
    Aborted (core dumped)

What are the memory requirements for comparing ~10,000 genomes?

I am using fastANI v1.1. My command:

    fastANI --refList reference.list --ql query.list -t 10 -o out --matrix

nigiord commented 5 years ago

I had the same problem when I tried to compare 6000-to-6000 genomes. I ended up batching my analysis into 6000 separate 1-to-6000 comparisons and then concatenating the results. If I remember correctly, each 1-to-6000 analysis (1 query, 6000 references) required ~80 GB of memory with default parameters. This also allowed me to parallelize the analysis on a big cluster.
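The batching strategy described above could be scripted roughly as follows. This is a minimal sketch, not part of fastANI: the helpers `read_genome_list` and `batch_commands` and the `batch_00000.tsv` output naming are hypothetical. It builds one fastANI command per query genome, each comparing that single query (`-q`) against the full reference list (`--refList`); it only constructs argument lists, which you would then submit to a job scheduler or run with `subprocess.run`.

```python
from pathlib import Path


def read_genome_list(list_text):
    """Return non-empty, stripped lines from a list file (one genome path per line)."""
    return [line.strip() for line in list_text.splitlines() if line.strip()]


def batch_commands(query_paths, ref_list_path, threads, out_dir):
    """Build one 1-to-N fastANI command per query genome (hypothetical helper).

    Each command compares a single query (-q) against every genome in the
    reference list (--refList) and writes its own tab-separated output file.
    """
    commands = []
    for i, query in enumerate(query_paths):
        # Assumed naming scheme: one numbered output file per query.
        out_file = Path(out_dir) / f"batch_{i:05d}.tsv"
        commands.append([
            "fastANI",
            "-q", query,
            "--refList", ref_list_path,
            "-t", str(threads),
            "-o", str(out_file),
        ])
    return commands


if __name__ == "__main__":
    queries = read_genome_list("g1.fna\ng2.fna\n\ng3.fna\n")
    for cmd in batch_commands(queries, "reference.list", 10, "out"):
        print(" ".join(cmd))
```

Because every command is independent, the jobs can be spread across cluster nodes, and the per-query output files can be concatenated once they all finish.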

The resulting TSV is easy to parse, so you can reconstruct the --matrix output afterwards.
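One way to rebuild a matrix from the concatenated TSV is sketched below. It assumes the first three tab-separated columns are query, reference and ANI, as in fastANI's standard output (the remaining fragment-count columns are ignored). The function names are hypothetical, and the result is an in-memory square matrix, not a byte-for-byte reproduction of the `--matrix` file format.

```python
import csv
import io


def parse_fastani_tsv(text):
    """Yield (query, reference, ani) tuples from fastANI tab-separated output.

    Assumes columns 1-3 are query path, reference path and ANI estimate.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for row in reader:
        if len(row) >= 3:
            yield row[0], row[1], float(row[2])


def build_matrix(rows):
    """Return (sorted genome names, square matrix) from (query, ref, ani) tuples.

    ANI(A, B) and ANI(B, A) can differ, so when both directions are present
    the two values are averaged. Pairs that never appear stay as None.
    """
    directed = {}
    names = set()
    for query, ref, ani in rows:
        directed[(query, ref)] = ani
        names.update((query, ref))

    order = sorted(names)
    matrix = []
    for a in order:
        row = []
        for b in order:
            values = [directed[p] for p in ((a, b), (b, a)) if p in directed]
            row.append(sum(values) / len(values) if values else None)
        matrix.append(row)
    return order, matrix
```

Averaging the two directions is only one design choice; keeping the directional value or the maximum would also be reasonable. Leaving unreported pairs as `None` avoids inventing values for genome pairs that produced no result.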

microDM commented 5 years ago

Thanks @nigiord

I think that will work. I will give it a try.

cjain7 commented 4 years ago

Please create a new issue if this was not resolved.

jianshu93 commented 3 years ago

For a collection of 100,000 reference genomes, at least 700 GB of memory will be needed.
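The memory figures reported in this thread can be turned into a rough back-of-the-envelope estimate. This sketch rests on a loudly stated assumption: that peak memory scales linearly with the number of reference genomes. fastANI does not guarantee this, and genome sizes and parameters also matter. The helper names are hypothetical.

```python
def gb_per_genome(total_gb, n_genomes):
    """Average memory per reference genome implied by one reported run."""
    return total_gb / n_genomes


def extrapolate_gb(total_gb, n_genomes, target_genomes):
    """Linearly scale a reported memory figure to a different reference count.

    Assumes memory grows roughly linearly with the number of reference
    genomes, which is a simplification.
    """
    return gb_per_genome(total_gb, n_genomes) * target_genomes


if __name__ == "__main__":
    # Anecdotal figures from this thread, not official benchmarks.
    reports = {
        "6,000 refs, ~80 GB": (80, 6000),
        "100,000 refs, ~700 GB": (700, 100_000),
    }
    for label, (gb, n) in reports.items():
        per_mb = gb_per_genome(gb, n) * 1000  # decimal GB -> MB
        est = extrapolate_gb(gb, n, 10_000)
        print(f"{label}: ~{per_mb:.1f} MB/genome, ~{est:.0f} GB for 10,000 refs")
```

With these two data points, the implied cost is about 13 MB per genome for the first report and about 7 MB for the second. The anecdotes therefore disagree by nearly a factor of two, so any such extrapolation should be treated as a rough bound rather than a requirement.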