ParBLiSS / FastANI

Fast Whole-Genome Similarity (ANI) Estimation
Apache License 2.0

FastANI still not fast enough? #10

Closed: biofuture closed this issue 6 years ago

biofuture commented 6 years ago

Dear Sir

I tried FastANI to compute ANI for about 2000 genomes, and it is quite slow. I ran the program on a large node with 64 cores and 500 GB of memory, but the software runs in only a single thread.

I know that you already supply a script to split the genomes into smaller parts. But on a single node, running the parts in parallel against one hard disk makes disk I/O the bottleneck.

How did you compute ANI among 80000 genomes? Could you give me some hints?

I also tried to run it on our HPCF; however, every single run needs more than 96 GB of memory, which is all that most of our nodes have.

I can only submit a limited number of jobs (10) at a time, so I can split the work into fewer than 100 jobs at most, not thousands.

Thank you very much!

Xiaotao

cjain7 commented 6 years ago

Generating ANI for 2000 genomes should be pretty quick and should need less than 96 GB of memory. In my latest run with 8000 genomes, FastANI used about 60 GB of memory.

Could you double-check your scripts that split the reference DB and call FastANI? Also see #6.
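For readers following along, here is a minimal Python sketch of what such a split-and-call script might look like. It is not the script shipped with FastANI. The function names, the `ani_chunks` directory, and the file naming are hypothetical. The only FastANI options used are `--ql`, `--rl`, and `-o`. The idea is that each job should only need to load its own chunk of the reference list, which would be expected to lower per-job memory.

```python
import math
from pathlib import Path


def split_reference_list(genome_paths, n_chunks):
    """Split genome paths into at most n_chunks contiguous, near-equal parts."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    if not genome_paths:
        return []
    size = math.ceil(len(genome_paths) / n_chunks)
    return [genome_paths[i:i + size] for i in range(0, len(genome_paths), size)]


def build_commands(query_list, genome_paths, n_chunks, out_dir="ani_chunks"):
    """Write one reference list per chunk and return one fastANI command per chunk.

    query_list is a text file with one query genome path per line (hypothetical
    name); out_dir and the ref_/ani_ file names are illustrative choices.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    commands = []
    for idx, chunk in enumerate(split_reference_list(genome_paths, n_chunks)):
        ref_list = out / f"ref_{idx:03d}.txt"
        ref_list.write_text("\n".join(chunk) + "\n")
        result = out / f"ani_{idx:03d}.tsv"
        # Each command compares every query against this chunk of references only.
        commands.append(
            ["fastANI", "--ql", str(query_list), "--rl", str(ref_list), "-o", str(result)]
        )
    return commands
```

Each returned command can be submitted as a separate cluster job, for example 10 chunks to match a 10-job submission limit. Because every job writes its own output file, the per-chunk results can be concatenated afterwards.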