ParBLiSS / FastANI

Fast Whole-Genome Similarity (ANI) Estimation
Apache License 2.0

FastANI stuck on last comparison #28

Closed: igorspp closed this issue 4 years ago

igorspp commented 5 years ago

I have tried running FastANI with a combination of MAGs and genomes from GenBank (1537 genomes in total). Everything appears to run fine, but FastANI has been stuck on the last comparison for a couple of hours now and no output is written:

INFO [thread 0], skch::main, Time spent mapping fragments in query #1535 : 3.45975 sec
INFO [thread 0], skch::main, Time spent post mapping : 0.00237654 sec
INFO [thread 0], skch::main, Time spent mapping fragments in query #1536 : 2.6996 sec
INFO [thread 0], skch::main, Time spent post mapping : 0.00182084 sec
INFO [thread 0], skch::main, Time spent mapping fragments in query #1537 : 3.23705 sec
INFO [thread 0], skch::main, Time spent post mapping : 0.00220625 sec

I am running the compiled version for Linux (v1.1) on an HPC server, and the command used is shown below:

fastANI --ql phylogenomics_ANI_paths.txt \
        --rl phylogenomics_ANI_paths.txt \
        -o fastani.out \
        -t 4 \
        --matrix

Many thanks in advance for your thoughts on this.

Cheers, Igor

igorspp commented 5 years ago

PS: I have just run a single comparison and it worked:

fastANI -q ../genomes/IN_Mac8_MAG01.fa \
        -r ../genomes/GCA_000582685.fa \
        -o fastani.out \
        -t 4 \
        --matrix

cjain7 commented 5 years ago

From the log, it seems the last comparison finished and it might be stuck while writing the output to file. It would be worth trying with a lower genome count and checking that there is no issue with disk I/O.
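For example, a quick way to test with a lower genome count is to run on a subset of the path list (the subset file name below is just a placeholder):

head -n 100 phylogenomics_ANI_paths.txt > subset_paths.txt
fastANI --ql subset_paths.txt --rl subset_paths.txt -o fastani_subset.out -t 4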

wwood commented 5 years ago

Hi,

I'm on v1.1 and running into the same issue:

$ fastANI --ql all_good_genomes --rl all_good_genomes -o all_good_genomesVall_good_genomes.fastani -t 36
...
INFO [thread 0], skch::main, Time spent mapping fragments in query #1593 : 0.700507 sec
INFO [thread 0], skch::main, Time spent post mapping : 9.1878e-05 sec
INFO [thread 0], skch::main, Time spent mapping fragments in query #1594 : 0.714483 sec
INFO [thread 0], skch::main, Time spent post mapping : 0.000104341 sec

There are 1594 genomes in the list, and it has been sitting like that (using ~900% CPU in top) for >12 hours. This doesn't seem like an I/O issue to me.

When I attached strace to it while it was stalled, I saw this:

Process 128989 attached - interrupt to quit
futex(0xc9c744, FUTEX_WAIT_PRIVATE, 0, NULL

Maybe this is a multithreading issue? I'll try again without specifying -t.
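As an aside, strace -p only attaches to the single thread ID given; strace -f -p attaches to all threads of the process, which would show whether every worker is parked on that same futex (PID from the trace above):

$ strace -f -p 128989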

wwood commented 5 years ago

Running it again without -t finished without issue in an hour or two.

I then ran the same command as above and it never finished: it was using 1300% CPU in top after running overnight, and was stuck on the last query as before.

So, this seems like a multithreading-related issue to me.
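In the meantime the workaround is simply to leave off -t (otherwise the same command as above):

$ fastANI --ql all_good_genomes --rl all_good_genomes -o all_good_genomesVall_good_genomes.fastani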

cjain7 commented 5 years ago

This is useful to know. I'll take a look at the code again.

If you don't mind, can you give me more information about the C++ compiler and OS you are using? Did you download the executable or compile from source?

wwood commented 5 years ago

Thanks - the executable I have has the same md5sum as the download, so it is indeed the downloaded (Linux) version.

cjain7 commented 5 years ago

I've made a few minor changes, mainly adding more print statements to show the status of each thread. If you get a chance, can you try cloning the latest source code, compiling, and running it?
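The build steps would be roughly as follows (assuming the autotools workflow from the README; configure flags for dependencies are omitted here), after which you can rerun the failing command with the freshly built binary:

git clone https://github.com/ParBLiSS/FastANI.git
cd FastANI
./bootstrap.sh
./configure
make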

If it still fails, please share the output/error logs with me.

cruizperez commented 5 years ago

Hi! I ran into the same problem using multiple threads. The program runs and then hangs at:

INFO [thread 0], skch::main, ready to exit the loop
INFO [thread 7], skch::main, ready to exit the loop
INFO [thread 3], skch::main, ready to exit the loop
INFO [thread 6], skch::main, ready to exit the loop
INFO [thread 2], skch::main, ready to exit the loop
INFO [thread 1], skch::main, ready to exit the loop

Running with just one thread finishes the job.

cjain7 commented 4 years ago

This should be resolved with recent releases, please create a new issue if not.