ParBLiSS / FastANI

Fast Whole-Genome Similarity (ANI) Estimation
Apache License 2.0
374 stars 67 forks source link

Threads not being utilised? #116

Closed bananabenana closed 1 year ago

bananabenana commented 1 year ago

Hi,

First off, FastANI is amazing and a significant contribution. I am having an issue with FastANI when supplied with say, 12 threads not utilising them? thread 0 is the printed log at every line, does this mean only a single thread is being used?

For background, I am analysing a split reference of 5 genomes (total 6.6k) and a query of 6.6k genomes. My computational nodes have 48 threads so I know parallel processing is possible on my end as I use this for other tasks.

Command:

fastANI --ql list.txt --rl split_list.txt -o output.tsv -t 12 --fragLen 3000

Log:

>>>>>>>>>>>>>>>>>>
Reference = [5 files] # I edited this line for brevity
Query = [6679 files] # I edited this line for brevity
Kmer size = 16
Fragment length = 3000
Threads = 12
ANI output file = Kp_0001_matrix_comp.tsv
>>>>>>>>>>>>>>>>>>
INFO [thread 0], skch::main, Count of threads executing parallel_for : 12
INFO [thread 0], skch::Sketch::build, window size for minimizer sampling  = 24
INFO [thread 0], skch::Sketch::build, minimizers picked from reference = 440296
INFO [thread 0], skch::Sketch::index, unique minimizers = 429689
INFO [thread 0], skch::Sketch::computeFreqHist, Frequency histogram of minimizers = (1, 423018) ... (21, 1)
INFO [thread 0], skch::Sketch::computeFreqHist, consider all minimizers during lookup.
INFO [thread 0], skch::main, Time spent sketching the reference : 0.596725 sec
INFO [thread 0], skch::main, Time spent mapping fragments in query #1 : 1.44481 sec
INFO [thread 0], skch::main, Time spent post mapping : 0.000202898 sec
INFO [thread 0], skch::main, Time spent mapping fragments in query #2 : 1.40884 sec
INFO [thread 0], skch::main, Time spent post mapping : 0.000231584 sec
INFO [thread 0], skch::main, Time spent mapping fragments in query #3 : 1.39151 sec
INFO [thread 0], skch::main, Time spent post mapping : 0.000207958 sec
INFO [thread 0], skch::main, Time spent mapping fragments in query #4 : 1.35467 sec
INFO [thread 0], skch::main, Time spent post mapping : 0.000192637 sec
INFO [thread 0], skch::main, Time spent mapping fragments in query #5 : 1.34069 sec
INFO [thread 0], skch::main, Time spent post mapping : 0.000246398 sec
INFO [thread 0], skch::main, Time spent mapping fragments in query #6 : 1.31763 sec
INFO [thread 0], skch::main, Time spent post mapping : 0.000209949 sec
INFO [thread 0], skch::main, Time spent mapping fragments in query #7 : 1.30368 sec
INFO [thread 0], skch::main, Time spent post mapping : 0.00023698 sec
INFO [thread 0], skch::main, Time spent mapping fragments in query #8 : 1.20953 sec
INFO [thread 0], skch::main, Time spent post mapping : 0.000199597 sec
INFO [thread 0], skch::main, Time spent mapping fragments in query #9 : 1.23181 sec
INFO [thread 0], skch::main, Time spent post mapping : 0.000178598 sec
INFO [thread 0], skch::main, Time spent mapping fragments in query #10 : 1.23475 sec
INFO [thread 0], skch::main, Time spent post mapping : 0.000179452 sec
INFO [thread 0], skch::main, Time spent mapping fragments in query #11 : 1.25766 sec
INFO [thread 0], skch::main, Time spent post mapping : 0.000172417 sec

Thanks

broke-why commented 1 year ago

I am facing the same problem. Only thread 0 printed in log file.

broke-why commented 1 year ago

I am facing the same problem. Only thread 0 printed in log file.

cjain7 commented 1 year ago

Hi, Please don't judge the parallel processing from output log. If you see the code, I am only printing the messages from thread 0 to track progress. The reason was that the output log would become difficult to understand if each thread is allowed to print concurrently.

https://github.com/ParBLiSS/FastANI/blob/c7fa42ba48ead187551331489290c095c02cb85b/src/cgi/core_genome_identity.cpp#L70

To check if parallel processing is happening, you can do so by checking CPU usage while running FastANI. Commands like top and htop can help you monitor CPU % being used by each task.

bananabenana commented 1 year ago

Okay makes sense. Thanks!