ParBLiSS / FastANI

Fast Whole-Genome Similarity (ANI) Estimation
Apache License 2.0

FastANI 1.33 using more threads than allocated #101

Closed aaronmussig closed 1 year ago

aaronmussig commented 2 years ago

Hello,

I came across an issue where FastANI version 1.33 uses more threads than were allocated (versions 1.3 to 1.32 seem to be okay). I've attached the /usr/bin/time output for each of those versions.

All versions were executed using:

/usr/bin/time -v fastANI -q GCA_019236045.1_ASM1923604v1_genomic.fna -r GCA_019236175.1_ASM1923617v1_genomic.fna -o output.txt -t 1 2> fastANI-1.x.txt

fastANI-1.3.txt fastANI-1.31.txt fastANI-1.32.txt fastANI-1.33.txt
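
For anyone comparing reports like these: one field in GNU time's verbose output that can reveal extra threads is "Percent of CPU this job got". With -t 1, a value well above 100% suggests more than one thread was busy. A minimal sketch for pulling that number out of a report follows; the helper name cpu_percent is made up for illustration and is not part of FastANI or GNU time.

```shell
# Hypothetical helper: print the CPU percentage recorded in a GNU time -v report.
cpu_percent() {
    # GNU time -v writes a line such as "Percent of CPU this job got: 99%".
    # Print only the number; print nothing if the line is absent.
    sed -n 's/^[[:space:]]*Percent of CPU this job got:[[:space:]]*\([0-9][0-9]*\)%.*$/\1/p' "$1"
}

# Usage, with the report names attached above:
# for f in fastANI-1.3.txt fastANI-1.31.txt fastANI-1.32.txt fastANI-1.33.txt; do
#     printf '%s\t%s%%\n' "$f" "$(cpu_percent "$f")"
# done
```

Because the command above redirects stderr into the report, the file may also contain FastANI's own log lines; the sed pattern skips those.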

Cheers!

zwets commented 1 year ago

@aaronmussig how did you obtain the FastANI versions? After some experimenting it looks to me like the problem isn't with FastANI itself, but with the FastANI that comes with bioconda.

I just tested with FastANI 1.32 and FastANI 1.33 built from source; neither has the issue. Nor does the FastANI 1.33 packaged in Ubuntu.

A regression between 1.32 and 1.33 would be highly unlikely anyway, as the only change in the source code is the upgrade of the argument parser (argvparser was swapped for clipp).

ldd on FastANI 1.33 in Ubuntu 22.04 gives:

ldd /usr/bin/fastANI
        linux-vdso.so.1 (0x00007ffc76db3000)
        libgsl.so.27 => /lib/x86_64-linux-gnu/libgsl.so.27 (0x00007f3aa2df7000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f3aa2bcd000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f3aa2bb1000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f3aa2aca000)
        libgomp.so.1 => /lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f3aa2a80000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f3aa2a5e000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3aa2836000)
        libgslcblas.so.0 => /lib/x86_64-linux-gnu/libgslcblas.so.0 (0x00007f3aa27f4000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f3aa30ff000)

whereas the Conda-installed one (under /hpc/opt/conda/... at my site) shows various other libraries, including notably libpthread:

ldd /hpc/opt/conda/envs/gtdbtk-2.2.2-test/bin/fastANI
        linux-vdso.so.1 (0x00007fff56f8b000)
        libgsl.so.25 => /hpc/opt/conda/envs/gtdbtk-2.2.2-test/bin/./../lib/libgsl.so.25 (0x00007f032356a000)
        libopenblas.so.0 => /hpc/opt/conda/envs/gtdbtk-2.2.2-test/bin/./../lib/libopenblas.so.0 (0x00007f03212c2000)
        libstdc++.so.6 => /hpc/opt/conda/envs/gtdbtk-2.2.2-test/bin/./../lib/libstdc++.so.6 (0x00007f032110e000)
        libz.so.1 => /hpc/opt/conda/envs/gtdbtk-2.2.2-test/bin/./../lib/libz.so.1 (0x00007f03210f4000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0321002000)
        libgomp.so.1 => /hpc/opt/conda/envs/gtdbtk-2.2.2-test/bin/./../lib/libgomp.so.1 (0x00007f0320fc7000)
        libgcc_s.so.1 => /hpc/opt/conda/envs/gtdbtk-2.2.2-test/bin/./../lib/libgcc_s.so.1 (0x00007f0320fae000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f0320fa9000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0320d81000)
        libgfortran.so.5 => /hpc/opt/conda/envs/gtdbtk-2.2.2-test/bin/./../lib/./libgfortran.so.5 (0x00007f0320bd6000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f03238b0000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f0320bcf000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f0320bca000)
        libquadmath.so.0 => /hpc/opt/conda/envs/gtdbtk-2.2.2-test/bin/./../lib/././libquadmath.so.0 (0x00007f0320b90000)
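
The difference between the two listings can also be extracted mechanically. The sketch below reduces ldd output to bare library names so that two builds can be compared with comm; the helper name lib_names and the temporary file names are made up for illustration.

```shell
# Hypothetical helper: reduce ldd output (read from stdin) to sorted library names.
lib_names() {
    # Field 1 of each ldd line is the library name (or a path for the loader);
    # strip everything from ".so" onward so versions do not affect the comparison.
    awk '{print $1}' | sed 's/\.so.*$//' | sort -u
}

# Usage, with the two paths from the listings above:
# ldd /usr/bin/fastANI | lib_names > ubuntu.libs
# ldd /hpc/opt/conda/envs/gtdbtk-2.2.2-test/bin/fastANI | lib_names > conda.libs
# comm -13 ubuntu.libs conda.libs    # libraries only the Conda build links
```

Applied to the two listings above, the Conda-only set would be libdl, libgfortran, libopenblas, libpthread, libquadmath and librt.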

Any idea where / how to influence the autoconf flags for FastANI in Bioconda? Can we report a bug somewhere? (I just use conda as a necessary evil, never looked behind the curtain.)

Would be great if we could squash this and this.

zwets commented 1 year ago

@cjain7 I think you can close this issue as being caused by libraries that Conda links (possibly dynamically; note the librt and libdl dependencies) into the FastANI binary.

The FastANI code correctly sets omp_set_num_threads(parameters.threads), which then applies to the main pragma omp for loop in the code (and only there). What seems to be happening is that a backend library (I suspect openblas) then grabs a bunch of extra threads.

A workaround is available: setting the environment variable OMP_NUM_THREADS=1 doesn't affect FastANI's -t parameter (as the omp_set_num_threads() call overrides the environment setting), but it does restrain the backend code.
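
As a sketch, the workaround can be wrapped in a small shell function so the variable is set only for the FastANI process. The function name fastani_capped is made up for illustration. OPENBLAS_NUM_THREADS is OpenBLAS's own thread-count variable; including it is an assumption based on the suspicion above, and nobody in this thread tested whether it is needed.

```shell
# Hypothetical wrapper: cap backend library threads, leave -t in charge of FastANI.
fastani_capped() {
    # OMP_NUM_THREADS=1 is the workaround described above.
    # OPENBLAS_NUM_THREADS=1 is an extra assumption in case OpenBLAS is the culprit.
    OMP_NUM_THREADS=1 OPENBLAS_NUM_THREADS=1 fastANI "$@"
}

# Usage, with the genomes from the original report:
# fastani_capped -q GCA_019236045.1_ASM1923604v1_genomic.fna \
#                -r GCA_019236175.1_ASM1923617v1_genomic.fna \
#                -o output.txt -t 1
```

Setting the variables on the command line rather than with export keeps them from leaking into other OpenMP programs started from the same shell.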

cjain7 commented 1 year ago

Sure, thanks!