broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

gcc should be listed in the README as required dependency for running GATK #6012

Open TedBrookings opened 5 years ago

TedBrookings commented 5 years ago

Documentation request

Description

I propose adding gcc installation to the instructions in the GATK GitHub README.md.

If gcc is not installed, HaplotypeCaller complains that the AVX instruction set is not available, even when it is, and falls back to the slower LOGLESS_CACHING PairHMM. The actual fault is a missing libgomp1, which is a required dependency of gcc.

Since this documentation request stems from a "bug" caused by not installing necessary libraries, I'll include the bug report format below in case someone else searches for solutions to this problem, as suggested by @lbergelson.

Affected tool(s) or class(es)

HaplotypeCaller, or any other tool that uses PairHMM

Affected version(s)

I think all versions as of 2019-06-20. I tested on release version 4.1.2.0.

Steps to reproduce

Run HaplotypeCaller from a released jar on an Ubuntu VM that supports the AVX instruction set. Critically, do NOT install gcc on the VM. Installing gcc fixes this problem.
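Before reproducing, it may help to confirm that the VM really exposes AVX, so that a later fallback can be attributed to the missing library rather than to the CPU. A minimal sketch, assuming a Linux guest where /proc/cpuinfo lists CPU flags; cpu_has_avx is a hypothetical helper name, not part of GATK:

```shell
#!/bin/sh
# cpu_has_avx (hypothetical helper, not part of GATK): succeeds if the given
# cpuinfo-style file (default: /proc/cpuinfo) lists the "avx" flag as a whole word.
cpu_has_avx() {
    grep -qw 'avx' "${1:-/proc/cpuinfo}" 2>/dev/null
}

if cpu_has_avx; then
    echo "CPU reports AVX support"
else
    echo "CPU does not report AVX support (or /proc/cpuinfo is unavailable)"
fi
```

If this reports AVX support and HaplotypeCaller still claims otherwise, the warning is probably about a library that failed to load rather than about the hardware.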

Expected behavior

If you install gcc, that results in the installation of libgomp1, which allows the Intel library to load and use AVX acceleration. You could probably install libgomp1 on its own, but I did not test that.

14:51:01.013 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/ubuntu/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
14:51:01.015 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/home/ubuntu/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
14:51:01.053 INFO IntelPairHmm - Using CPU-supported AVX-512 instructions
14:51:01.053 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
14:51:01.054 INFO IntelPairHmm - Available threads: 16
14:51:01.054 INFO IntelPairHmm - Requested threads: 8
14:51:01.054 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
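To check whether the library itself is present without installing all of gcc, one option is to look for libgomp.so.1 in the dynamic linker cache. This is only a sketch: cache_lists_libgomp is a hypothetical helper name, and installing libgomp1 on its own is untested in this report, so treat the suggested apt-get command as an assumption.

```shell
#!/bin/sh
# cache_lists_libgomp (hypothetical helper): reads `ldconfig -p` style output
# on stdin and succeeds if libgomp.so.1 is listed.
cache_lists_libgomp() {
    grep -q 'libgomp\.so\.1'
}

# ldconfig -p prints the shared libraries known to the dynamic linker cache.
if ldconfig -p 2>/dev/null | cache_lists_libgomp; then
    echo "libgomp.so.1 is visible to the dynamic linker"
else
    echo "libgomp.so.1 not found; on Ubuntu, 'sudo apt-get install libgomp1' may be enough (untested here)"
fi
```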

Actual behavior

Without libgomp1, AVX acceleration doesn't work:

19:43:36.387 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/ubuntu/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
19:43:36.389 WARN NativeLibraryLoader - Unable to load libgkl_utils.so from native/libgkl_utils.so (/tmp/libgkl_utils5391341743604217103.so: libgomp.so.1: cannot open shared object file: No such file or directory)
19:43:36.389 WARN IntelPairHmm - Intel GKL Utils not loaded
19:43:36.389 INFO PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported
19:43:36.389 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/ubuntu/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
19:43:36.390 WARN NativeLibraryLoader - Unable to load libgkl_utils.so from native/libgkl_utils.so (/tmp/libgkl_utils3484179251394006588.so: libgomp.so.1: cannot open shared object file: No such file or directory)
19:43:36.390 WARN IntelPairHmm - Intel GKL Utils not loaded
19:43:36.390 WARN PairHMM - ***WARNING: Machine does not have the AVX instruction set support needed for the accelerated AVX PairHmm. Falling back to the MUCH slower LOGLESS_CACHING implementation!
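The final AVX warning is misleading here; the NativeLibraryLoader WARN lines name the library that actually failed to load. A small sketch for scanning a captured log, matching only on message text quoted in this report (other GATK versions may word these messages differently); pairhmm_status and missing_shared_libs are hypothetical helper names:

```shell
#!/bin/sh
# pairhmm_status (hypothetical helper): reads a GATK log on stdin and prints
# "fallback", "avx", or "unknown" based on the messages shown in this report.
pairhmm_status() {
    log=$(cat)
    case $log in
        *'Falling back to the MUCH slower LOGLESS_CACHING'*)
            echo "fallback" ;;
        *'Using the OpenMP multi-threaded AVX-accelerated native PairHMM'*)
            echo "avx" ;;
        *)
            echo "unknown" ;;
    esac
}

# missing_shared_libs (hypothetical helper): prints each distinct library
# named in a "cannot open shared object file" loader error.
missing_shared_libs() {
    sed -n 's/.*: \(lib[^:]*\): cannot open shared object file.*/\1/p' | sort -u
}
```

With the log saved to a file (run.log is an assumed name), `pairhmm_status < run.log` reports which path was taken and `missing_shared_libs < run.log` lists the libraries the loader could not find.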

scalavision commented 5 years ago

If you wrap the gatk Python script with something like:

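#! /bin/bash -e
# Prepend the directory that contains libgomp.so.1 to the library search path,
# keeping any existing LD_LIBRARY_PATH entries after it.
export LD_LIBRARY_PATH='<path to gcc-7.4.0-lib>/lib'${LD_LIBRARY_PATH:+':'}$LD_LIBRARY_PATH
# Hand over to the real gatk launcher with the original arguments.
exec "gatk" "$@"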

it should work.

yonniejon commented 1 year ago

I have gcc installed and have also tried @scalavision's suggestion, but I still get the warning "Machine does not have the AVX instruction set support needed for the accelerated AVX PairHmm. Falling back to the MUCH slower LOGLESS_CACHING implementation!".

I am not sure that I used the correct path. I found the gcc 7.4.0 path by running gcc -print-prog-name=cc1 -v
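One possible source of the uncertainty: gcc -print-prog-name=cc1 prints the location of the compiler's cc1 program, which is not necessarily the directory that holds the runtime libraries. gcc also has -print-file-name=<library>, which prints the full path of a library it can locate and echoes the bare name back when it cannot. The sketch below uses that to find a candidate directory for LD_LIBRARY_PATH; dir_of_absolute is a hypothetical helper name, and whether this resolves the warning in this case is not confirmed.

```shell
#!/bin/sh
# dir_of_absolute (hypothetical helper): prints the directory of an absolute
# path and fails otherwise. A non-absolute result from gcc -print-file-name
# means gcc did not locate the file.
dir_of_absolute() {
    case $1 in
        /*) dirname "$1" ;;
        *)  return 1 ;;
    esac
}

if command -v gcc >/dev/null 2>&1 &&
   libdir=$(dir_of_absolute "$(gcc -print-file-name=libgomp.so.1)"); then
    echo "libgomp.so.1 directory: $libdir"
else
    echo "gcc could not locate libgomp.so.1"
fi
```

If a directory is printed, that is the value to try in place of the placeholder in the wrapper's LD_LIBRARY_PATH. It is also worth checking which library the NativeLibraryLoader WARN line names, since a different missing library would produce the same AVX warning.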