broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

libgkl on ppc64le fails to load #6794

Closed: R-obert closed this issue 4 years ago

R-obert commented 4 years ago

Hello,

I'm trying to use GATK4 (4.1.8.1) on an Ubuntu (16.04) machine. The machine is a "PowerLinux" machine and I'm guessing that the most relevant info for the following problem is that it is a ppc64le system. When I use HaplotypeCaller, I see the following messages on the screen:

Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx50G -jar /home/robert/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar HaplotypeCaller -R ref.fa -I mybam.bam -O mycalls.vcf.gz -L snps.vcf -ip 100

16:17:04.377 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/robert/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar!/com/intel/gkl/native/libgkl_compression.so

16:17:04.397 **WARN**  NativeLibraryLoader - Unable to load libgkl_compression.so from native/libgkl_compression.so (/tmp/libgkl_compression3825249225068031371.so: /tmp/libgkl_compression3825249225068031371.so: cannot open shared object file: No such file or directory (Possible cause: can't load AMD 64-bit .so on a Power PC 64 LE-bit platform))

16:17:04.402 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/robert/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar!/com/intel/gkl/native/libgkl_compression.so

16:17:04.407 **WARN**  NativeLibraryLoader - Unable to load libgkl_compression.so from native/libgkl_compression.so (/tmp/libgkl_compression7506152962158874866.so: /tmp/libgkl_compression7506152962158874866.so: cannot open shared object file: No such file or directory (Possible cause: can't load AMD 64-bit .so on a Power PC 64 LE-bit platform))

Sep 04, 2020 4:17:05 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine

INFO: Failed to detect whether we are running on Google Compute Engine.

16:17:05.842 INFO  HaplotypeCaller - ------------------------------------------------------------

16:17:05.843 INFO  HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.1.8.1

16:17:05.843 INFO  HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/

16:17:05.843 INFO  HaplotypeCaller - Executing as robert@powerlinux on Linux v4.4.0-184-generic ppc64le

16:17:05.843 INFO  HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_252-8u252-b09-1~16.04-b09

16:17:05.843 INFO  HaplotypeCaller - Start Date/Time: September 4, 2020 4:17:04 PM UTC

16:17:05.843 INFO  HaplotypeCaller - ------------------------------------------------------------

16:17:05.843 INFO  HaplotypeCaller - ------------------------------------------------------------

16:17:05.844 INFO  HaplotypeCaller - HTSJDK Version: 2.23.0

16:17:05.844 INFO  HaplotypeCaller - Picard Version: 2.22.8

16:17:05.844 INFO  HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2

16:17:05.844 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false

16:17:05.844 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true

16:17:05.844 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false

16:17:05.844 INFO  HaplotypeCaller - Deflater: JdkDeflater

16:17:05.844 INFO  HaplotypeCaller - Inflater: JdkInflater

16:17:05.844 INFO  HaplotypeCaller - GCS max retries/reopens: 20

16:17:05.844 INFO  HaplotypeCaller - Requester pays: disabled

16:17:05.845 INFO  HaplotypeCaller - Initializing engine

16:17:05.928 WARN  IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater

16:17:05.932 WARN  IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater

16:17:06.503 INFO  FeatureManager - Using codec VCFCodec to read file file:///home/robert/test/snps.vcf

16:17:06.539 INFO  IntervalArgumentCollection - Processing 61464 bp from intervals

16:17:06.551 INFO  HaplotypeCaller - Done initializing engine

16:17:06.573 INFO  HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output

16:17:06.588 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/robert/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar!/com/intel/gkl/native/libgkl_utils.so

16:17:06.589 **WARN**  NativeLibraryLoader - Unable to load libgkl_utils.so from native/libgkl_utils.so (/tmp/libgkl_utils347167544598047196.so: /tmp/libgkl_utils347167544598047196.so: cannot open shared object file: No such file or directory (Possible cause: can't load AMD 64-bit .so on a Power PC 64 LE-bit platform))

16:17:06.589 **WARN**  IntelPairHmm - Intel GKL Utils not loaded

16:17:06.589 INFO  PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported

16:17:06.589 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/robert/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar!/com/intel/gkl/native/libgkl_utils.so

16:17:06.590 **WARN**  NativeLibraryLoader - Unable to load libgkl_utils.so from native/libgkl_utils.so (/tmp/libgkl_utils6186849302609329058.so: /tmp/libgkl_utils6186849302609329058.so: cannot open shared object file: No such file or directory (Possible cause: can't load AMD 64-bit .so on a Power PC 64 LE-bit platform))

16:17:06.590 **WARN**  IntelPairHmm - Intel GKL Utils not loaded

16:17:06.591 **WARN**  PairHMM - ***WARNING: Machine does not have the AVX instruction set support needed for the accelerated AVX PairHmm. Falling back to the MUCH slower LOGLESS_CACHING implementation!

Since the calculation takes quite a long time, I checked the WARN messages in the output above, especially the last one about the AVX instruction set, which says that a MUCH slower implementation will be used. From these WARN messages, the root cause seems to be the failure to load libgkl, which in turn seems to be related to my platform. From another thread/topic I concluded that the instruction set problem might go away if libgkl could be loaded. Does anyone know more about this issue or how to work around it?

Best regards, Robert

dcrookston commented 4 years ago

Disclaimer: I'm a programmer with very little GATK experience.

The problem is that you've got AMD64 (x86-64) libraries on a PowerPC machine. I don't know if GATK makes PowerPC libraries available natively, but you should be able to get the source code and compile it yourself.

Note that this will not fix the problem of your machine architecture lacking the AVX instruction set. That's a hardware issue. But it should (okay, might) get rid of the warnings about missing .so files.

As an aside, I'm curious whether the PowerPC architecture has an instruction set similar to AVX. This is something I might actually be able to contribute to the project, so I'm excited by the prospect!

-Dan
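A quick way to confirm the mismatch described above is to ask the JVM which architecture it reports. The following is only a minimal sketch: the class name `ArchCheck` and the helper `isAmd64` are made up for illustration and are not part of GATK or GKL. It assumes an x86-64 JVM reports `os.arch` as `amd64` or `x86_64`.

```java
import java.util.Locale;

// Hypothetical helper, not part of GATK or GKL.
final class ArchCheck {

    /** Returns true when the given os.arch value denotes x86-64. */
    static boolean isAmd64(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        return arch.equals("amd64") || arch.equals("x86_64");
    }

    public static void main(String[] args) {
        // Standard system property; the log above shows the kernel reporting ppc64le.
        String arch = System.getProperty("os.arch");
        System.out.println("os.arch = " + arch);
        System.out.println("AMD64-only native libraries usable: " + isAmd64(arch));
    }
}
```

If this prints a non-AMD64 architecture, the bundled AMD64 `.so` files cannot be loaded regardless of where they are extracted.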


R-obert commented 4 years ago

My knowledge of .so files, linkers, and architectures is very limited, but it sounds to me like the GKL library is part of the jar and GATK tries to load it from there. Reading this thread, it also sounds like the GKL library of GATK should work on ppc. Because of the "No such file or directory" messages (in the first warnings), I also tried pointing the --tmp-dir parameter of HaplotypeCaller at my home directory to rule out a simple permission problem.
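The "Possible cause" text in the warnings names two architectures, which suggests the file itself can be checked rather than the directory it lands in. The sketch below reads the `e_machine` field from the ELF header of a shared object. The class name `ElfMachine` is hypothetical; the field offset (18) and the machine codes (62 for x86-64, 21 for 64-bit PowerPC) come from the ELF specification, not from GATK.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of GATK or GKL.
final class ElfMachine {
    static final int EM_PPC64 = 21;   // 64-bit PowerPC
    static final int EM_X86_64 = 62;  // AMD64 / x86-64

    /** Extracts e_machine from the first 20 bytes of an ELF file. */
    static int machineOf(byte[] header) {
        if (header.length < 20 || header[0] != 0x7F
                || header[1] != 'E' || header[2] != 'L' || header[3] != 'F') {
            throw new IllegalArgumentException("not an ELF file");
        }
        // Byte 5 (EI_DATA): 1 = little-endian, 2 = big-endian.
        ByteOrder order = header[5] == 2 ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN;
        // e_machine is an unsigned 16-bit value at offset 18.
        return ByteBuffer.wrap(header).order(order).getShort(18) & 0xFFFF;
    }

    public static void main(String[] args) throws IOException {
        byte[] header = new byte[20];
        try (DataInputStream in = new DataInputStream(Files.newInputStream(Paths.get(args[0])))) {
            in.readFully(header);
        }
        System.out.println("e_machine = " + machineOf(header));
    }
}
```

Run against a copy of `libgkl_compression.so` extracted from the jar, a result of 62 on a ppc64le host would point to an architecture mismatch rather than a permission or path problem.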

R-obert commented 4 years ago

What this user describes seems to be what is happening (or is supposed to happen): GATK extracts the .so file to a tmp directory and then uses System.load("/path/to/lib.so") to load the library. Is it possible that something inside the GATK package was not cross-compiled or is misconfigured, as in this case?
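The extract-then-load pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual `NativeLibraryLoader` code: the class name `NativeExtractLoader` and the method `tryLoad` are invented here, and only standard JDK calls are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch, not the real GKL loader.
final class NativeExtractLoader {

    /** Copies a classpath resource to tmpDir and tries to load it; returns false on any failure. */
    static boolean tryLoad(String resourcePath, Path tmpDir) {
        try (InputStream in = NativeExtractLoader.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                return false; // resource is not on the classpath
            }
            Path tmp = Files.createTempFile(tmpDir, "lib", ".so");
            tmp.toFile().deleteOnExit();
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            // System.load needs an absolute path; it throws UnsatisfiedLinkError
            // when the file was built for a different architecture.
            System.load(tmp.toAbsolutePath().toString());
            return true;
        } catch (IOException | UnsatisfiedLinkError e) {
            System.err.println("Unable to load " + resourcePath + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        Path tmpDir = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println(tryLoad("/com/intel/gkl/native/libgkl_utils.so", tmpDir));
    }
}
```

In a pattern like this, extraction can succeed while the load still fails, so changing the temporary directory would not help when the library targets another architecture.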

droazen commented 4 years ago

@R-obert Sorry, but the GKL library does not support PowerPC, only AMD64. The good news is that even with this warning the HaplotypeCaller will still run fine -- it will just fall back to using the slower Java implementations of algorithms like the PairHMM rather than the hardware-accelerated versions.

There was an effort a number of years ago by an IBM developer (@frank-y-liu) to make a PowerPC build of the library, but I'm not sure what's become of that code or whether it's still maintained. You could try opening a ticket in the GKL repository (https://github.com/Intel-HLS/GKL) to inquire about PowerPC support.
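The fallback behaviour described in this reply, where a failed native load leads to a slower Java implementation instead of an error, can be illustrated with a small sketch. All names here (`PairHmmFallback`, `Impl`, `choose`, and the library name `hypothetical_native`) are made up for the example and do not reflect GATK's actual classes.

```java
// Hypothetical sketch of a native-with-fallback selection, not GATK code.
final class PairHmmFallback {

    enum Impl { NATIVE_ACCELERATED, PURE_JAVA }

    /** Runs the given native loader; falls back to pure Java if linking fails. */
    static Impl choose(Runnable nativeLoader) {
        try {
            nativeLoader.run();
            return Impl.NATIVE_ACCELERATED;
        } catch (UnsatisfiedLinkError e) {
            // The native library is missing or built for another architecture.
            return Impl.PURE_JAVA;
        }
    }

    public static void main(String[] args) {
        // "hypothetical_native" is a placeholder library name that is not expected to exist.
        Impl impl = choose(() -> System.loadLibrary("hypothetical_native"));
        System.out.println("Using implementation: " + impl);
    }
}
```

The run still completes either way; only the speed differs, which matches the warnings in the log above.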