broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Error in smith waterman native library #5690

Open byoo opened 5 years ago

byoo commented 5 years ago

Mutect2 (GATK 4.1.0.0) occasionally crashes in the Smith-Waterman native library, as shown below. The stderr is attached. I can also provide a core dump if necessary.

stderr.tar.gz

07:30:59.335 INFO  ProgressMeter -          17:78451657            627.7               1223980           1950.0
*** Error in `java': munmap_chunk(): invalid pointer: 0x00002ba8e50b7740 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7ab54)[0x2ba8df926b54]
/gpfs/data/software/cromwell/log/cromwell-executions/Mutect2/2cebc7be-fe23-4787-9095-9b91227c6526/call-M2/shard-13/attempt-2/tmp.945f1f83/libgkl_smithwaterman5575294852416409537.so(_Z19runSWOnePairBT_avx2iiiiPhS_iiaPcPs+0x338)[0x2ba9aee21fa8]
/gpfs/data/software/cromwell/log/cromwell-executions/Mutect2/2cebc7be-fe23-4787-9095-9b91227c6526/call-M2/shard-13/attempt-2/tmp.945f1f83/libgkl_smithwaterman5575294852416409537.so(Java_com_intel_gkl_smithwaterman_IntelSmithWaterman_alignNative+0xd8)[0x2ba9aee21bf8]
[0x2ba8e8f6675a]
======= Memory map: ========
00400000-00401000 r-xp 00000000 08:03 5769910                            /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64/jre/bin/java
00600000-00601000 r--p 00000000 08:03 5769910                            /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64/jre/bin/java
...
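For triaging many shard logs, a small script can flag the ones that show this crash signature. The following is a minimal sketch, not part of GATK or the GKL: the function name summarize_gkl_crash and both regular expressions are assumptions written against the glibc backtrace format shown above, and they do not cover the JVM hs_err report format.

```python
import re

# Matches glibc backtrace frames of the form seen in this report, e.g.
#   /path/libgkl_smithwaterman123.so(_Z19runSW...+0x338)[0x2ba9aee21fa8]
# (pattern is an assumption derived from the log above)
FRAME_RE = re.compile(
    r"(?P<lib>\S*libgkl_smithwaterman\d*\.so)"
    r"\((?P<symbol>[^+)]+)\+(?P<offset>0x[0-9a-fA-F]+)\)"
)

# glibc abort banners that appear in this thread.
BANNER_RE = re.compile(
    r"(munmap_chunk\(\): invalid pointer|double free or corruption)"
)


def summarize_gkl_crash(stderr_text):
    """Return (banner, [(symbol, offset), ...]) if the text contains a
    libgkl_smithwaterman backtrace frame, otherwise None.

    Hypothetical helper for log triage only.
    """
    frames = [
        (m.group("symbol"), m.group("offset"))
        for m in FRAME_RE.finditer(stderr_text)
    ]
    if not frames:
        return None
    banner = BANNER_RE.search(stderr_text)
    return (banner.group(1) if banner else None, frames)
```

Calling summarize_gkl_crash(open("stderr").read()) on a task's stderr file (the file name is a placeholder) returns None for logs without a GKL Smith-Waterman frame.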
lbergelson commented 5 years ago

@byoo A core dump would be useful I believe.

byoo commented 5 years ago

@lbergelson The file is about 1GB in size. Do you have a preferred way to share the file?

lbergelson commented 5 years ago

@byoo The easiest thing would be to upload it to Google Cloud and make it publicly visible. Then we can copy it over and you can delete it. Or, if you share your Google account name, I can grant you upload permission on a bucket we own. (If you'd rather not publish it to the world, you can email it to me at louisb@broadinstitute.org.)

Alternatively, if you can't use Google Cloud, you could upload it to the GATK FTP site. See this article on how to connect and upload: https://gatkforums.broadinstitute.org/gatk/discussion/1215/how-can-i-access-the-gsa-public-ftp-server.

lbergelson commented 5 years ago

@gspowley Could you route this bug report to whoever is able to deal with this nowadays?

byoo commented 5 years ago

@lbergelson Would you grant me upload permission on one of your buckets? My Google account is byunggil.yoo@gmail.com. Thank you.

lbergelson commented 5 years ago

@byoo I believe I've granted you read/write permission on gs://hellbender-drop-box. I'm always confused by Google Storage permissions though, so let me know if it didn't work.

gspowley commented 5 years ago

@mepowers Can you please take a look at this issue?

byoo commented 5 years ago

@lbergelson It worked well - I have uploaded a core dump. Let me know if you need something more.

rpomaris commented 5 years ago

@gspowley - got it. All - I'm the new Intel contact for GKL issues. It sounds like this is resolved, but please feel free to tag me if you need help.

lbergelson commented 5 years ago

@mepowers Nice to meet you.

This issue isn't resolved. What was resolved was uploading a core dump that exhibits the problem. Is it possible for you to look into what's causing the invalid pointer? Let us know what additional information we can provide.

The core dump is located at gs://hellbender/bugs/5690/core.tar.gz and should be publicly accessible.

rpomaris commented 5 years ago

Thanks @lbergelson - nice to meet you too.

Sorry for the delay here. I had to set up gsutil on my system and am having gdb issues.

Running sudo gdb /nfsdata-tmp/tools/gatk /home/bduser/mepowers/core.114856, I get back:

Missing separate debuginfo for the main executable file
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/6c/../../../jvm/java-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64/bin/java
Core was generated by `java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samt'.
Program terminated with signal 6, Aborted.

I did try the yum --enablerepo command, but I am getting the same error.

Any quick workarounds? Thanks in advance for the help. Will try again on Monday.

flexray commented 5 years ago

The same happens with HaplotypeCaller (GATK 4.1.1.0). If this persists, and it seems like it will, I will try switching to the pure-Java implementation, as a retry takes a couple of hours and ruins the workflow.

--smith-waterman / NA
Which Smith-Waterman implementation to use, generally FASTEST_AVAILABLE is the right choice
The --smith-waterman argument is an enumerated type (Implementation), which can have one of the following values:

FASTEST_AVAILABLE
    use the fastest available Smith-Waterman aligner that runs on your hardware
AVX_ENABLED
    use the AVX enabled Smith-Waterman aligner
JAVA
    use the pure java implementation of Smith-Waterman, works on all hardware
*** Error in `java': munmap_chunk(): invalid pointer: 0x00007fafc8ed1000 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fafce3f37e5]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x1a8)[0x7fafce400698]
/home/flexray/germline/cromwell-executions/PairedEndSingleSampleWorkflow/af1ee082-8661-4a7a-adf9-1b2a67333d37/call-HaplotypeCaller/shard-40/tmp.42584bbe/libgkl_smithwaterman205796788520033039.so(_Z19runSWOnePairBT_avx2iiiiPhS_iiaPcPs+0x338)[0x7faf73bfcfa8]
/home/flexray/germline/cromwell-executions/PairedEndSingleSampleWorkflow/af1ee082-8661-4a7a-adf9-1b2a67333d37/call-HaplotypeCaller/shard-40/tmp.42584bbe/libgkl_smithwaterman205796788520033039.so(Java_com_intel_gkl_smithwaterman_IntelSmithWaterman_alignNative+0xd8)[0x7faf73bfcbf8]
[0x7fafb9a7eea2]

and

*** Error in `java': double free or corruption (out): 0x00007f933d610780 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f93434427e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f934344b37a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f934344f53c]
/home/flexray/germline/cromwell-executions/PairedEndSingleSampleWorkflow/fa8e6a15-021e-48cc-9429-c53596fc9c29/call-HaplotypeCaller/shard-19/tmp.ea81c1bd/libgkl_smithwaterman4419442010051805328.so(_Z19runSWOnePairBT_avx2iiiiPhS_iiaPcPs+0x338)[0x7f9248e4bfa8]
/home/flexray/germline/cromwell-executions/PairedEndSingleSampleWorkflow/fa8e6a15-021e-48cc-9429-c53596fc9c29/call-HaplotypeCaller/shard-19/tmp.ea81c1bd/libgkl_smithwaterman4419442010051805328.so(Java_com_intel_gkl_smithwaterman_IntelSmithWaterman_alignNative+0xd8)[0x7f9248e4bbf8]
[0x7f932de9ceaa]
rpomaris commented 5 years ago

@byoo @flexray - we discussed this internally as a team, and the recommendation is to use the standard (pure Java) Smith-Waterman implementation, as @flexray suggests, and see if that resolves the issue. You should see comparable runtimes between the Intel Smith-Waterman and the default version.

flexray commented 5 years ago

@mepowers Going with --smith-waterman JAVA worked, and HaplotypeCaller finished for a WGS sample.
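For pipelines that assemble the GATK command line programmatically, the workaround can be applied in one place. This is a minimal sketch under stated assumptions: with_java_smith_waterman is a hypothetical helper, and the file names are placeholders. Only the --smith-waterman option and its JAVA value come from the tool documentation quoted above. Per later comments in this thread, FilterAlignmentArtifacts did not accept this option until a later release.

```python
import shlex


def with_java_smith_waterman(gatk_args):
    """Return a copy of gatk_args that forces '--smith-waterman JAVA'.

    Hypothetical helper: an existing '--smith-waterman <value>' pair is
    overwritten, otherwise the option is appended. The input list is
    left unmodified.
    """
    args = list(gatk_args)
    if "--smith-waterman" in args:
        i = args.index("--smith-waterman")
        if i + 1 < len(args):
            args[i + 1] = "JAVA"
        else:
            args.append("JAVA")
    else:
        args += ["--smith-waterman", "JAVA"]
    return args


# Placeholder inputs; substitute the real reference, BAM and output paths.
cmd = with_java_smith_waterman(
    ["gatk", "HaplotypeCaller", "-R", "ref.fasta", "-I", "sample.bam",
     "-O", "sample.g.vcf.gz"]
)
print(" ".join(shlex.quote(a) for a in cmd))
```

Keeping the arguments as a list rather than a single string avoids quoting problems when paths contain spaces.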

lbergelson commented 5 years ago

@mepowers Sorry for the long gap, I got distracted and then went on paternity leave. I'm not really able to help with gdb, I don't really have much experience there.

It looks like this is a double-free error in the Intel Smith-Waterman code. Have you had any luck hunting it down?

DCarbonez commented 4 years ago

This issue also pops up during FilterAlignmentArtifacts in GATK 4.1.7.0 (experimental)

*** Error in `java': munmap_chunk(): invalid pointer: 0x00007fb4c8e6e540 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fb4cdfee7e5]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x1a8)[0x7fb4cdffb698]
/cromwell_root/tmp.be1fb8a9/libgkl_smithwaterman4505316410124989699.so(_Z19runSWOnePairBT_avx2iiiiPhS_iiaPcPs+0x338)[0x7fb4ac3cffa8]
/cromwell_root/tmp.be1fb8a9/libgkl_smithwaterman4505316410124989699.so(Java_com_intel_gkl_smithwaterman_IntelSmithWaterman_alignNative+0xd8)[0x7fb4ac3cfbf8]
[0x7fb4b8b95f92]
lbergelson commented 4 years ago

@DCarbonez Thanks for reporting. A team from Intel has recently started looking into some GKL issues. I've forwarded your stack trace to them. Is this a reproducible error? Can you provide any additional information about your system that might help debug it?

DCarbonez commented 4 years ago

Dear @lbergelson,

This error occurred while running the WDL implementation of gatk4-somatic-snvs-indels (https://github.com/gatk-workflows/gatk4-somatic-snvs-indels/blob/master/mutect2.wdl). Each of these steps was run on a fresh cloud instance with 9 GB RAM and 2 CPUs (the defaults).

This is the underlying command:

        set -e

        export GATK_LOCAL_JAR=~{default="/root/gatk.jar" runtime_params.gatk_override}

        gatk --java-options "-Xmx~{command_mem}m" FilterAlignmentArtifacts \
            -V ~{input_vcf} \
            -I ~{bam} \
            --bwa-mem-index-image ~{realignment_index_bundle} \
            ~{realignment_extra_args} \
            -O ~{output_vcf}

Because it failed repeatedly, the step was rerun 19 times.

The respective backtraces:

*** Error in `java': double free or corruption (out): 0x00007f6364699340 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f636ba307e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f636ba3937a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f636ba3d53c]
/cromwell_root/tmp.7626fbcf/libgkl_smithwaterman1454827346682980108.so(_Z19runSWOnePairBT_avx2iiiiPhS_iiaPcPs+0x338)[0x7f63123c8fa8]
/cromwell_root/tmp.7626fbcf/libgkl_smithwaterman1454827346682980108.so(Java_com_intel_gkl_smithwaterman_IntelSmithWaterman_alignNative+0xd8)[0x7f63123c8bf8]
[0x7f6355bff192]
*** Error in `java': munmap_chunk(): invalid pointer: 0x00007f685d06c840 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f68634c37e5]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x1a8)[0x7f68634d0698]
/cromwell_root/tmp.4eeeda3c/libgkl_smithwaterman7538158038428947321.so(_Z19runSWOnePairBT_avx2iiiiPhS_iiaPcPs+0x338)[0x7f6830cf2fa8]
/cromwell_root/tmp.4eeeda3c/libgkl_smithwaterman7538158038428947321.so(Java_com_intel_gkl_smithwaterman_IntelSmithWaterman_alignNative+0xd8)[0x7f6830cf2bf8]
[0x7f684dc31f92]

In each of these occurrences, the filtered vcf file was produced, but the vcf.idx file was missing.

Although the Java errors occur, the last line of the log marks the step as a success (this might be true, but only when the option --create-output-variant-index is set to false):
SetOperationStatus(copied 0 file(s) to <destinations_folder> succeeded"

I also performed a test based on machine type (outside of the full workflow, running the steps myself on a separate instance and replicating the steps of the workflow).
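Because the step can be reported as successful while the index is missing, a post-run check on the outputs can catch the failure early. The sketch below is an assumption-laden illustration, not a GATK feature: missing_outputs is a hypothetical helper, and the index naming (".idx" appended for a plain VCF, ".tbi" for a block-compressed .vcf.gz) is assumed from GATK's usual behavior rather than taken from this report.

```python
from pathlib import Path


def missing_outputs(vcf_path):
    """Return the expected output files that do not exist on disk.

    Assumed naming convention: '<vcf>.idx' for a plain VCF and
    '<vcf>.tbi' for a block-compressed '.vcf.gz'.
    """
    vcf = Path(vcf_path)
    suffix = ".tbi" if vcf.name.endswith(".gz") else ".idx"
    index = Path(str(vcf) + suffix)
    return [str(p) for p in (vcf, index) if not p.exists()]
```

A wrapper could treat a non-empty result as a failed attempt even when the tool's exit status or the last log line suggests success.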

lbergelson commented 4 years ago

Thank you for the additional information! Hopefully this will be helpful for the team tracking down these issues. There's going to be a new build of the GKL soon which I'm hoping will fix this.

slw287r commented 4 years ago

Hi @lbergelson,

We experienced a related issue in GATK 4.1.8 (it has persisted since 4.1.5 or an earlier version, as far as we know) when running FilterAlignmentArtifacts on one of our clusters but not the other. Using the CPU differences between them (the working cluster does not support AVX2), we narrowed the issue down to libgkl_smithwaterman.so. Paths are shortened for clarity in the following commands.

bash faa.sh 
Using GATK jar /app/gatk-package-4.1.8.0-local.jar
Running:
    /bin/java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /app/gatk-package-4.1.8.0-local.jar FilterAlignmentArtifacts -V /output/sample.FilterMutectCalls.vcf.gz -R /db/hs37d5.fa --bwa-mem-index-image /db/hg38.fa.img -I /output/sample.Mutect2.bam -O sample.somatic_filter.test.vcf.gz --use-jdk-inflater true
19:11:56.929 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/app/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
19:11:56.943 INFO  NativeLibraryLoader - Loading libgkl_smithwaterman.so from jar:file:/app/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_smithwaterman.so
19:11:56.944 INFO  SmithWatermanAligner - Using AVX accelerated SmithWaterman implementation
19:11:57.168 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/app/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jul 19, 2020 7:11:57 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
19:11:57.324 INFO  FilterAlignmentArtifacts - ------------------------------------------------------------
19:11:57.324 INFO  FilterAlignmentArtifacts - The Genome Analysis Toolkit (GATK) v4.1.8.0
19:11:57.325 INFO  FilterAlignmentArtifacts - For support and documentation go to https://software.broadinstitute.org/gatk/
19:11:57.325 INFO  FilterAlignmentArtifacts - Executing as foo@bar.local on Linux v2.6.32-696.6.3.el6.x86_64 amd64
19:11:57.325 INFO  FilterAlignmentArtifacts - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_261-b12
19:11:57.325 INFO  FilterAlignmentArtifacts - Start Date/Time: July 19, 2020 7:11:57 PM CST
19:11:57.325 INFO  FilterAlignmentArtifacts - ------------------------------------------------------------
19:11:57.325 INFO  FilterAlignmentArtifacts - ------------------------------------------------------------
19:11:57.325 INFO  FilterAlignmentArtifacts - HTSJDK Version: 2.22.0
19:11:57.325 INFO  FilterAlignmentArtifacts - Picard Version: 2.22.8
19:11:57.325 INFO  FilterAlignmentArtifacts - HTSJDK Defaults.COMPRESSION_LEVEL : 2
19:11:57.325 INFO  FilterAlignmentArtifacts - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
19:11:57.325 INFO  FilterAlignmentArtifacts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
19:11:57.326 INFO  FilterAlignmentArtifacts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
19:11:57.326 INFO  FilterAlignmentArtifacts - Deflater: IntelDeflater
19:11:57.326 INFO  FilterAlignmentArtifacts - Inflater: JdkInflater
19:11:57.326 INFO  FilterAlignmentArtifacts - GCS max retries/reopens: 20
19:11:57.326 INFO  FilterAlignmentArtifacts - Requester pays: disabled
19:11:57.326 WARN  FilterAlignmentArtifacts - 

   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

   Warning: FilterAlignmentArtifacts is an EXPERIMENTAL tool and should not be used for production

   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

19:11:57.326 INFO  FilterAlignmentArtifacts - Initializing engine
19:11:57.666 INFO  FeatureManager - Using codec VCFCodec to read file file:///output/sample.FilterMutectCalls.vcf.gz
19:11:57.757 INFO  FilterAlignmentArtifacts - Done initializing engine
19:11:57.827 INFO  NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/app/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
19:11:57.861 INFO  IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
19:11:57.862 INFO  IntelPairHmm - Available threads: 4
19:11:57.862 INFO  IntelPairHmm - Requested threads: 4
19:11:57.862 INFO  PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
19:11:57.862 INFO  ProgressMeter - Starting traversal
19:11:57.862 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
*** glibc detected *** /for/bar/bin/java: double free or corruption (out): 0x00007f450af58700 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x3d01675dee)[0x7f45058afdee]
/lib64/libc.so.6(+0x3d01678c80)[0x7f45058b2c80]
/tmp/libgkl_smithwaterman410767516409374085.so(_Z19runSWOnePairBT_avx2iiiiPhS_iiaPcPs+0x338)[0x7f4499f4cfa8]
/tmp/libgkl_smithwaterman410767516409374085.so(Java_com_intel_gkl_smithwaterman_IntelSmithWaterman_alignNative+0xd8)[0x7f4499f4cbf8]
[0x7f44f58be6a2]
======= Memory map: ========

Then we disabled AVX2 on the newer cluster using Intel's sde64 with -ivb (Ivy Bridge emulation), which made GATK fall back to the Java implementation, and the filter ran without a core dump.

sde64 -ivb -- faa.sh
Using GATK jar /app/gatk-package-4.1.8.0-local.jar
Running:
    /bin/java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /app/gatk-package-4.1.8.0-local.jar FilterAlignmentArtifacts -V /output/sample.FilterMutectCalls.vcf.gz -R /db/hs37d5.fa --bwa-mem-index-image /ref/hg38.fa.img -I /output/sample.Mutect2.bam -O sample.somatic_filter2.test.vcf.gz --use-jdk-inflater true --use-jdk-deflater true
19:41:38.956 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/app/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
19:41:39.332 INFO  SmithWatermanAligner - AVX accelerated SmithWaterman implementation is not supported, falling back to the Java implementation

Hope this helps, and we're looking forward to the GKL fix.

Cheers, Richard
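To predict which nodes are likely to take the accelerated code path, the AVX2 difference described above can be checked from /proc/cpuinfo on Linux. This is a minimal sketch with hypothetical helper names (cpu_flags, has_avx2). It only approximates what the GKL detects at load time, so the SmithWatermanAligner line in the GATK log remains the authoritative indicator.

```python
def cpu_flags(cpuinfo_text):
    """Collect CPU feature flags from the text of /proc/cpuinfo (Linux x86)."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx2(cpuinfo_text):
    """True if any listed processor advertises the 'avx2' flag."""
    return "avx2" in cpu_flags(cpuinfo_text)


# Typical use on a Linux node:
#   with open("/proc/cpuinfo") as fh:
#       print(has_avx2(fh.read()))
```

The parsing is kept separate from file access so the same function can be run on cpuinfo text collected from several clusters.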

sahuno commented 4 years ago

Hello @lbergelson @mepowers

The FilterAlignmentArtifacts task fails with the error munmap_chunk(): invalid pointer when running Mutect2 (WES Tumor-Normal) on Terra. I've included the log file and the runtime parameters that I used, just in case.

I used both the latest GATK version (4.1.8.1) and the previous version (4.1.8.0).

runtime_params; gatk version- 4.1.8.1

{ "boot_disk_size": 12, "command_mem": 15500, "cpu": 4, "disk": 310, "gatk_docker": "us.gcr.io/broad-gatk/gatk@sha256:8051adab0ff725e7e9c2af5997680346f3c3799b2df3785dd51d4abdd3da747b", "gatk_override": null, "machine_mem": 16000, "max_retries": 2, "preemptible": 2 }

runtime_params; gatk version- 4.1.8.0

{ "boot_disk_size": 12, "command_mem": 15500, "cpu": 4, "disk": 310, "gatk_docker": "us.gcr.io/broad-gatk/gatk:4.1.8.0", "gatk_override": null, "machine_mem": 16000, "max_retries": 2, "preemptible": 2 }

log file

2020/07/25 01:37:53 Starting container setup.
2020/07/25 01:37:55 Done container setup.
2020/07/25 01:37:56 Starting localization.
2020/07/25 01:38:02 Localization script execution started...
2020/07/25 01:38:02 Localizing input gs://gatk-test-data/mutect2/Homo_sapiens_assembly38.index_bundle -> /cromwell_root/gatk-test-data/mutect2/Homo_sapiens_assembly38.index_bundle
2020/07/25 01:38:40 Localizing input gs://fc-ac4624cb-a8fc-49a2-b071-d3a0ae799418/cb1feccb-0a69-42bf-ba5f-fde762934a59/Mutect2/fe3623c8-0eaf-4cd4-9f81-d1fda4073f2e/call-FilterAlignmentArtifacts/attempt-3/script -> /cromwell_root/script
2020/07/25 01:38:45 Localization script execution complete.
2020/07/25 01:38:58 Done localization.
2020/07/25 01:38:59 Running user action: docker run -v /mnt/local-disk:/cromwell_root --entrypoint= us.gcr.io/broad-gatk/gatk@sha256:8051adab0ff725e7e9c2af5997680346f3c3799b2df3785dd51d4abdd3da747b /bin/bash /cromwell_root/script
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.6c58e0ba
01:39:02.909 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/gatk/gatk-package-4.1.8.1-local.jar!/com/intel/gkl/native/libgkl_utils.so
01:39:02.925 INFO  NativeLibraryLoader - Loading libgkl_smithwaterman.so from jar:file:/gatk/gatk-package-4.1.8.1-local.jar!/com/intel/gkl/native/libgkl_smithwaterman.so
01:39:02.927 INFO  SmithWatermanAligner - Using AVX accelerated SmithWaterman implementation
01:39:03.142 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.8.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
01:39:03.361 INFO  FilterAlignmentArtifacts - ------------------------------------------------------------
01:39:03.361 INFO  FilterAlignmentArtifacts - The Genome Analysis Toolkit (GATK) v4.1.8.1
01:39:03.361 INFO  FilterAlignmentArtifacts - For support and documentation go to https://software.broadinstitute.org/gatk/
01:39:03.362 INFO  FilterAlignmentArtifacts - Executing as root@3f245e278eba on Linux v4.19.112+ amd64
01:39:03.362 INFO  FilterAlignmentArtifacts - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
01:39:03.362 INFO  FilterAlignmentArtifacts - Start Date/Time: July 25, 2020 1:39:03 AM GMT
01:39:03.362 INFO  FilterAlignmentArtifacts - ------------------------------------------------------------
01:39:03.362 INFO  FilterAlignmentArtifacts - ------------------------------------------------------------
01:39:03.363 INFO  FilterAlignmentArtifacts - HTSJDK Version: 2.23.0
01:39:03.363 INFO  FilterAlignmentArtifacts - Picard Version: 2.22.8
01:39:03.363 INFO  FilterAlignmentArtifacts - HTSJDK Defaults.COMPRESSION_LEVEL : 2
01:39:03.363 INFO  FilterAlignmentArtifacts - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
01:39:03.363 INFO  FilterAlignmentArtifacts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
01:39:03.363 INFO  FilterAlignmentArtifacts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
01:39:03.364 INFO  FilterAlignmentArtifacts - Deflater: IntelDeflater
01:39:03.364 INFO  FilterAlignmentArtifacts - Inflater: IntelInflater
01:39:03.364 INFO  FilterAlignmentArtifacts - GCS max retries/reopens: 20
01:39:03.364 INFO  FilterAlignmentArtifacts - Requester pays: disabled
01:39:03.364 WARN  FilterAlignmentArtifacts - 

   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

   Warning: FilterAlignmentArtifacts is an EXPERIMENTAL tool and should not be used for production

   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

01:39:03.364 INFO  FilterAlignmentArtifacts - Initializing engine
01:39:07.644 INFO  FeatureManager - Using codec VCFCodec to read file gs://fc-ac4624cb-a8fc-49a2-b071-d3a0ae799418/cb1feccb-0a69-42bf-ba5f-fde762934a59/Mutect2/fe3623c8-0eaf-4cd4-9f81-d1fda4073f2e/call-Filter/22.hg38-filtered.vcf
01:39:08.399 INFO  FilterAlignmentArtifacts - Done initializing engine
01:39:09.523 INFO  NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/gatk/gatk-package-4.1.8.1-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
01:39:09.565 INFO  IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
01:39:09.566 INFO  IntelPairHmm - Available threads: 4
01:39:09.566 INFO  IntelPairHmm - Requested threads: 4
01:39:09.566 INFO  PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
01:39:09.567 INFO  ProgressMeter - Starting traversal
01:39:09.567 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
munmap_chunk(): invalid pointer
Using GATK jar /root/gatk.jar defined in environment variable GATK_LOCAL_JAR
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx11500m -jar /root/gatk.jar FilterAlignmentArtifacts -R gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta -V gs://fc-ac4624cb-a8fc-49a2-b071-d3a0ae799418/cb1feccb-0a69-42bf-ba5f-fde762934a59/Mutect2/fe3623c8-0eaf-4cd4-9f81-d1fda4073f2e/call-Filter/22.hg38-filtered.vcf -I gs://fc-ac4624cb-a8fc-49a2-b071-d3a0ae799418/209d1183-ed9a-4755-a4b3-d595797640ea/PreProcessingForVariantDiscovery_GATK4/9f7c0ab6-b61b-4797-92f1-7929bbf677d8/call-GatherBamFiles/22.hg38.bam --bwa-mem-index-image /cromwell_root/gatk-test-data/mutect2/Homo_sapiens_assembly38.index_bundle -O 22.hg38-filtered.vcf
2020/07/25 01:46:01 Starting delocalization.
2020/07/25 01:46:02 Delocalization script execution started...
2020/07/25 01:46:02 Delocalizing output /cromwell_root/memory_retry_rc -> gs://fc-ac4624cb-a8fc-49a2-b071-d3a0ae799418/cb1feccb-0a69-42bf-ba5f-fde762934a59/Mutect2/fe3623c8-0eaf-4cd4-9f81-d1fda4073f2e/call-FilterAlignmentArtifacts/attempt-3/memory_retry_rc
2020/07/25 01:46:02 Delocalizing output /cromwell_root/rc -> gs://fc-ac4624cb-a8fc-49a2-b071-d3a0ae799418/cb1feccb-0a69-42bf-ba5f-fde762934a59/Mutect2/fe3623c8-0eaf-4cd4-9f81-d1fda4073f2e/call-FilterAlignmentArtifacts/attempt-3/rc
2020/07/25 01:46:04 Delocalizing output /cromwell_root/stdout -> gs://fc-ac4624cb-a8fc-49a2-b071-d3a0ae799418/cb1feccb-0a69-42bf-ba5f-fde762934a59/Mutect2/fe3623c8-0eaf-4cd4-9f81-d1fda4073f2e/call-FilterAlignmentArtifacts/attempt-3/stdout
2020/07/25 01:46:05 Delocalizing output /cromwell_root/stderr -> gs://fc-ac4624cb-a8fc-49a2-b071-d3a0ae799418/cb1feccb-0a69-42bf-ba5f-fde762934a59/Mutect2/fe3623c8-0eaf-4cd4-9f81-d1fda4073f2e/call-FilterAlignmentArtifacts/attempt-3/stderr
2020/07/25 01:46:06 Delocalizing output /cromwell_root/22.hg38-filtered.vcf -> gs://fc-ac4624cb-a8fc-49a2-b071-d3a0ae799418/cb1feccb-0a69-42bf-ba5f-fde762934a59/Mutect2/fe3623c8-0eaf-4cd4-9f81-d1fda4073f2e/call-FilterAlignmentArtifacts/attempt-3/22.hg38-filtered.vcf
2020/07/25 01:46:08 Delocalizing output /cromwell_root/22.hg38-filtered.vcf.idx -> gs://fc-ac4624cb-a8fc-49a2-b071-d3a0ae799418/cb1feccb-0a69-42bf-ba5f-fde762934a59/Mutect2/fe3623c8-0eaf-4cd4-9f81-d1fda4073f2e/call-FilterAlignmentArtifacts/attempt-3/22.hg38-filtered.vcf.idx
Required file output '/cromwell_root/22.hg38-filtered.vcf.idx' does not exist.

I would appreciate your help in resolving this.

Kind regards, Sam

rpomaris commented 4 years ago

@sahuno Thanks for the detailed report here. As @lbergelson mentioned above, we're working on a new GKL release and will check if the issues above are resolved with the new release on Linux. We don't currently have plans to test specifically in Terra, but @lbergelson may be able to help pull in the right folks to do so once we get the new GKL release out.

gevro commented 3 years ago

I'm getting the same issue on GATK 4.1.9.0 FilterAlignmentArtifacts. This bug has been present for 1 year. Has this been fixed? Note: There is no work-around because FilterAlignmentArtifacts does not have a --smith-waterman option.

Here is my error:

20:12:42.724 WARN  FilterAlignmentArtifacts - 

   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

   Warning: FilterAlignmentArtifacts is an EXPERIMENTAL tool and should not be used for production

   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

20:12:42.725 INFO  FilterAlignmentArtifacts - Initializing engine
20:12:48.403 INFO  FeatureManager - Using codec VCFCodec to read file gs://fc-secure-024a1aae-a4f9-4025-aa93-f759f93a8203/50383670-4607-4e59-9bfc-4db970980f0e/Mutect2/773a91ea-25be-4d49-b97c-16527076250c/call-Filter/cacheCopy/TN-20-36-filtered.vcf
20:12:50.117 INFO  FilterAlignmentArtifacts - Done initializing engine
20:12:51.042 INFO  NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/gatk/gatk-package-4.1.9.0-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
20:12:51.099 INFO  IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
20:12:51.100 INFO  IntelPairHmm - Available threads: 14
20:12:51.100 INFO  IntelPairHmm - Requested threads: 4
20:12:51.100 INFO  PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
20:12:51.100 INFO  ProgressMeter - Starting traversal
20:12:51.100 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
20:20:25.766 INFO  ProgressMeter -       chr3:104142090              7.6                  1000            132.0
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007efc9818177e, pid=24, tid=0x00007f13b3c76700
#
# JRE version: OpenJDK Runtime Environment (8.0_242-b08) (build 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08)
# Java VM: OpenJDK 64-Bit Server VM (25.242-b08 mixed mode linux-amd64 )
# Problematic frame:
# C  [libgkl_smithwaterman1809483713436863458.so+0x177e]  smithWatermanBackTrack(dnaSeqPair*, int, int, int, int, int*, int)+0x60e
#
# Core dump written. Default location: /cromwell_root/core or core.24
#
# An error report file with more information is saved as:
# /cromwell_root/hs_err_pid24.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
gevro commented 3 years ago

@sahuno and @flexray - have either of you found a work-around for this? FilterAlignmentArtifacts does not have a --smith-waterman option to switch to JAVA.

droazen commented 3 years ago

@gevro Could you try running with the latest GATK master branch, and report whether the error still occurs?

gevro commented 3 years ago

Is there a Docker image for this on gcr.io? I am using Terra, which requires a Docker image.

gevro commented 3 years ago

@droazen - note also that a simple fix would be to add --smith-waterman as an option for FilterAlignmentArtifacts. Right now it is hard-coded in FilterAlignmentArtifacts, but that would at least allow a work-around using --smith-waterman JAVA

gevro commented 3 years ago

Also, I suspect that for Terra another workaround might be to request a different machine type that does not support AVX:

https://cromwell.readthedocs.io/en/stable/RuntimeAttributes/

runtime {
    cpu: 2
    cpuPlatform: "Intel Skylake"
}

https://cloud.google.com/compute/docs/regions-zones/#available

Would appreciate any guidance on this in the meantime. Thanks!

gevro commented 3 years ago

I found a workaround for Terra. I just tried various cpuPlatform settings in the runtime parameters for the FilterAlignmentArtifacts task and for some reason this works: cpuPlatform: "Intel Sandy Bridge"

droazen commented 3 years ago

@gevro There is a docker image snapshot of the latest master available in the broadinstitute/gatk-nightly repository on dockerhub. The latest snapshot is broadinstitute/gatk-nightly:2021-02-18-4.1.9.0-68-gae06fb734-NIGHTLY-SNAPSHOT. Even though you found a workaround, it would be helpful for us to know whether this issue is resolved by the newer GKL version included in master.

I agree that the tool should allow the Java SmithWaterman implementation to be selected via an argument.

gevro commented 3 years ago

I will try to test it next time I have a chance. I don't have the pipeline and data up anymore on Terra. But the default Terra pipeline for Mutect2 FilterAlignmentArtifacts with the default Terra cpu platform should reproduce the issue on GATK 4.1.9.0.

droazen commented 3 years ago

@gevro To amend my previous comment: it was brought to my attention that the docker image snapshot I linked to above does not actually come with the newer GKL release that might fix your issue. Sorry for the miscommunication! We're working on building a test GATK image that does contain the newer GKL release, and once we have that I'll post a link to it here.

yaotianran commented 3 years ago

My current workaround is to downgrade GATK to 4.1.3.0

meganshand commented 3 years ago

@droazen Do you know if this issue has been resolved? Is the newer GKL release now included in GATK?

lbergelson commented 3 years ago

I have a new build of the GKL that I need to test and then integrate into a new gatk. It's not available yet though.

lbergelson commented 2 years ago

@gevro @yaotianran I know this is old, but if you want to test with a recent GATK release, it includes the upgraded Intel GKL, which has hopefully fixed this issue.

gevro commented 2 years ago

Thanks. It may be some time before I can try.

lbergelson commented 2 years ago

There is also a new --smith-waterman argument in FilterAlignmentArtifacts that lets you choose between the Java version and the faster but previously buggy accelerated version, so there is no longer any need to downgrade to an old GATK as a workaround.