Closed chapmanb closed 6 years ago
@gspowley @erniebrau @pnvaidya Could one of you please have a look at this issue?
@chapmanb We were able to reproduce a failure with your command line. This looks like an issue related to JNI and garbage collection that is exposed by setting -Xmx46965m and -XX:+UseSerialGC, but it needs further debugging.
To confirm, can you please try running without specifying these javaOptions? Something like this:
./gatk-launch --javaOptions "-Djava.io.tmpdir=$TEMP_DIR" \
ApplyBQSRSpark \
--sparkMaster local[16] \
--input $BAM_IN \
--output $BAM_OUT \
--bqsr_recal_file $BQSR_RECAL \
-- \
--conf spark.local.dir=$SPARK_LOCAL_DIR
FYI, we see better performance from Spark when using an SSD for spark.local.dir. The --conf option above shows how to set spark.local.dir.
George, thanks much for debugging and identifying the underlying problem. I can confirm that we're able to avoid the error by removing -XX:+UseSerialGC and moving back to parallel GC. We'd initially introduced the serial GC to avoid problems when running multiple HaplotypeCaller commands simultaneously on a single machine, but by letting the Spark implementation take care of parallelizing we should no longer need to worry about that. Thanks again for the workaround and the tip on using spark.local.dir. Much appreciated.
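For anyone applying the same workaround, here is a minimal sketch of assembling the --javaOptions string without the serial collector. It assumes the heap limit and temp directory are kept as before; build_java_opts is a hypothetical helper for illustration, not part of gatk-launch, and the /tmp fallback is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch only: build_java_opts is a hypothetical helper, not part of gatk-launch.
build_java_opts() {
  local tmp_dir="$1"   # directory to use for java.io.tmpdir
  local heap_mb="$2"   # maximum heap size in megabytes, e.g. 46965
  # -XX:+UseSerialGC is deliberately left out, so the JVM picks its default collector.
  printf '%s' "-Djava.io.tmpdir=${tmp_dir} -Xmx${heap_mb}m"
}

# TEMP_DIR is expected to be set by the caller; /tmp is only a placeholder fallback.
JAVA_OPTS="$(build_java_opts "${TEMP_DIR:-/tmp}" 46965)"
echo "--javaOptions '${JAVA_OPTS}'"
```

The resulting string can be passed to gatk-launch in place of the original options. If several JVMs share one machine again later, the collector choice may need revisiting.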
Thanks for the feedback, Brad. We'll continue to look into the core dump to make sure it doesn't cause issues in the future.
This issue is related to https://github.com/Intel-HLS/GKL/issues/81
I'm assuming that the recent GKL update addresses this, so am closing based on the girl scout principle (find it broken? fix it), but feel free to reopen.
I'm running into a consistent core dump in GATK 4 beta 5 (GKL 0.5.8) related to deflation with the Intel Genomics Kernel Library. This occurs on an AWS m4.4xlarge machine running Ubuntu 16.04 and produces this stack trace:
https://gist.github.com/chapmanb/006c1c9abeb21e9baf244d17d7ae1003
Running ApplyBQSR:
Adding --use_jdk_deflater to the ApplyBQSR command line avoids the issue.
I'm not sure if the Java stack dump and command line provide enough information to be useful or if a reproducible case is needed. The case above reproduces the problem but uses fairly large BAM files, and I haven't been able to find a more minimal case; I could prepare and share one if that would be helpful. Thanks much for looking at this.
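As a rough illustration of the workaround, the sketch below appends --use_jdk_deflater to a command string only when it is not already present. The helper name with_jdk_deflater is hypothetical, and the command shown is a bare placeholder: the remaining ApplyBQSR arguments from the failing run are not reproduced here.

```shell
#!/usr/bin/env bash
# Sketch only: with_jdk_deflater is a hypothetical helper for illustration.
with_jdk_deflater() {
  # Join the arguments with spaces and add the flag unless it is already there.
  case " $* " in
    *" --use_jdk_deflater "*) printf '%s' "$*" ;;
    *) printf '%s' "$* --use_jdk_deflater" ;;
  esac
}

# Placeholder command; the real input, output, and recalibration arguments go here too.
CMD="$(with_jdk_deflater ./gatk-launch ApplyBQSR)"
echo "$CMD"
```

Joining into a single string keeps the sketch short but does not preserve arguments that contain spaces, so a real wrapper would need to handle quoting.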