bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

spark error through docker #2638

Closed DiyaVaka closed 5 years ago

DiyaVaka commented 5 years ago

Hi Brad,

I tried to use your base image to create my own Docker image. While doing so I am getting this error. Can you help?

[2019-01-20T02:57Z] Calculating coverage: platinum sv_regions

[2019-01-20T02:58Z] Calculating coverage: platinum coverage

[2019-01-20T03:00Z] samtools stats : platinum

[2019-01-20T03:02Z] samtools index stats : platinum

[2019-01-20T03:02Z] Prepare BQSR tables with GATK: platinum

[2019-01-20T03:02Z] GATK: BaseRecalibratorSpark

[2019-01-20T03:02Z] Using GATK jar /usr/local/share/bcbio-nextgen/anaconda/share/gatk4-4.0.2.1-0/gatk-package-4.0.2.1-local.jar

[2019-01-20T03:02Z] Running:

[2019-01-20T03:02Z] java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Xms500m -Xmx45864m -Djava.io.tmpdir=/mnt/scratch/d21a4131-e111-4644-9d0d-7f88561e326c/bcbiotx/tmp6L8ojf -jar /usr/local/share/bcbio-nextgen/anaconda/share/gatk4-4.0.2.1-0/gatk-package-4.0.2.1-local.jar BaseRecalibratorSpark -I /mnt/scratch/d21a4131-e111-4644-9d0d-7f88561e326c/align/platinum/platinum-sort.bam --spark-master local[16] --output /mnt/scratch/d21a4131-e111-4644-9d0d-7f88561e326c/bcbiotx/tmp6L8ojf/platinum-sort-recal.grp --reference None --conf spark.driver.host=localhost --conf spark.network.timeout=800 --conf spark.local.dir=/mnt/scratch/d21a4131-e111-4644-9d0d-7f88561e326c/bcbiotx/tmp6L8ojf --known-sites /mnt/scratch/d21a4131-e111-4644-9d0d-7f88561e326c/inputs/data/genomes/GRCh37/variation/dbsnp_138.vcf.gz -L /mnt/scratch/d21a4131-e111-4644-9d0d-7f88561e326c/bedprep/cleaned-xgen-exome-research-panel-targets_6bpexpanded.bed --interval-set-rule INTERSECTION

[2019-01-20T03:02Z] 03:02:13.807 WARN SparkContextFactory - Environment variables HELLBENDER_TEST_PROJECT and HELLBENDER_JSON_SERVICE_ACCOUNT_KEY must be set or the GCS hadoop connector will not be configured properly

[2019-01-20T03:02Z] 03:02:14.025 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/local/share/bcbio-nextgen/anaconda/share/gatk4-4.0.2.1-0/gatk-package-4.0.2.1-local.jar!/com/intel/gkl/native/libgkl_compression.so

[2019-01-20T03:02Z] 03:02:14.236 INFO BaseRecalibratorSpark - ------------------------------------------------------------

[2019-01-20T03:02Z] 03:02:14.237 INFO BaseRecalibratorSpark - The Genome Analysis Toolkit (GATK) v4.0.2.1

[2019-01-20T03:02Z] 03:02:14.237 INFO BaseRecalibratorSpark - For support and documentation go to https://software.broadinstitute.org/gatk/

[2019-01-20T03:02Z] 03:02:14.237 INFO BaseRecalibratorSpark - Executing as root@c35737b5c33a on Linux v4.14.88-72.73.amzn1.x86_64 amd64

[2019-01-20T03:02Z] 03:02:14.238 INFO BaseRecalibratorSpark - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_121-b15

[2019-01-20T03:02Z] 03:02:14.238 INFO BaseRecalibratorSpark - Start Date/Time: January 20, 2019 3:02:13 AM UTC

[2019-01-20T03:02Z] 03:02:14.238 INFO BaseRecalibratorSpark - ------------------------------------------------------------

[2019-01-20T03:02Z] 03:02:14.238 INFO BaseRecalibratorSpark - ------------------------------------------------------------

[2019-01-20T03:02Z] 03:02:14.238 INFO BaseRecalibratorSpark - HTSJDK Version: 2.14.3

[2019-01-20T03:02Z] 03:02:14.239 INFO BaseRecalibratorSpark - Picard Version: 2.17.2

[2019-01-20T03:02Z] 03:02:14.239 INFO BaseRecalibratorSpark - HTSJDK Defaults.COMPRESSION_LEVEL : 1

[2019-01-20T03:02Z] 03:02:14.239 INFO BaseRecalibratorSpark - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false

[2019-01-20T03:02Z] 03:02:14.239 INFO BaseRecalibratorSpark - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true

[2019-01-20T03:02Z] 03:02:14.239 INFO BaseRecalibratorSpark - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false

[2019-01-20T03:02Z] 03:02:14.239 INFO BaseRecalibratorSpark - Deflater: IntelDeflater

[2019-01-20T03:02Z] 03:02:14.239 INFO BaseRecalibratorSpark - Inflater: IntelInflater

[2019-01-20T03:02Z] 03:02:14.239 INFO BaseRecalibratorSpark - GCS max retries/reopens: 20

[2019-01-20T03:02Z] 03:02:14.239 INFO BaseRecalibratorSpark - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes

[2019-01-20T03:02Z] 03:02:14.239 WARN BaseRecalibratorSpark -

[2019-01-20T03:02Z] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

[2019-01-20T03:02Z] Warning: BaseRecalibratorSpark is a BETA tool and is not yet ready for use in production

[2019-01-20T03:02Z] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

[2019-01-20T03:02Z] 03:02:14.239 INFO BaseRecalibratorSpark - Initializing engine

[2019-01-20T03:02Z] 03:02:14.239 INFO BaseRecalibratorSpark - Done initializing engine

[2019-01-20T03:02Z] Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

[2019-01-20T03:02Z] 19/01/20 03:02:14 INFO SparkContext: Running Spark version 2.0.2

[2019-01-20T03:02Z] 19/01/20 03:02:14 WARN SparkConf: In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).

[2019-01-20T03:02Z] 19/01/20 03:02:14 INFO SecurityManager: Changing view acls to: root

[2019-01-20T03:02Z] 19/01/20 03:02:14 INFO SecurityManager: Changing modify acls to: root

[2019-01-20T03:02Z] 19/01/20 03:02:14 INFO SecurityManager: Changing view acls groups to:

[2019-01-20T03:02Z] 19/01/20 03:02:14 INFO SecurityManager: Changing modify acls groups to:

[2019-01-20T03:02Z] 19/01/20 03:02:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO Utils: Successfully started service 'sparkDriver' on port 34655.

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO SparkEnv: Registering MapOutputTracker

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO SparkEnv: Registering BlockManagerMaster

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO DiskBlockManager: Created local directory at /mnt/scratch/d21a4131-e111-4644-9d0d-7f88561e326c/bcbiotx/tmp6L8ojf/blockmgr-18c9b4a8-4b85-4f12-8410-681f8fced403

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO MemoryStore: MemoryStore started with capacity 23.7 GB

[2019-01-20T03:02Z] 19/01/20 03:02:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO SparkEnv: Registering OutputCommitCoordinator

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO Utils: Successfully started service 'SparkUI' on port 4040.

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.17.0.2:4040

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO Executor: Starting executor ID driver on host localhost

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41089.

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO NettyBlockTransferService: Server created on localhost:41089

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, localhost, 41089)

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO BlockManagerMasterEndpoint: Registering block manager localhost:41089 with 23.7 GB RAM, BlockManagerId(driver, localhost, 41089)

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, localhost, 41089)

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO SparkUI: Stopped Spark web UI at http://172.17.0.2:4040

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO MemoryStore: MemoryStore cleared

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO BlockManager: BlockManager stopped

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO BlockManagerMaster: BlockManagerMaster stopped

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO SparkContext: Successfully stopped SparkContext

[2019-01-20T03:02Z] 03:02:15.985 INFO BaseRecalibratorSpark - Shutting down engine

[2019-01-20T03:02Z] [January 20, 2019 3:02:15 AM UTC] org.broadinstitute.hellbender.tools.spark.BaseRecalibratorSpark done. Elapsed time: 0.03 minutes.

[2019-01-20T03:02Z] Runtime.totalMemory()=569901056

[2019-01-20T03:02Z] ***

[2019-01-20T03:02Z] A USER ERROR has occurred: Couldn't read the given reference, reference must be a .fasta or .2bit file.

[2019-01-20T03:02Z] Reference provided was: None

[2019-01-20T03:02Z] ***

[2019-01-20T03:02Z] Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO ShutdownHookManager: Shutdown hook called

[2019-01-20T03:02Z] 19/01/20 03:02:15 INFO ShutdownHookManager: Deleting directory /mnt/scratch/d21a4131-e111-4644-9d0d-7f88561e326c/bcbiotx/tmp6L8ojf/spark-5f80c1d9-0db5-41d8-9831-3672ad0362f3

[2019-01-20T03:02Z] Uncaught exception occurred

Traceback (most recent call last):

File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 23, in run

_do_run(cmd, checks, log_stdout, env=env)

File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 103, in _do_run

raise subprocess.CalledProcessError(exitcode, error_msg)

CalledProcessError: Command 'set -o pipefail; export SPARK_USER=root && unset JAVA_HOME && export PATH=/usr/local/share/bcbio-nextgen/anaconda/bin:$PATH && gatk-launch --java-options '-Xms500m -Xmx45864m -Djava.io.tmpdir=/mnt/scratch/d21a4131-e111-4644-9d0d-7f88561e326c/bcbiotx/tmp6L8ojf' BaseRecalibratorSpark -I /mnt/scratch/d21a4131-e111-4644-9d0d-7f88561e326c/align/platinum/platinum-sort.bam --spark-master local[16] --output /mnt/scratch/d21a4131-e111-4644-9d0d-7f88561e326c/bcbiotx/tmp6L8ojf/platinum-sort-recal.grp --reference None --conf spark.driver.host=localhost --conf spark.network.timeout=800 --conf spark.local.dir=/mnt/scratch/d21a4131-e111-4644-9d0d-7f88561e326c/bcbiotx/tmp6L8ojf --known-sites /mnt/scratch/d21a4131-e111-4644-9d0d-7f88561e326c/inputs/data/genomes/GRCh37/variation/dbsnp_138.vcf.gz -L /mnt/scratch/d21a4131-e111-4644-9d0d-7f88561e326c/bedprep/cleaned-xgen-exome-research-panel-targets_6bpexpanded.bed --interval-set-rule INTERSECTION

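The failure is visible in the logged command line itself: `--reference None` is passed to `BaseRecalibratorSpark`, so GATK aborts with the "Couldn't read the given reference" user error. This suggests bcbio could not locate the GRCh37 sequence FASTA inside the custom image. As a throwaway sketch (plain Python, not part of bcbio), pulling the `--flag value` pairs out of a logged command line makes a bad argument like this easy to spot; the command string below is an abbreviated, hypothetical version of the one in the log:

```python
# Minimal sketch: extract "--flag value" pairs from a logged GATK command
# line so misconfigured arguments (here, --reference None) stand out.
def parse_flags(cmd):
    tokens = cmd.split()
    flags = {}
    for i, tok in enumerate(tokens):
        if tok.startswith("--") and i + 1 < len(tokens):
            flags[tok] = tokens[i + 1]
    return flags

# Abbreviated form of the failing command from the log above.
cmd = ("gatk BaseRecalibratorSpark -I platinum-sort.bam "
       "--spark-master local[16] --reference None "
       "--output platinum-sort-recal.grp")

flags = parse_flags(cmd)
print(flags["--reference"])  # prints "None" -- the argument GATK rejects
```

A sensible next step would be verifying that the image's genome directory actually contains the sequence data (e.g. a `seq/GRCh37.fa` alongside the `variation/` directory seen in the log paths) before rerunning.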

roryk commented 5 years ago

Heya-- closing as this is pretty old and we haven't had any action on it.