amizeranschi closed this issue 5 years ago.
Sorry about the issue. CWL calling with GATK HaplotypeCaller uses GATK4's Spark parallelization to provide faster multicore runs. Spark requires that the hostnames on your machine are set up correctly and resolve. It looks like you might be running on a laptop or other local machine where the local hostname does not resolve. If so, fixing the networking, or running on a cloud or compute machine where the hostname does resolve, will hopefully fix the issue. Hope this helps.
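For anyone hitting the same problem: a quick way to check and fix local hostname resolution is sketched below. The hostname `myhost` and the addresses are placeholders for illustration, not values from this thread.

```
# Check whether the local hostname resolves (Spark needs this):
#   getent hosts "$(hostname)"
#
# If that prints nothing, add an /etc/hosts entry along these lines:
127.0.0.1    localhost
127.0.1.1    myhost    # placeholder; use your machine's actual hostname
```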
Thanks a lot for your reply and the pointers. Setting up the hostname in the /etc/hosts file definitely got things going, but now the result is a different error:
14:36:55.421 INFO HaplotypeCallerSpark - Requester pays: disabled
14:36:55.422 WARN HaplotypeCallerSpark -
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Warning: HaplotypeCallerSpark is a BETA tool and is not yet ready for use in production
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
14:36:55.422 INFO HaplotypeCallerSpark - Initializing engine
14:36:55.422 INFO HaplotypeCallerSpark - Done initializing engine
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/02/22 14:36:55 INFO SparkContext: Running Spark version 2.2.0
19/02/22 14:36:55 WARN SparkConf: In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
19/02/22 14:36:55 WARN Utils: Your hostname, asas resolves to a loopback address: 127.0.0.1; using 141.85.248.29 instead (on interface eth0)
19/02/22 14:36:55 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/02/22 14:36:56 INFO SparkContext: Submitted application: HaplotypeCallerSpark
19/02/22 14:36:56 INFO SecurityManager: Changing view acls to: amizeranschi
19/02/22 14:36:56 INFO SecurityManager: Changing modify acls to: amizeranschi
19/02/22 14:36:56 INFO SecurityManager: Changing view acls groups to:
19/02/22 14:36:56 INFO SecurityManager: Changing modify acls groups to:
19/02/22 14:36:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(amizeranschi); groups with view permissions: Set(); users with modify permissions: Set(amizeranschi); groups with modify permissions: Set()
19/02/22 14:36:56 INFO Utils: Successfully started service 'sparkDriver' on port 38922.
19/02/22 14:36:56 INFO SparkEnv: Registering MapOutputTracker
19/02/22 14:36:57 INFO SparkEnv: Registering BlockManagerMaster
19/02/22 14:36:57 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/02/22 14:36:57 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/02/22 14:36:57 INFO DiskBlockManager: Created local directory at /export/home/ncit/external/a.mizeranschi/automated-VC-test/testingVC-merged/work/cromwell_work/cromwell-executions/main-testingVC-merged.cwl/5c4d57ac-2e42-41ec-88fb-c803f96c7f27/call-variantcall/shard-0/wf-variantcall.cwl/fba01806-77f2-4033-a02b-1bd74632e72e/call-variantcall_batch_region/shard-0/execution/bcbiotx/tmpjNTNjr/blockmgr-3b149660-7113-4752-be93-9f38b0134711
19/02/22 14:36:57 INFO MemoryStore: MemoryStore started with capacity 20.8 GB
14:36:57.578 WARN NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/02/22 14:36:57 INFO SparkEnv: Registering OutputCommitCoordinator
19/02/22 14:36:58 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/02/22 14:36:58 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://localhost:4040
19/02/22 14:36:58 INFO Executor: Starting executor ID driver on host localhost
19/02/22 14:36:58 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39210.
19/02/22 14:36:58 INFO NettyBlockTransferService: Server created on localhost:39210
19/02/22 14:36:58 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/02/22 14:36:58 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, localhost, 39210, None)
19/02/22 14:36:58 INFO BlockManagerMasterEndpoint: Registering block manager localhost:39210 with 20.8 GB RAM, BlockManagerId(driver, localhost, 39210, None)
19/02/22 14:36:58 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, localhost, 39210, None)
19/02/22 14:36:58 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, localhost, 39210, None)
19/02/22 14:36:58 INFO Executor: Told to re-register on heartbeat
19/02/22 14:36:58 INFO BlockManager: BlockManager BlockManagerId(driver, localhost, 39210, None) re-registering with master
19/02/22 14:36:58 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, localhost, 39210, None)
19/02/22 14:36:58 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, localhost, 39210, None)
19/02/22 14:36:58 INFO BlockManager: Reporting 0 blocks to the master.
19/02/22 14:36:58 INFO Executor: Told to re-register on heartbeat
19/02/22 14:36:58 INFO BlockManager: BlockManager BlockManagerId(driver, localhost, 39210, None) re-registering with master
19/02/22 14:36:58 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, localhost, 39210, None)
19/02/22 14:36:58 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, localhost, 39210, None)
19/02/22 14:36:58 INFO BlockManager: Reporting 0 blocks to the master.
19/02/22 14:36:58 INFO Executor: Told to re-register on heartbeat
19/02/22 14:36:58 INFO BlockManager: BlockManager BlockManagerId(driver, localhost, 39210, None) re-registering with master
19/02/22 14:36:58 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, localhost, 39210, None)
19/02/22 14:36:58 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, localhost, 39210, None)
19/02/22 14:36:58 INFO BlockManager: Reporting 0 blocks to the master.
19/02/22 14:36:59 INFO SparkUI: Stopped Spark web UI at http://localhost:4040
19/02/22 14:36:59 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/02/22 14:36:59 INFO MemoryStore: MemoryStore cleared
19/02/22 14:36:59 INFO BlockManager: BlockManager stopped
19/02/22 14:36:59 INFO BlockManagerMaster: BlockManagerMaster stopped
19/02/22 14:36:59 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/02/22 14:36:59 INFO SparkContext: Successfully stopped SparkContext
14:36:59.485 INFO HaplotypeCallerSpark - Shutting down engine
[February 22, 2019 2:36:59 PM EET] org.broadinstitute.hellbender.tools.HaplotypeCallerSpark done. Elapsed time: 0.10 minutes.
Runtime.totalMemory()=1240465408
***********************************************************************
A USER ERROR has occurred: Sorry, we only support a single reads input for for this spark tool.
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
19/02/22 14:36:59 INFO ShutdownHookManager: Shutdown hook called
19/02/22 14:36:59 INFO ShutdownHookManager: Deleting directory /export/home/ncit/external/a.mizeranschi/automated-VC-test/testingVC-merged/work/cromwell_work/cromwell-executions/main-testingVC-merged.cwl/5c4d57ac-2e42-41ec-88fb-c803f96c7f27/call-variantcall/shard-0/wf-variantcall.cwl/fba01806-77f2-4033-a02b-1bd74632e72e/call-variantcall_batch_region/shard-0/execution/bcbiotx/tmpjNTNjr/spark-12057323-1c5f-42f3-a098-d5d508040ec5
' returned non-zero exit status 2
___________________________________________________________________________
[2019-02-22 14:37:04,69] [info] WorkflowManagerActor WorkflowActor-5c4d57ac-2e42-41ec-88fb-c803f96c7f27 is in a terminal state: WorkflowFailedState
What should be done with paired-end reads in this scenario (CWL with pooled VC)?
I've also tested with
variantcaller:
- strelka2
- freebayes
- samtools
and
ensemble:
  numpass: 2
and I get
chr7 141972804 rs10246939 T C 156 PASS AC=3;AN=4;BQB=0.676634;CALLERS=samtools,freebayes,strelka2;DP=16;DP4=4,2,7,2;HOB=0.125;ICB=0.3;MQ=60;MQ0F=0;MQB=1;MQSB=1;RPB=0.569783;SGB=-4.72062;VDB=0.872557 GT:PL:DP:AD 1/1:55,6,0:2:0,2 0/1:134,0,133:13:6,7
chr7 141972905 rs1726866 G A 115 PASS AC=1;AN=4;BQB=0.687289;CALLERS=samtools,freebayes,strelka2;DP=22;DP4=2,12,2,4;HOB=0.125;ICB=0.3;MQ=60;MQ0F=0;MQB=1;MQSB=1;RPB=0.418638;SGB=2.63044;VDB=0.12166 GT:PL:DP:AD 0/0:0,12,155:4:4,0 0/1:149,0,196:16:10,6
chr7 141973545 rs713598 C G 179 PASS AC=3;AN=4;BQB=0.92664;CALLERS=samtools,freebayes,strelka2;DP=17;DP4=5,5,4,3;HOB=0.125;ICB=0.3;MQ=60;MQ0F=0;MQB=1;MQSB=1;RPB=0.110671;SGB=-4.10598;VDB=0.125342 GT:PL:DP:AD 1/1:113,9,0:3:0,3 0/1:100,0,245:14:10,4
which is really great. It's a real shame that GATK4 isn't working as well.
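For reference, the effect of `ensemble: numpass: 2` above (keep a variant only if enough callers reported it, as recorded in the `CALLERS` INFO tag) can be sketched with a minimal filter. This is an illustrative stand-in, not bcbio's actual ensemble implementation, and `passes_ensemble` is a hypothetical helper name.

```python
# Minimal sketch of an "ensemble: numpass: N" style filter: keep a VCF
# record only if its CALLERS INFO tag lists at least `numpass` callers.
# Not bcbio's implementation -- just an illustration of the behaviour.

def passes_ensemble(vcf_line, numpass=2):
    """Return True if the record's CALLERS INFO tag lists >= numpass callers."""
    fields = vcf_line.split("\t")
    if len(fields) < 8:
        return False
    # Parse key=value pairs from the INFO column (column 8).
    info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
    callers = [c for c in info.get("CALLERS", "").split(",") if c]
    return len(callers) >= numpass

record = ("chr7\t141972804\trs10246939\tT\tC\t156\tPASS\t"
          "CALLERS=samtools,freebayes,strelka2;DP=16\tGT:DP\t1/1:2")
print(passes_ensemble(record, numpass=2))  # -> True (3 callers >= 2)
print(passes_ensemble(record, numpass=4))  # -> False (3 callers < 4)
```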
Sorry for the continued problems. I looked into this more and remembered that we switched off HaplotypeCallerSpark as of release 1.1.1:
https://github.com/bcbio/bcbio-nextgen/commit/d33b6f4dc4aa1add37ff40d47c514aae7850fc18
so current versions should run this multicore by region rather than using the Spark approach. I know we've been talking about up-to-date versions, but I'm not sure whether you're using a different install or an older Docker container here? Hopefully a recent one will avoid using Spark altogether and get your analysis running.
I'm not using a different install or any Docker container. Just a plain old local bcbio_nextgen (see: https://github.com/bcbio/bcbio-nextgen/issues/2688#issuecomment-466341358) with CWL (Cromwell).
Note, the error I mentioned above only happens when running with CWL. When running bcbio_nextgen directly, HaplotypeCaller works fine and HaplotypeCallerSpark appears to be switched off.
Thanks much for following up and sorry for missing the underlying issue. I believe this only happens when doing CWL pooled calling and pushed a fix which should correctly avoid spark usage when pooling. If you update with:
bcbio_nextgen.py upgrade -u development
and re-run in place it should hopefully do the right thing now. Please let us know if you run into any other issues.
Hello,
As recommended in https://github.com/bcbio/bcbio-nextgen/issues/2689#issuecomment-465955157, I tried using GATK4 with CWL (Cromwell) for calling variants, using the population calling method (as opposed to joint calling).
The sample config file and the error that I get are below. The error:
java.net.UnknownHostException: asas: asas: Name or service not known
mentions my local hostname (asas), for some reason. Other input files that I used can be found in this archive: https://github.com/bcbio/bcbio-nextgen/files/2884474/platypus-error.tar.gz
These are the commands used to set up and run the analysis:
Note: I'm using a local bcbio install, and bcbio_vm.py was installed with:
bcbio_nextgen.py upgrade -u development --data --tools --cwl
This is the sample config file:
And the error is the following: