gmgitx closed this issue 5 years ago
Did you use examples/local/ENCSR356KRQ_subsampled.json
as your input JSON?
Yes, I tried that, but it showed an error report, so I wonder whether it is because the JSON is formatted for "local". Here is what it shows:
[2018-09-11 04:49:33,13] [info] Running with database db.url = jdbc:hsqldb:mem:58141743-77d8-458d-8454-0b7f4293431d;shutdown=false;hsqldb.tx=mvcc
[2018-09-11 04:49:43,58] [info] Running migration RenameWorkflowOptionsInMetadata with a read batch size of 100000 and a write batch size of 100000
[2018-09-11 04:49:43,60] [info] [RenameWorkflowOptionsInMetadata] 100%
[2018-09-11 04:49:43,73] [info] Running with database db.url = jdbc:hsqldb:mem:be828468-81d2-4957-8563-09c6b6c058d3;shutdown=false;hsqldb.tx=mvcc
[2018-09-11 04:49:44,09] [warn] This actor factory is deprecated. Please use cromwell.backend.google.pipelines.v1alpha2.PipelinesApiLifecycleActorFactory for PAPI v1 or cromwell.backend.google.pipelines.v2alpha1.PipelinesApiLifecycleActorFactory for PAPI v2
[2018-09-11 04:49:44,13] [warn] Couldn't find a suitable DSN, defaulting to a Noop one.
[2018-09-11 04:49:44,14] [info] Using noop to send events.
[2018-09-11 04:49:44,46] [info] Slf4jLogger started
[2018-09-11 04:49:44,71] [info] Workflow heartbeat configuration:
{
"cromwellId" : "cromid-84c9232",
"heartbeatInterval" : "2 minutes",
"ttl" : "10 minutes",
"writeBatchSize" : 10000,
"writeThreshold" : 10000
}
[2018-09-11 04:49:44,76] [info] Metadata summary refreshing every 2 seconds.
[2018-09-11 04:49:44,82] [info] WriteMetadataActor configured to flush with batch size 200 and process rate 5 seconds.
[2018-09-11 04:49:44,82] [info] CallCacheWriteActor configured to flush with batch size 100 and process rate 3 seconds.
[2018-09-11 04:49:44,82] [info] KvWriteActor configured to flush with batch size 200 and process rate 5 seconds.
[2018-09-11 04:49:45,95] [info] JobExecutionTokenDispenser - Distribution rate: 50 per 1 seconds.
[2018-09-11 04:49:45,97] [info] JES batch polling interval is 33333 milliseconds
[2018-09-11 04:49:45,97] [info] JES batch polling interval is 33333 milliseconds
[2018-09-11 04:49:45,97] [info] JES batch polling interval is 33333 milliseconds
[2018-09-11 04:49:45,98] [info] PAPIQueryManager Running with 3 workers
[2018-09-11 04:49:45,98] [info] SingleWorkflowRunnerActor: Version 34
[2018-09-11 04:49:45,98] [info] SingleWorkflowRunnerActor: Submitting workflow
[2018-09-11 04:49:46,04] [info] Unspecified type (Unspecified version) workflow 8823cc11-5e71-4004-b8d4-edf40eb38cd6 submitted
[2018-09-11 04:49:46,09] [info] SingleWorkflowRunnerActor: Workflow submitted 8823cc11-5e71-4004-b8d4-edf40eb38cd6
[2018-09-11 04:49:46,13] [info] 1 new workflows fetched
[2018-09-11 04:49:46,13] [info] WorkflowManagerActor Starting workflow 8823cc11-5e71-4004-b8d4-edf40eb38cd6
[2018-09-11 04:49:46,14] [warn] SingleWorkflowRunnerActor: received unexpected message: Done in state RunningSwraData
[2018-09-11 04:49:46,14] [info] WorkflowManagerActor Successfully started WorkflowActor-8823cc11-5e71-4004-b8d4-edf40eb38cd6
[2018-09-11 04:49:46,14] [info] Retrieved 1 workflows from the WorkflowStoreActor
[2018-09-11 04:49:46,16] [info] WorkflowStoreHeartbeatWriteActor configured to flush with batch size 10000 and process rate 2 minutes.
[2018-09-11 04:49:46,21] [info] MaterializeWorkflowDescriptorActor [8823cc11]: Parsing workflow as WDL draft-2
[2018-09-11 21:59:21,27] [info] MaterializeWorkflowDescriptorActor [2a0d0464]: Call-to-Backend assignments: atac.pool_ta_pr2 -> slurm, atac.filter -> slurm, atac.macs2_pr2 -> slurm, atac.overlap_pr -> slurm, atac.bowtie2 -> slurm, atac.pool_ta_pr1 -> slurm, atac.ataqc -> slurm, atac.reproducibility_overlap -> slurm, atac.overlap_ppr -> slurm, atac.spr -> slurm, atac.idr_ppr -> slurm, atac.xcor -> slurm, atac.qc_report -> slurm, atac.idr_pr -> slurm, atac.overlap -> slurm, atac.macs2_ppr2 -> slurm, atac.macs2_pr1 -> slurm, atac.macs2_ppr1 -> slurm, atac.read_genome_tsv -> slurm, atac.macs2_pooled -> slurm, atac.reproducibility_idr -> slurm, atac.pool_ta -> slurm, atac.macs2 -> slurm, atac.bam2ta -> slurm, atac.idr -> slurm, atac.trim_adapter -> slurm
[2018-09-11 21:59:21,40] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,40] [warn] slurm [2a0d0464]: Key/s [preemptible, disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
(the two warnings above repeat once per task, 26 lines in total)
[2018-09-11 21:59:23,66] [info] WorkflowExecutionActor-2a0d0464-f14a-4c4d-a994-cf084e712a66 [2a0d0464]: Starting atac.read_genome_tsv
[2018-09-11 21:59:23,66] [info] WorkflowExecutionActor-2a0d0464-f14a-4c4d-a994-cf084e712a66 [2a0d0464]: Condition met: 'enable_idr'. Running conditional section
[2018-09-11 21:59:23,66] [info] WorkflowExecutionActor-2a0d0464-f14a-4c4d-a994-cf084e712a66 [2a0d0464]: Condition met: '!align_only && !true_rep_only && enable_idr'. Running conditional section
[2018-09-11 21:59:23,66] [info] WorkflowExecutionActor-2a0d0464-f14a-4c4d-a994-cf084e712a66 [2a0d0464]: Condition met: 'enable_idr'. Running conditional section
[2018-09-11 21:59:23,66] [info] WorkflowExecutionActor-2a0d0464-f14a-4c4d-a994-cf084e712a66 [2a0d0464]: Condition met: '!disable_xcor'. Running conditional section
[2018-09-11 21:59:23,67] [info] WorkflowExecutionActor-2a0d0464-f14a-4c4d-a994-cf084e712a66 [2a0d0464]: Condition met: '!true_rep_only'. Running conditional section
[2018-09-11 21:59:23,67] [info] WorkflowExecutionActor-2a0d0464-f14a-4c4d-a994-cf084e712a66 [2a0d0464]: Condition met: '!align_only && !true_rep_only'. Running conditional section
[2018-09-11 21:59:25,00] [warn] DispatchedConfigAsyncJobExecutionActor [2a0d0464atac.read_genome_tsv:NA:1]: Unrecognized runtime attribute keys: disks
[2018-09-11 21:59:25,43] [info] DispatchedConfigAsyncJobExecutionActor [2a0d0464atac.read_genome_tsv:NA:1]: cat /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/2a0d0464-f14a-4c4d-a994-cf084e712a66/call-read_genome_tsv/inputs/378634365/hg19.tsv
[2018-09-11 21:59:25,49] [info] DispatchedConfigAsyncJobExecutionActor [2a0d0464atac.read_genome_tsv:NA:1]: executing: sbatch \
--export=ALL \
-J cromwell_2a0d0464_read_genome_tsv \
-D /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/2a0d0464-f14a-4c4d-a994-cf084e712a66/call-read_genome_tsv \
-o /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/2a0d0464-f14a-4c4d-a994-cf084e712a66/call-read_genome_tsv/execution/stdout \
-e /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/2a0d0464-f14a-4c4d-a994-cf084e712a66/call-read_genome_tsv/execution/stderr \
-t 60 \
-n 1 \
--ntasks-per-node=1 \
--cpus-per-task=1 \
--mem=4000 \
\
--account mengguo \
--wrap "/bin/bash /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/2a0d0464-f14a-4c4d-a994-cf084e712a66/call-read_genome_tsv/execution/script"
[2018-09-11 21:59:26,04] [error] WorkflowManagerActor Workflow 2a0d0464-f14a-4c4d-a994-cf084e712a66 failed (during ExecutingWorkflowState): java.lang.RuntimeException: Unable to start job. Check the stderr file for possible errors: /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/2a0d0464-f14a-4c4d-a994-cf084e712a66/call-read_genome_tsv/execution/stderr.submit
at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.$anonfun$execute$2(SharedFileSystemAsyncJobExecutionActor.scala:131)
at scala.util.Either.fold(Either.scala:188)
at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.execute(SharedFileSystemAsyncJobExecutionActor.scala:126)
at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.execute$(SharedFileSystemAsyncJobExecutionActor.scala:121)
at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.execute(ConfigAsyncJobExecutionActor.scala:208)
at cromwell.backend.standard.StandardAsyncExecutionActor.$anonfun$executeAsync$1(StandardAsyncExecutionActor.scala:600)
at scala.util.Try$.apply(Try.scala:209)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeAsync(StandardAsyncExecutionActor.scala:600)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeAsync$(StandardAsyncExecutionActor.scala:600)
at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.executeAsync(ConfigAsyncJobExecutionActor.scala:208)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeOrRecover(StandardAsyncExecutionActor.scala:915)
at cromwell.backend.standard.StandardAsyncExecutionActor.executeOrRecover$(StandardAsyncExecutionActor.scala:907)
at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.executeOrRecover(ConfigAsyncJobExecutionActor.scala:208)
at cromwell.backend.async.AsyncBackendJobExecutionActor.$anonfun$robustExecuteOrRecover$1(AsyncBackendJobExecutionActor.scala:65)
at cromwell.core.retry.Retry$.withRetry(Retry.scala:37)
at cromwell.backend.async.AsyncBackendJobExecutionActor.withRetry(AsyncBackendJobExecutionActor.scala:61)
at cromwell.backend.async.AsyncBackendJobExecutionActor.cromwell$backend$async$AsyncBackendJobExecutionActor$$robustExecuteOrRecover(AsyncBackendJobExecutionActor.scala:65)
at cromwell.backend.async.AsyncBackendJobExecutionActor$$anonfun$receive$1.applyOrElse(AsyncBackendJobExecutionActor.scala:88)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at akka.actor.Actor.aroundReceive(Actor.scala:517)
at akka.actor.Actor.aroundReceive$(Actor.scala:515)
at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.aroundReceive(ConfigAsyncJobExecutionActor.scala:208)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
at akka.actor.ActorCell.invoke(ActorCell.scala:557)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[2018-09-11 21:59:26,04] [info] WorkflowManagerActor WorkflowActor-2a0d0464-f14a-4c4d-a994-cf084e712a66 is in a terminal state: WorkflowFailedState
[2018-09-11 21:59:35,67] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
[2018-09-11 21:59:36,55] [info] Workflow polling stopped
[2018-09-11 21:59:36,56] [info] Shutting down WorkflowStoreActor - Timeout = 5 seconds
[2018-09-11 21:59:36,56] [info] Shutting down WorkflowLogCopyRouter - Timeout = 5 seconds
[2018-09-11 21:59:36,57] [info] Shutting down JobExecutionTokenDispenser - Timeout = 5 seconds
[2018-09-11 21:59:36,57] [info] JobExecutionTokenDispenser stopped
[2018-09-11 21:59:36,57] [info] Aborting all running workflows.
[2018-09-11 21:59:36,57] [info] WorkflowLogCopyRouter stopped
[2018-09-11 21:59:36,57] [info] Shutting down WorkflowManagerActor - Timeout = 3600 seconds
[2018-09-11 21:59:36,57] [info] WorkflowStoreActor stopped
[2018-09-11 21:59:36,57] [info] WorkflowManagerActor All workflows finished
[2018-09-11 21:59:36,57] [info] WorkflowManagerActor stopped
[2018-09-11 21:59:36,57] [info] Connection pools shut down
[2018-09-11 21:59:36,57] [info] Shutting down SubWorkflowStoreActor - Timeout = 1800 seconds
[2018-09-11 21:59:36,57] [info] Shutting down JobStoreActor - Timeout = 1800 seconds
[2018-09-11 21:59:36,57] [info] SubWorkflowStoreActor stopped
[2018-09-11 21:59:36,57] [info] Shutting down CallCacheWriteActor - Timeout = 1800 seconds
[2018-09-11 21:59:36,57] [info] Shutting down ServiceRegistryActor - Timeout = 1800 seconds
[2018-09-11 21:59:36,57] [info] Shutting down DockerHashActor - Timeout = 1800 seconds
[2018-09-11 21:59:36,57] [info] CallCacheWriteActor Shutting down: 0 queued messages to process
[2018-09-11 21:59:36,57] [info] Shutting down IoProxy - Timeout = 1800 seconds
[2018-09-11 21:59:36,57] [info] JobStoreActor stopped
[2018-09-11 21:59:36,57] [info] CallCacheWriteActor stopped
[2018-09-11 21:59:36,57] [info] KvWriteActor Shutting down: 0 queued messages to process
[2018-09-11 21:59:36,58] [info] DockerHashActor stopped
[2018-09-11 21:59:36,58] [info] IoProxy stopped
[2018-09-11 21:59:36,58] [info] WriteMetadataActor Shutting down: 0 queued messages to process
[2018-09-11 21:59:36,58] [info] ServiceRegistryActor stopped
[2018-09-11 21:59:36,60] [info] Database closed
[2018-09-11 21:59:36,60] [info] Stream materializer shut down
Workflow 2a0d0464-f14a-4c4d-a994-cf084e712a66 transitioned to state Failed
[2018-09-11 21:59:36,64] [info] Automatic shutdown of the async connection
[2018-09-11 21:59:36,64] [info] Gracefully shutdown sentry threads.
[2018-09-11 21:59:36,65] [info] Shutdown finished.
I never changed -Dconfig.file=backends/backend.conf -Dbackend.default=slurm. I don't know, should it be adjusted?
"local" here means running pipelines with downloaded (so locally existing) files.
This looks like a SLURM problem. Does your SLURM sbatch take --account or --partition?
Please post an example sbatch command or shell script template you use for submitting your own job to SLURM.
Also, can you run the following sbatch command and see what happens. Post any errors here. I will take a look.
sbatch \
--export=ALL \
-J cromwell_2a0d0464_read_genome_tsv \
-D /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/2a0d0464-f14a-4c4d-a994-cf084e712a66/call-read_genome_tsv \
-o /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/2a0d0464-f14a-4c4d-a994-cf084e712a66/call-read_genome_tsv/execution/stdout \
-e /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/2a0d0464-f14a-4c4d-a994-cf084e712a66/call-read_genome_tsv/execution/stderr \
-t 60 \
-n 1 \
--ntasks-per-node=1 \
--cpus-per-task=1 \
--mem=4000 \
\
--account mengguo \
--wrap "/bin/bash /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/2a0d0464-f14a-4c4d-a994-cf084e712a66/call-read_genome_tsv/execution/script"
BTW, I got your email but I cannot personally skype with you.
I see, thanks!
I just put this command line in the shell as in step 8 of the doc. I ran it in /mypath/atac-seq-pipeline/ after source activate encode-atac-seq-pipeline:
java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm /home/mengguo/local/bin/cromwell-34.jar run atac.wdl -i /mypath1/input.json -o /mypath2/atac-seq-pipeline/workflow_opts/slurm.json
For "run the following sbatch command and see what happens", this is what happens:
sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
(encode-atac-seq-pipeline) Tue Sep 11 18:58:59 2018
slurm.json:
{
    "default_runtime_attributes" : {
        "slurm_account": "mengguo"
    }
}
@gmgitx: Your error says Invalid account or account/partition combination specified. Please post an example sbatch command or shell script template you use for submitting your own job to SLURM.
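For context on where that --account flag comes from: Cromwell's config-based SLURM backend builds the sbatch line from a submit template in backends/backend.conf, filling in placeholders from the runtime attributes in workflow_opts/slurm.json. A simplified sketch of such a template (the exact attribute names and layout here are illustrative; check the pipeline's actual backend.conf):

```hocon
# Sketch of a Cromwell SLURM submit template (HOCON). The ${...}
# placeholders are WDL-style interpolations; optional attributes such as
# slurm_account expand to nothing when they are not defined.
submit = """
    sbatch --export=ALL -J ${job_name} -D ${cwd} \
        -o ${out} -e ${err} -t 60 -n 1 \
        --ntasks-per-node=1 --cpus-per-task=${cpu} --mem=${memory_mb} \
        ${"--account " + slurm_account} ${"--partition " + slurm_partition} \
        --wrap "/bin/bash ${script}"
"""
```

So a wrong slurm_account in workflow_opts/slurm.json surfaces exactly as the "Invalid account or account/partition combination" error above.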
I got the right account/partition from our IT after you mentioned it, so there is no error like that anymore. But it still doesn't work for the ENCSR356KRQ data provided in the documentation.
java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm /home/mengguo/local/bin/cromwell-34.jar run atac.wdl -i /mypath1/ENCSR356KRQ_subsampled.json -o /mypath2/atac-seq-pipeline/workflow_opts/slurm.json
This is the command I ran; I didn't prepend sbatch.
Part of the warnings and errors:
[2018-09-14 17:11:22,17] [warn] This actor factory is deprecated. Please use cromwell.backend.google.pipelines.v1alpha2.PipelinesApiLifecycleActorFactory for PAPI v1 or cromwell.backend.google.pipelines.v2alpha1.PipelinesApiLifecycleActorFactory for PAPI v2
[2018-09-14 17:11:22,17] [warn] Couldn't find a suitable DSN, defaulting to a Noop one.
...
[2018-09-14 17:11:23,73] [warn] SingleWorkflowRunnerActor: received unexpected message: Done in state RunningSwraData
...
[2018-09-14 17:13:09,52] [info] MaterializeWorkflowDescriptorActor [00851093]: Call-to-Backend assignments: atac.overlap_pr -> slurm, atac.spr -> slurm, atac.qc_report -> slurm, atac.reproducibility_idr -> slurm, atac.reproducibility_overlap -> slurm, atac.pool_ta -> slurm, atac.macs2_pr2 -> slurm, atac.xcor -> slurm, atac.ataqc -> slurm, atac.overlap_ppr -> slurm, atac.filter -> slurm, atac.idr_ppr -> slurm, atac.idr_pr -> slurm, atac.bam2ta -> slurm, atac.overlap -> slurm, atac.bowtie2 -> slurm, atac.macs2_ppr1 -> slurm, atac.pool_ta_pr2 -> slurm, atac.read_genome_tsv -> slurm, atac.trim_adapter -> slurm, atac.macs2_ppr2 -> slurm, atac.macs2_pr1 -> slurm, atac.idr -> slurm, atac.pool_ta_pr1 -> slurm, atac.macs2 -> slurm, atac.macs2_pooled -> slurm
[2018-09-14 17:13:09,74] [warn] slurm [00851093]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-14 17:13:09,76] [warn] slurm [00851093]: Key/s [preemptible, disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
(the two warnings above repeat once per task, 20 lines in total)
...
[warn] DispatchedConfigAsyncJobExecutionActor [e0efd905atac.read_genome_tsv:NA:1]: Unrecognized runtime attribute keys: disks
...
[2018-09-14 20:35:09,32] [warn] Localization via hard link has failed: /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/018d7f41-9161-4d4d-8a85-30b884e414c1/call-trim_adapter/shard-1/inputs/1398708413/ENCFF193RRC.subsampled.400.fastq.gz -> /project2/yangili1/mengguo/ASP/atac-seq-pipeline/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair1/ENCFF193RRC.subsampled.400.fastq.gz
[2018-09-14 20:35:09,32] [warn] Localization via copy has failed: /project2/yangili1/mengguo/ASP/atac-seq-pipeline/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair1/ENCFF193RRC.subsampled.400.fastq.gz
[2018-09-14 20:35:09,32] [warn] Localization via hard link has failed: /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/018d7f41-9161-4d4d-8a85-30b884e414c1/call-trim_adapter/shard-1/inputs/1398708414/ENCFF886FSC.subsampled.400.fastq.gz -> /project2/yangili1/mengguo/ASP/atac-seq-pipeline/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair2/ENCFF886FSC.subsampled.400.fastq.gz
[2018-09-14 20:35:09,33] [warn] Localization via copy has failed: /project2/yangili1/mengguo/ASP/atac-seq-pipeline/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair2/ENCFF886FSC.subsampled.400.fastq.gz
[2018-09-14 20:35:09,33] [warn] Localization via hard link has failed: /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/018d7f41-9161-4d4d-8a85-30b884e414c1/call-trim_adapter/shard-1/inputs/1398708413/ENCFF366DFI.subsampled.400.fastq.gz -> /project2/yangili1/mengguo/ASP/atac-seq-pipeline/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair1/ENCFF366DFI.subsampled.400.fastq.gz
[2018-09-14 20:35:09,33] [warn] Localization via copy has failed: /project2/yangili1/mengguo/ASP/atac-seq-pipeline/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair1/ENCFF366DFI.subsampled.400.fastq.gz
[2018-09-14 20:35:09,33] [warn] Localization via hard link has failed: /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/018d7f41-9161-4d4d-8a85-30b884e414c1/call-trim_adapter/shard-1/inputs/1398708414/ENCFF573UXK.subsampled.400.fastq.gz -> /project2/yangili1/mengguo/ASP/atac-seq-pipeline/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair2/ENCFF573UXK.subsampled.400.fastq.gz
[2018-09-14 20:35:09,33] [warn] Localization via copy has failed: /project2/yangili1/mengguo/ASP/atac-seq-pipeline/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair2/ENCFF573UXK.subsampled.400.fastq.gz
[2018-09-14 20:35:09,34] [error] DispatchedConfigAsyncJobExecutionActor [018d7f41atac.trim_adapter:1:1]: Error attempting to Execute
java.lang.Exception: Failed command instantiation
...
[2018-09-14 17:00:23,88] [error] DispatchedConfigAsyncJobExecutionActor [2386aadbatac.trim_adapter:1:1]: Error attempting to Execute
java.lang.Exception: Failed command instantiation
at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand(StandardAsyncExecutionActor.scala:537)
at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand$(StandardAsyncExecutionActor.scala:472)
at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.instantiatedCommand$lzycompute(ConfigAsyncJobExecutionActor.scala:208)
at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.instantiatedCommand(ConfigAsyncJobExecutionActor.scala:208)
I didn't mean a pipeline command line. I just wanted to see an example sbatch command line that you usually use. Is there a wiki page for your cluster?
What is your sbatch command line to submit the following HelloWorld shell script hello_world.sh?
#!/bin/bash
echo Hello world
echo Sleep 60
Sorry for my misunderstanding.
Guide for our cluster: https://github.com/jdblischak/giladlab-midway-guide
Here is the sbatch command:
sbatch hello_world.sh
It then gives me back a file slurm-[number].out
Are you sure that sbatch hello_world.sh works without any extra parameters? If so, remove the account setting ("slurm_account": "mengguo") from workflow_opts/slurm.json and try again.
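With the account setting removed, a minimal workflow_opts/slurm.json would look like the sketch below (the slurm_partition value is a placeholder; add it only if your cluster requires a partition, otherwise the inner object can be empty):

```json
{
    "default_runtime_attributes" : {
        "slurm_partition": "your_partition_here"
    }
}
```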
Yes, the slurm-[number].out contains:
Hello world
Sleep 60
I removed the account, but it seems to give the same warnings and error report.
Please post a full log and also your workflow_opts/slurm.json.
sbatch --mem=8g --partition=broadwl run_atac.sh
#!/bin/bash
java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm /home/name2/local/bin/cromwell-34.jar run atac.wdl -i /project2/name1/name2/DLDS/ENCSR356KRQ_subsampled.json -o /project2/name1/name2/ASP/atac-seq-pipeline/workflow_opts/slurm.json
{
"default_runtime_attributes" : {
"slurm_partition": "broadwl"
}
}
#########################result
I guess that you (or your partition) have a limited resource quota on your cluster?
$ scontrol show partition broadwl
Do you have the privilege to use enough resources (memory>=16GB, cpu>=4, walltime>=48hr per task) on your partition?
Please run the following in the working directory where you ran the pipeline. This will make a tar ball of all log files; please upload it here. I need it for debugging:
$ find . -type f -name 'stdout' -or -name 'stderr' -or -name 'script' -or \
-name '*.qc' -or -name '*.txt' -or -name '*.log' -or -name '*.png' -or -name '*.pdf' \
| xargs tar -zcvf debug_issue_31.tar.gz
Thanks!
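A side note on that find command: with bare -or, the -type f test binds only to the first -name, so later patterns are not restricted to regular files. Grouping the name tests with \( ... \) applies -type f to all of them. A small self-contained sketch (the demo paths are made up for illustration):

```shell
# Demonstrate find predicate grouping: -type f should apply to every -name.
mkdir -p demo/logs
touch demo/logs/stdout demo/logs/run.log demo/notes.md
# Grouped form: only regular files matching either pattern are listed.
find demo -type f \( -name 'stdout' -o -name '*.log' \) | sort
# The debug tar ball could then be built the same way, e.g.:
# find . -type f \( -name 'stdout' -o -name 'stderr' -o -name '*.log' \) \
#   | xargs tar -zcvf debug.tar.gz
```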
scontrol show partition broadwl
PartitionName=broadwl
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=midway2-[0002-0089,0103-0124,0137-0182,0221-0230,0258-0280,0282-0301,0312-0398,0400]
PriorityJobFactor=20 PriorityTier=20 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=8316 TotalNodes=297 SelectTypeParameters=NONE
DefMemPerCPU=2048 MaxMemPerCPU=9223372036854775807
@gmgitx Your tarball does not have any file in it.
Thanks! After I executed that command in the working directory where I ran the pipeline, one debug_issue_31.tar.gz was left. Does it mean anything if there are no files in it, or should it have some?
Please send that file debug_issue_31.tar.gz to my email.
I sent it.
I got your log but it includes outputs from too many pipeline runs. For the latest run, I found that the first task of the pipeline worked fine, so you can keep using your partition broadwl. But the next step failed and I need to figure out why. I guess that it's rejected by the cluster due to a resource quota.
What is the resource quota on your cluster? How many resources can your partition use? For example: maximum number of concurrent jobs, max CPUs per job, max memory per job, max walltime per job. This information will be helpful for debugging.
Can you clean up (rm -rf cromwell-execution*) your output directories and run the pipeline again? If that rm -rf does not work, make a new directory and follow the steps in the documentation again. Then post both your screen log and a new tar ball (please make a new one using the same command).
Many thanks! I sent debug_issue_31.tar.gz to your email, along with slurm-49925402.out.
According to what I know from IT, memory>=16GB and cpu>=4 are allowed, but walltime must be under 36 hours in total.
My partition: MaxCPUsPerUser 2800 MaxNodesPerUser 100 MaxJobsPerUser 100 MaxSubmitJobs 500 MaxWall 1-12:00:00
The default walltime for bowtie2 is 48 hours. I think this caused the problem. Please add the following to your input JSON and try again:
"atac.bowtie2.mem_mb" : 10000,
"atac.bowtie2.cpu" : 1,
"atac.bowtie2.time_hr" : 12,
Also, reduce the number of concurrent jobs to 1 or 2 in backends/backend.conf:
https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/backends/backend.conf#L164
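For the concurrency change, Cromwell's per-backend configuration supports a concurrent-job-limit key. A sketch of the relevant fragment (surrounding structure abbreviated; the exact nesting in the pipeline's backend.conf may differ slightly):

```hocon
backend {
  providers {
    slurm {
      config {
        # Limit how many jobs Cromwell submits to SLURM at once.
        concurrent-job-limit = 2
      }
    }
  }
}
```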
Thanks for your kind help.
The folder "cromwell-executions" was not created after the run. I modified things as you advised this time and ran it with:
###
sbatch ./example.sbatch
#example.sbatch
#!/bin/bash
#SBATCH --job-name=example_sbatch
#SBATCH --output=example_sbatch.out
#SBATCH --error=example_sbatch.err
#SBATCH --time=36:00:00
#SBATCH --partition=broadwl
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --mem-per-cpu=20
source activate encode-atac-seq-pipeline
bash run_atac1.sh
source deactivate
###run_atac1.sh
#!/bin/bash
java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm /home/name2/local/bin/cromwell-34.jar run atac.wdl -i /project2/name1/name2/DLDS/ENCSR356KRQ_subsampled.json -o /project2/name1/name2/ASP/atac-seq-pipeline/workflow_opts/slurm.json
Sorry for the error. Also, I sent debug_issue_31.tar.gz to your email. Thank you again.
Is your sbatch_report trimmed?
I combined the two files (example_sbatch.err and example_sbatch.out) together; there was no other processing.
Please take a look at the ###example_sbatch.out part. A log file in your tarball says that some of the sub-tasks (read_genome_tsv, trim_adapter) were done successfully, but they are not shown in your sbatch_report. I think it is indeed trimmed; it only shows some initialization stages of the pipeline.
Yes, you are right; here is an example_sbatch.out. But I think that if the sub-tasks were done successfully, the folder "cromwell-executions" should have been created, and here it was not.
###example_sbatch.out
[2018-09-25 18:19:47,44] [info] Running with database db.url = jdbc:hsqldb:mem:47a741e8-0324-4c29-a170-f4dd54d61b24;shutdown=false;hsqldb.tx=mvcc
[2018-09-25 18:19:55,46] [info] Running migration RenameWorkflowOptionsInMetadata with a read batch size of 100000 and a write batch size of 100000
[2018-09-25 18:19:55,47] [info] [RenameWorkflowOptionsInMetadata] 100%
[2018-09-25 18:19:55,56] [info] Running with database db.url = jdbc:hsqldb:mem:124b08c3-9730-416e-a798-3cf91acbf493;shutdown=false;hsqldb.tx=mvcc
[2018-09-25 18:19:55,88] [warn] This actor factory is deprecated. Please use cromwell.backend.google.pipelines.v1alpha2.PipelinesApiLifecycleActorFactory for PAPI v1 or cromwell.backend.google.pipelines.v2alpha1.PipelinesApiLifecycleActorFactory for PAPI v2
[2018-09-25 18:19:55,92] [warn] Couldn't find a suitable DSN, defaulting to a Noop one.
[2018-09-25 18:19:55,92] [info] Using noop to send events.
[2018-09-25 18:19:56,19] [info] Slf4jLogger started
[2018-09-25 18:19:56,37] [info] Workflow heartbeat configuration:
{
"cromwellId" : "cromid-a9ac2b1",
"heartbeatInterval" : "2 minutes",
"ttl" : "10 minutes",
"writeBatchSize" : 10000,
"writeThreshold" : 10000
}
[2018-09-25 18:19:56,40] [info] Metadata summary refreshing every 2 seconds.
[2018-09-25 18:19:56,43] [info] CallCacheWriteActor configured to flush with batch size 100 and process rate 3 seconds.
[2018-09-25 18:19:56,43] [info] WriteMetadataActor configured to flush with batch size 200 and process rate 5 seconds.
[2018-09-25 18:19:56,43] [info] KvWriteActor configured to flush with batch size 200 and process rate 5 seconds.
[2018-09-25 18:19:57,18] [info] JobExecutionTokenDispenser - Distribution rate: 50 per 1 seconds.
[2018-09-25 18:19:57,20] [info] SingleWorkflowRunnerActor: Version 34
[2018-09-25 18:19:57,20] [info] JES batch polling interval is 33333 milliseconds
[2018-09-25 18:19:57,20] [info] JES batch polling interval is 33333 milliseconds
[2018-09-25 18:19:57,20] [info] JES batch polling interval is 33333 milliseconds
[2018-09-25 18:19:57,20] [info] PAPIQueryManager Running with 3 workers
[2018-09-25 18:19:57,21] [info] SingleWorkflowRunnerActor: Submitting workflow
[2018-09-25 18:19:57,25] [info] Unspecified type (Unspecified version) workflow 40567000-f7d2-491b-b255-44cdcec9a54b submitted
[2018-09-25 18:19:57,30] [info] SingleWorkflowRunnerActor: Workflow submitted 40567000-f7d2-491b-b255-44cdcec9a54b
[2018-09-25 18:19:57,31] [info] 1 new workflows fetched
[2018-09-25 18:19:57,31] [info] WorkflowManagerActor Starting workflow 40567000-f7d2-491b-b255-44cdcec9a54b
[2018-09-25 18:19:57,31] [warn] SingleWorkflowRunnerActor: received unexpected message: Done in state RunningSwraData
[2018-09-25 18:19:57,31] [info] WorkflowManagerActor Successfully started WorkflowActor-40567000-f7d2-491b-b255-44cdcec9a54b
[2018-09-25 18:19:57,32] [info] Retrieved 1 workflows from the WorkflowStoreActor
[2018-09-25 18:19:57,32] [info] WorkflowStoreHeartbeatWriteActor configured to flush with batch size 10000 and process rate 2 minutes.
[2018-09-25 18:19:57,37] [info] MaterializeWorkflowDescriptorActor [40567000]: Parsing workflow as WDL draft-2
Can you upload your modified input JSON here?
Sure, thanks.
####.../ENCSR356KRQ_subsampled.json
{
"atac.pipeline_type" : "atac",
"atac.genome_tsv" : "/project2/name1/name2/DLDS/process_data/hg19db/hg19.tsv",
"atac.fastqs" : [
[
["/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep1/pair1/ENCFF341MYG.subsampled.400.fastq.gz",
"/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep1/pair2/ENCFF248EJF.subsampled.400.fastq.gz"],
["/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep1/pair1/ENCFF106QGY.subsampled.400.fastq.gz",
"/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep1/pair2/ENCFF368TYI.subsampled.400.fastq.gz"]
],
[
["/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair1/ENCFF641SFZ.subsampled.400.fastq.gz",
"/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair2/ENCFF031ARQ.subsampled.400.fastq.gz"],
["/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair1/ENCFF751XTV.subsampled.400.fastq.gz",
"/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair2/ENCFF590SYZ.subsampled.400.fastq.gz"],
["/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair1/ENCFF927LSG.subsampled.400.fastq.gz",
"/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair2/ENCFF734PEQ.subsampled.400.fastq.gz"],
["/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair1/ENCFF859BDM.subsampled.400.fastq.gz",
"/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair2/ENCFF007USV.subsampled.400.fastq.gz"],
["/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair1/ENCFF193RRC.subsampled.400.fastq.gz",
"/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair2/ENCFF886FSC.subsampled.400.fastq.gz"],
["/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair1/ENCFF366DFI.subsampled.400.fastq.gz",
"/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair2/ENCFF573UXK.subsampled.400.fastq.gz"]
]
],
"atac.paired_end" : true,
"atac.multimapping" : 4,
"atac.trim_adapter.auto_detect_adapter" : true,
"atac.trim_adapter.cpu" : 1,
"atac.bowtie2.mem_mb" : 10000,
"atac.bowtie2.cpu" : 1,
"atac.bowtie2.mem_hr" : 12,
"atac.filter.cpu" : 1,
"atac.filter.mem_mb" : 12000,
"atac.macs2_mem_mb" : 16000,
"atac.smooth_win" : 73,
"atac.enable_idr" : true,
"atac.idr_thresh" : 0.05,
"atac.qc_report.name" : "ENCSR356KRQ (subsampled 1/400 reads)",
"atac.qc_report.desc" : "ATAC-seq on primary keratinocytes in day 0.0 of differentiation"
}
####...atac-seq-pipeline/workflow_opts/slurm.json
{
"default_runtime_attributes" : {
"slurm_partition": "broadwl"
}
}
I think this is a resource quota/limit problem on your cluster. Please adjust the resource settings in your input JSON. You may need to revert to the last configuration that was partially successful (for some tasks) and then tune the resource settings from there.
https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/docs/input.md#resource
The resource settings for one of your successful tasks (trim_adapter) were 2 cpu, 12000 mem_mb, and 24 time_hr.
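Expressed as input-JSON keys (following the same cpu/mem_mb/time_hr pattern used for bowtie2 earlier in this thread), those trim_adapter settings would look like:

```json
{
  "atac.trim_adapter.cpu" : 2,
  "atac.trim_adapter.mem_mb" : 12000,
  "atac.trim_adapter.time_hr" : 24
}
```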
Thanks! Although I only have the trim results so far, I'll continue to adjust the resource settings. The trim_adapter outputs are:
.../atac-seq-pipeline/cromwell-executions/atac/06bf6b3b-164f-4917-9507-d90a58a428e4/call-trim_adapter/shard-0/execution
merge_fastqs_R1_ENCFF341MYG.subsampled.400.trim.merged.fastq.gz
merge_fastqs_R2_ENCFF248EJF.subsampled.400.trim.merged.fastq.gz
.../atac-seq-pipeline/cromwell-executions/atac/06bf6b3b-164f-4917-9507-d90a58a428e4/call-trim_adapter/shard-1/execution
merge_fastqs_R1_ENCFF641SFZ.subsampled.400.trim.merged.fastq.gz
merge_fastqs_R2_ENCFF031ARQ.subsampled.400.trim.merged.fastq.gz
Could you confirm whether it is correct that I only get trim results for the first two files of each rep of ENCSR356KRQ, or is something wrong?
Yes, these fastqs (two for each replicate) look fine.
Closing this issue due to long inactivity.
Hi, thanks for your wonderful work. I ran the following in
/mypath/atac-seq-pipeline/
after source activate encode-atac-seq-pipeline:
java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm /my_path/local/bin/cromwell-34.jar run atac.wdl -i /my_path1/input.json -o /my_path2/atac-seq-pipeline/workflow_opts/slurm.json
But only one folder named "cromwell-workflow-logs" was left, and there was nothing in it:
Jenkinsfile LICENSE README.md atac.wdl backends conda **cromwell-workflow-logs** docker_image docs examples genome src test workflow_opts
What's more, while it was running, it showed the following on the screen. I followed https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/docs/tutorial_slurm.md, since I need to run on my school's SLURM cluster, not my local PC and not Stanford University's SLURM.
Would you have any advice about my two errors?