ENCODE-DCC / atac-seq-pipeline

ENCODE ATAC-seq pipeline

ATAC pipeline run on SLURM reports error #31

Closed gmgitx closed 5 years ago

gmgitx commented 6 years ago

Hi, thanks for your wonderful work. I ran the following in /mypath/atac-seq-pipeline/ after source activate encode-atac-seq-pipeline:

java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm /my_path/local/bin/cromwell-34.jar run atac.wdl -i /my_path1/input.json -o /my_path2/atac-seq-pipeline/workflow_opts/slurm.json

But afterwards only a directory named "cromwell-workflow-logs" was left, and there is nothing in it. The working directory looks like: Jenkinsfile LICENSE README.md atac.wdl backends conda **cromwell-workflow-logs** docker_image docs examples genome src test workflow_opts. What's more, while it was running it showed the following on the screen:

[2018-09-08 09:23:52,43] [info] Running with database db.url = jdbc:hsqldb:mem:a42fb754-58fc-418e-8224-01cd57b5b131;shutdown=false;hsqldb.tx=mvcc
[2018-09-08 09:24:01,66] [info] Running migration RenameWorkflowOptionsInMetadata with a read batch size of 100000 and a write batch size of 100000
[2018-09-08 09:24:01,67] [info] [RenameWorkflowOptionsInMetadata] 100%
[2018-09-08 09:24:01,78] [info] Running with database db.url = jdbc:hsqldb:mem:8c25714f-6a58-4b03-bf8d-b686ee8442fc;shutdown=false;hsqldb.tx=mvcc
[2018-09-08 09:24:02,13] [warn] This actor factory is deprecated. Please use cromwell.backend.google.pipelines.v1alpha2.PipelinesApiLifecycleActorFactory for
PAPI v1 or cromwell.backend.google.pipelines.v2alpha1.PipelinesApiLifecycleActorFactory for PAPI v2
[2018-09-08 09:24:02,16] [warn] Couldn't find a suitable DSN, defaulting to a Noop one.
[2018-09-08 09:24:02,16] [info] Using noop to send events.
[2018-09-08 09:24:02,44] [info] Slf4jLogger started
[2018-09-08 09:24:02,66] [info] Workflow heartbeat configuration:
{
  "cromwellId" : "cromid-d9e2d67",
  "heartbeatInterval" : "2 minutes",
  "ttl" : "10 minutes",
  "writeBatchSize" : 10000,
  "writeThreshold" : 10000
}
[2018-09-08 09:24:02,69] [info] Metadata summary refreshing every 2 seconds.
[2018-09-08 09:24:02,72] [info] WriteMetadataActor configured to flush with batch size 200 and process rate 5 seconds.
[2018-09-08 09:24:02,72] [info] KvWriteActor configured to flush with batch size 200 and process rate 5 seconds.
[2018-09-08 09:24:02,72] [info] CallCacheWriteActor configured to flush with batch size 100 and process rate 3 seconds.
[2018-09-08 09:24:03,69] [info] JobExecutionTokenDispenser - Distribution rate: 50 per 1 seconds.
[2018-09-08 09:24:03,71] [info] SingleWorkflowRunnerActor: Version 34
[2018-09-08 09:24:03,71] [info] JES batch polling interval is 33333 milliseconds
[2018-09-08 09:24:03,71] [info] JES batch polling interval is 33333 milliseconds
[2018-09-08 09:24:03,71] [info] JES batch polling interval is 33333 milliseconds
[2018-09-08 09:24:03,71] [info] PAPIQueryManager Running with 3 workers
[2018-09-08 09:24:03,72] [info] SingleWorkflowRunnerActor: Submitting workflow
[2018-09-08 09:24:03,77] [info] Unspecified type (Unspecified version) workflow 1e03bf36-d64b-42a7-9857-a644de257de3 submitted
[2018-09-08 09:24:03,82] [info] SingleWorkflowRunnerActor: Workflow submitted 1e03bf36-d64b-42a7-9857-a644de257de3
[2018-09-08 09:24:03,82] [info] 1 new workflows fetched
[2018-09-08 09:24:03,82] [info] WorkflowManagerActor Starting workflow 1e03bf36-d64b-42a7-9857-a644de257de3
[2018-09-08 09:24:03,83] [warn] SingleWorkflowRunnerActor: received unexpected message: Done in state RunningSwraData
[2018-09-08 09:24:03,83] [info] WorkflowManagerActor Successfully started WorkflowActor-1e03bf36-d64b-42a7-9857-a644de257de3
[2018-09-08 09:24:03,83] [info] Retrieved 1 workflows from the WorkflowStoreActor
[2018-09-08 09:24:03,85] [info] WorkflowStoreHeartbeatWriteActor configured to flush with batch size 10000 and process rate 2 minutes.
[2018-09-08 09:24:03,89] [info] MaterializeWorkflowDescriptorActor [1e03bf36]: Parsing workflow as WDL draft-2
[2018-09-08 09:24:22,52] [error] WorkflowManagerActor Workflow 1e03bf36-d64b-42a7-9857-a644de257de3 failed (during MaterializingWorkflowDescriptorState): cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor$$anon$1: Workflow input processing failed:
Unexpected character ']' at input index 643 (line 13, position 5), expected JSON Value:
    ],
    ^

        at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor.cromwell$engine$workflow$lifecycle$materialization$MaterializeWorkflowDescriptorActor$$workflowInitializationFailed(MaterializeWorkflowDescriptorActor.scala:200)
        at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor$$anonfun$2.applyOrElse(MaterializeWorkflowDescriptorActor.scala:170)
        at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor$$anonfun$2.applyOrElse(MaterializeWorkflowDescriptorActor.scala:165)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
        at akka.actor.FSM.processEvent(FSM.scala:670)
        at akka.actor.FSM.processEvent$(FSM.scala:667)
        at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor.akka$actor$LoggingFSM$$super$processEvent(MaterializeWorkflowDescriptorActor.scala:123)
        at akka.actor.LoggingFSM.processEvent(FSM.scala:806)
        at akka.actor.LoggingFSM.processEvent$(FSM.scala:788)
        at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor.processEvent(MaterializeWorkflowDescriptorActor.scala:123)
        at akka.actor.FSM.akka$actor$FSM$$processMsg(FSM.scala:664)
        at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:658)
        at akka.actor.Actor.aroundReceive(Actor.scala:517)
        at akka.actor.Actor.aroundReceive$(Actor.scala:515)
        at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor.aroundReceive(MaterializeWorkflowDescriptorActor.scala:123)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
        at akka.actor.ActorCell.invoke(ActorCell.scala:557)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
        at akka.dispatch.Mailbox.run(Mailbox.scala:225)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

[2018-09-08 09:24:22,52] [info] WorkflowManagerActor WorkflowActor-1e03bf36-d64b-42a7-9857-a644de257de3 is in a terminal state: WorkflowFailedState
[2018-09-08 09:24:25,09] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
[2018-09-08 09:24:27,74] [info] Workflow polling stopped
[2018-09-08 09:24:27,76] [info] Shutting down WorkflowStoreActor - Timeout = 5 seconds
[2018-09-08 09:24:27,76] [info] Shutting down WorkflowLogCopyRouter - Timeout = 5 seconds
[2018-09-08 09:24:27,76] [info] Shutting down JobExecutionTokenDispenser - Timeout = 5 seconds
[2018-09-08 09:24:27,77] [info] Aborting all running workflows.
[2018-09-08 09:24:27,77] [info] JobExecutionTokenDispenser stopped
[2018-09-08 09:24:27,77] [info] WorkflowStoreActor stopped
[2018-09-08 09:24:27,78] [info] WorkflowLogCopyRouter stopped
[2018-09-08 09:24:27,78] [info] Shutting down WorkflowManagerActor - Timeout = 3600 seconds
[2018-09-08 09:24:27,78] [info] WorkflowManagerActor All workflows finished
[2018-09-08 09:24:27,78] [info] WorkflowManagerActor stopped
[2018-09-08 09:24:27,78] [info] Connection pools shut down
[2018-09-08 09:24:27,78] [info] Shutting down SubWorkflowStoreActor - Timeout = 1800 seconds
[2018-09-08 09:24:27,79] [info] Shutting down JobStoreActor - Timeout = 1800 seconds
[2018-09-08 09:24:27,79] [info] Shutting down CallCacheWriteActor - Timeout = 1800 seconds
[2018-09-08 09:24:27,79] [info] SubWorkflowStoreActor stopped
[2018-09-08 09:24:27,79] [info] Shutting down ServiceRegistryActor - Timeout = 1800 seconds
[2018-09-08 09:24:27,79] [info] Shutting down DockerHashActor - Timeout = 1800 seconds
[2018-09-08 09:24:27,79] [info] JobStoreActor stopped
[2018-09-08 09:24:27,79] [info] Shutting down IoProxy - Timeout = 1800 seconds
[2018-09-08 09:24:27,79] [info] CallCacheWriteActor Shutting down: 0 queued messages to process
[2018-09-08 09:24:27,79] [info] WriteMetadataActor Shutting down: 0 queued messages to process
[2018-09-08 09:24:27,79] [info] CallCacheWriteActor stopped
[2018-09-08 09:24:27,79] [info] KvWriteActor Shutting down: 0 queued messages to process
[2018-09-08 09:24:27,79] [info] DockerHashActor stopped
[2018-09-08 09:24:27,79] [info] IoProxy stopped
[2018-09-08 09:24:27,79] [info] ServiceRegistryActor stopped
[2018-09-08 09:24:27,81] [info] Database closed
[2018-09-08 09:24:27,81] [info] Stream materializer shut down
Workflow 1e03bf36-d64b-42a7-9857-a644de257de3 transitioned to state Failed
[2018-09-08 09:24:27,85] [info] Automatic shutdown of the async connection
[2018-09-08 09:24:27,85] [info] Gracefully shutdown sentry threads.
[2018-09-08 09:24:27,85] [info] Shutdown finished.

I followed https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/docs/tutorial_slurm.md, since I need to run on my school's SLURM cluster, not on my local PC and not on Stanford University's SLURM.

Would you have any advice about these two errors?

leepc12 commented 6 years ago

Did you use examples/local/ENCSR356KRQ_subsampled.json as your input JSON?

gmgitx commented 6 years ago

Yes, I tried it, and it showed an error report, so I wonder whether that is because the JSON is formatted for "local". Here is what it shows:

[2018-09-11 04:49:33,13] [info] Running with database db.url = jdbc:hsqldb:mem:58141743-77d8-458d-8454-0b7f4293431d;shutdown=false;hsqldb.tx=mvcc
[2018-09-11 04:49:43,58] [info] Running migration RenameWorkflowOptionsInMetadata with a read batch size of 100000 and a write batch size of 100000
[2018-09-11 04:49:43,60] [info] [RenameWorkflowOptionsInMetadata] 100%
[2018-09-11 04:49:43,73] [info] Running with database db.url = jdbc:hsqldb:mem:be828468-81d2-4957-8563-09c6b6c058d3;shutdown=false;hsqldb.tx=mvcc
[2018-09-11 04:49:44,09] [warn] This actor factory is deprecated. Please use cromwell.backend.google.pipelines.v1alpha2.PipelinesApiLifecycleActorFactory for PAPI v1 or cromwell.backend.google.pipelines.v2alpha1.PipelinesApiLifecycleActorFactory for PAPI v2
[2018-09-11 04:49:44,13] [warn] Couldn't find a suitable DSN, defaulting to a Noop one.
[2018-09-11 04:49:44,14] [info] Using noop to send events.
[2018-09-11 04:49:44,46] [info] Slf4jLogger started
[2018-09-11 04:49:44,71] [info] Workflow heartbeat configuration:
{
  "cromwellId" : "cromid-84c9232",
  "heartbeatInterval" : "2 minutes",
  "ttl" : "10 minutes",
  "writeBatchSize" : 10000,
  "writeThreshold" : 10000
}
[2018-09-11 04:49:44,76] [info] Metadata summary refreshing every 2 seconds.
[2018-09-11 04:49:44,82] [info] WriteMetadataActor configured to flush with batch size 200 and process rate 5 seconds.
[2018-09-11 04:49:44,82] [info] CallCacheWriteActor configured to flush with batch size 100 and process rate 3 seconds.
[2018-09-11 04:49:44,82] [info] KvWriteActor configured to flush with batch size 200 and process rate 5 seconds.
[2018-09-11 04:49:45,95] [info] JobExecutionTokenDispenser - Distribution rate: 50 per 1 seconds.
[2018-09-11 04:49:45,97] [info] JES batch polling interval is 33333 milliseconds
[2018-09-11 04:49:45,97] [info] JES batch polling interval is 33333 milliseconds
[2018-09-11 04:49:45,97] [info] JES batch polling interval is 33333 milliseconds
[2018-09-11 04:49:45,98] [info] PAPIQueryManager Running with 3 workers
[2018-09-11 04:49:45,98] [info] SingleWorkflowRunnerActor: Version 34
[2018-09-11 04:49:45,98] [info] SingleWorkflowRunnerActor: Submitting workflow
[2018-09-11 04:49:46,04] [info] Unspecified type (Unspecified version) workflow 8823cc11-5e71-4004-b8d4-edf40eb38cd6 submitted
[2018-09-11 04:49:46,09] [info] SingleWorkflowRunnerActor: Workflow submitted 8823cc11-5e71-4004-b8d4-edf40eb38cd6
[2018-09-11 04:49:46,13] [info] 1 new workflows fetched
[2018-09-11 04:49:46,13] [info] WorkflowManagerActor Starting workflow 8823cc11-5e71-4004-b8d4-edf40eb38cd6
[2018-09-11 04:49:46,14] [warn] SingleWorkflowRunnerActor: received unexpected message: Done in state RunningSwraData
[2018-09-11 04:49:46,14] [info] WorkflowManagerActor Successfully started WorkflowActor-8823cc11-5e71-4004-b8d4-edf40eb38cd6
[2018-09-11 04:49:46,14] [info] Retrieved 1 workflows from the WorkflowStoreActor
[2018-09-11 04:49:46,16] [info] WorkflowStoreHeartbeatWriteActor configured to flush with batch size 10000 and process rate 2 minutes.
[2018-09-11 04:49:46,21] [info] MaterializeWorkflowDescriptorActor [8823cc11]: Parsing workflow as WDL draft-2
[2018-09-11 21:59:21,27] [info] MaterializeWorkflowDescriptorActor [2a0d0464]: Call-to-Backend assignments: atac.pool_ta_pr2 -> slurm, atac.filter -> slurm, atac.macs2_pr2 -> slurm, atac.overlap_pr -> slurm, atac.bowtie2 -> slurm, atac.pool_ta_pr1 -> slurm, atac.ataqc -> slurm, atac.reproducibility_overlap -> slurm, atac.overlap_ppr -> slurm, atac.spr -> slurm, atac.idr_ppr -> slurm, atac.xcor -> slurm, atac.qc_report -> slurm, atac.idr_pr -> slurm, atac.overlap -> slurm, atac.macs2_ppr2 -> slurm, atac.macs2_pr1 -> slurm, atac.macs2_ppr1 -> slurm, atac.read_genome_tsv -> slurm, atac.macs2_pooled -> slurm, atac.reproducibility_idr -> slurm, atac.pool_ta -> slurm, atac.macs2 -> slurm, atac.bam2ta -> slurm, atac.idr -> slurm, atac.trim_adapter -> slurm
[2018-09-11 21:59:21,40] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,40] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,40] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,40] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,40] [warn] slurm [2a0d0464]: Key/s [preemptible, disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,40] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,40] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,40] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,40] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,41] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,41] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,41] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,41] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,41] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,41] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,41] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,41] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,41] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,41] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,41] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,41] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,41] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,41] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,41] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,41] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:21,41] [warn] slurm [2a0d0464]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-11 21:59:23,66] [info] WorkflowExecutionActor-2a0d0464-f14a-4c4d-a994-cf084e712a66 [2a0d0464]: Starting atac.read_genome_tsv
[2018-09-11 21:59:23,66] [info] WorkflowExecutionActor-2a0d0464-f14a-4c4d-a994-cf084e712a66 [2a0d0464]: Condition met: 'enable_idr'. Running conditional section
[2018-09-11 21:59:23,66] [info] WorkflowExecutionActor-2a0d0464-f14a-4c4d-a994-cf084e712a66 [2a0d0464]: Condition met: '!align_only && !true_rep_only && enable_idr'. Running conditional section
[2018-09-11 21:59:23,66] [info] WorkflowExecutionActor-2a0d0464-f14a-4c4d-a994-cf084e712a66 [2a0d0464]: Condition met: 'enable_idr'. Running conditional section
[2018-09-11 21:59:23,66] [info] WorkflowExecutionActor-2a0d0464-f14a-4c4d-a994-cf084e712a66 [2a0d0464]: Condition met: '!disable_xcor'. Running conditional section
[2018-09-11 21:59:23,67] [info] WorkflowExecutionActor-2a0d0464-f14a-4c4d-a994-cf084e712a66 [2a0d0464]: Condition met: '!true_rep_only'. Running conditional section
[2018-09-11 21:59:23,67] [info] WorkflowExecutionActor-2a0d0464-f14a-4c4d-a994-cf084e712a66 [2a0d0464]: Condition met: '!align_only && !true_rep_only'. Running conditional section
[2018-09-11 21:59:25,00] [warn] DispatchedConfigAsyncJobExecutionActor [2a0d0464atac.read_genome_tsv:NA:1]: Unrecognized runtime attribute keys: disks
[2018-09-11 21:59:25,43] [info] DispatchedConfigAsyncJobExecutionActor [2a0d0464atac.read_genome_tsv:NA:1]: cat /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/2a0d0464-f14a-4c4d-a994-cf084e712a66/call-read_genome_tsv/inputs/378634365/hg19.tsv
[2018-09-11 21:59:25,49] [info] DispatchedConfigAsyncJobExecutionActor [2a0d0464atac.read_genome_tsv:NA:1]: executing: sbatch \
--export=ALL \
-J cromwell_2a0d0464_read_genome_tsv \
-D /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/2a0d0464-f14a-4c4d-a994-cf084e712a66/call-read_genome_tsv \
-o /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/2a0d0464-f14a-4c4d-a994-cf084e712a66/call-read_genome_tsv/execution/stdout \
-e /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/2a0d0464-f14a-4c4d-a994-cf084e712a66/call-read_genome_tsv/execution/stderr \
-t 60 \
-n 1 \
--ntasks-per-node=1 \
--cpus-per-task=1 \
--mem=4000 \
 \
--account mengguo \
--wrap "/bin/bash /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/2a0d0464-f14a-4c4d-a994-cf084e712a66/call-read_genome_tsv/execution/script"
[2018-09-11 21:59:26,04] [error] WorkflowManagerActor Workflow 2a0d0464-f14a-4c4d-a994-cf084e712a66 failed (during ExecutingWorkflowState): java.lang.RuntimeException: Unable to start job. Check the stderr file for possible errors: /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/2a0d0464-f14a-4c4d-a994-cf084e712a66/call-read_genome_tsv/execution/stderr.submit
        at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.$anonfun$execute$2(SharedFileSystemAsyncJobExecutionActor.scala:131)
        at scala.util.Either.fold(Either.scala:188)
        at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.execute(SharedFileSystemAsyncJobExecutionActor.scala:126)
        at cromwell.backend.sfs.SharedFileSystemAsyncJobExecutionActor.execute$(SharedFileSystemAsyncJobExecutionActor.scala:121)
        at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.execute(ConfigAsyncJobExecutionActor.scala:208)
        at cromwell.backend.standard.StandardAsyncExecutionActor.$anonfun$executeAsync$1(StandardAsyncExecutionActor.scala:600)
        at scala.util.Try$.apply(Try.scala:209)
        at cromwell.backend.standard.StandardAsyncExecutionActor.executeAsync(StandardAsyncExecutionActor.scala:600)
        at cromwell.backend.standard.StandardAsyncExecutionActor.executeAsync$(StandardAsyncExecutionActor.scala:600)
        at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.executeAsync(ConfigAsyncJobExecutionActor.scala:208)
        at cromwell.backend.standard.StandardAsyncExecutionActor.executeOrRecover(StandardAsyncExecutionActor.scala:915)
        at cromwell.backend.standard.StandardAsyncExecutionActor.executeOrRecover$(StandardAsyncExecutionActor.scala:907)
        at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.executeOrRecover(ConfigAsyncJobExecutionActor.scala:208)
        at cromwell.backend.async.AsyncBackendJobExecutionActor.$anonfun$robustExecuteOrRecover$1(AsyncBackendJobExecutionActor.scala:65)
        at cromwell.core.retry.Retry$.withRetry(Retry.scala:37)
        at cromwell.backend.async.AsyncBackendJobExecutionActor.withRetry(AsyncBackendJobExecutionActor.scala:61)
        at cromwell.backend.async.AsyncBackendJobExecutionActor.cromwell$backend$async$AsyncBackendJobExecutionActor$$robustExecuteOrRecover(AsyncBackendJobExecutionActor.scala:65)
        at cromwell.backend.async.AsyncBackendJobExecutionActor$$anonfun$receive$1.applyOrElse(AsyncBackendJobExecutionActor.scala:88)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
        at akka.actor.Actor.aroundReceive(Actor.scala:517)
        at akka.actor.Actor.aroundReceive$(Actor.scala:515)
        at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.aroundReceive(ConfigAsyncJobExecutionActor.scala:208)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
        at akka.actor.ActorCell.invoke(ActorCell.scala:557)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
        at akka.dispatch.Mailbox.run(Mailbox.scala:225)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[2018-09-11 21:59:26,04] [info] WorkflowManagerActor WorkflowActor-2a0d0464-f14a-4c4d-a994-cf084e712a66 is in a terminal state: WorkflowFailedState
[2018-09-11 21:59:35,67] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
[2018-09-11 21:59:36,55] [info] Workflow polling stopped
[2018-09-11 21:59:36,56] [info] Shutting down WorkflowStoreActor - Timeout = 5 seconds
[2018-09-11 21:59:36,56] [info] Shutting down WorkflowLogCopyRouter - Timeout = 5 seconds
[2018-09-11 21:59:36,57] [info] Shutting down JobExecutionTokenDispenser - Timeout = 5 seconds
[2018-09-11 21:59:36,57] [info] JobExecutionTokenDispenser stopped
[2018-09-11 21:59:36,57] [info] Aborting all running workflows.
[2018-09-11 21:59:36,57] [info] WorkflowLogCopyRouter stopped
[2018-09-11 21:59:36,57] [info] Shutting down WorkflowManagerActor - Timeout = 3600 seconds
[2018-09-11 21:59:36,57] [info] WorkflowStoreActor stopped
[2018-09-11 21:59:36,57] [info] WorkflowManagerActor All workflows finished
[2018-09-11 21:59:36,57] [info] WorkflowManagerActor stopped
[2018-09-11 21:59:36,57] [info] Connection pools shut down
[2018-09-11 21:59:36,57] [info] Shutting down SubWorkflowStoreActor - Timeout = 1800 seconds
[2018-09-11 21:59:36,57] [info] Shutting down JobStoreActor - Timeout = 1800 seconds
[2018-09-11 21:59:36,57] [info] SubWorkflowStoreActor stopped
[2018-09-11 21:59:36,57] [info] Shutting down CallCacheWriteActor - Timeout = 1800 seconds
[2018-09-11 21:59:36,57] [info] Shutting down ServiceRegistryActor - Timeout = 1800 seconds
[2018-09-11 21:59:36,57] [info] Shutting down DockerHashActor - Timeout = 1800 seconds
[2018-09-11 21:59:36,57] [info] CallCacheWriteActor Shutting down: 0 queued messages to process
[2018-09-11 21:59:36,57] [info] Shutting down IoProxy - Timeout = 1800 seconds
[2018-09-11 21:59:36,57] [info] JobStoreActor stopped
[2018-09-11 21:59:36,57] [info] CallCacheWriteActor stopped
[2018-09-11 21:59:36,57] [info] KvWriteActor Shutting down: 0 queued messages to process
[2018-09-11 21:59:36,58] [info] DockerHashActor stopped
[2018-09-11 21:59:36,58] [info] IoProxy stopped
[2018-09-11 21:59:36,58] [info] WriteMetadataActor Shutting down: 0 queued messages to process
[2018-09-11 21:59:36,58] [info] ServiceRegistryActor stopped
[2018-09-11 21:59:36,60] [info] Database closed
[2018-09-11 21:59:36,60] [info] Stream materializer shut down
Workflow 2a0d0464-f14a-4c4d-a994-cf084e712a66 transitioned to state Failed
[2018-09-11 21:59:36,64] [info] Automatic shutdown of the async connection
[2018-09-11 21:59:36,64] [info] Gracefully shutdown sentry threads.
[2018-09-11 21:59:36,65] [info] Shutdown finished.

I never changed -Dconfig.file=backends/backend.conf -Dbackend.default=slurm; should it be adjusted?

leepc12 commented 6 years ago

"local" here means running pipelines with downloaded (so locally existing) files.

This looks like a SLURM problem. Does your SLURM sbatch take --account or --partition?

Please post an example sbatch command or shell script template you use for submitting your own job to SLURM.
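
(For reference only — this is not taken from the thread and all values are placeholders — a typical SLURM submission script on a cluster that requires an account and/or partition looks roughly like:)

#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --partition=<partition_name>   # placeholder
#SBATCH --account=<account_name>       # placeholder; some clusters do not use accounts at all
#SBATCH --time=01:00:00
#SBATCH --mem=4G
#SBATCH --cpus-per-task=1

/bin/bash my_script.sh

(submitted with: sbatch my_job.sbatch)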

Also, can you run the following sbatch command and see what happens? Post any errors here. I will take a look.

sbatch \
--export=ALL \
-J cromwell_2a0d0464_read_genome_tsv \
-D /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/2a0d0464-f14a-4c4d-a994-cf084e712a66/call-read_genome_tsv \
-o /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/2a0d0464-f14a-4c4d-a994-cf084e712a66/call-read_genome_tsv/execution/stdout \
-e /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/2a0d0464-f14a-4c4d-a994-cf084e712a66/call-read_genome_tsv/execution/stderr \
-t 60 \
-n 1 \
--ntasks-per-node=1 \
--cpus-per-task=1 \
--mem=4000 \
 \
--account mengguo \
--wrap "/bin/bash /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/2a0d0464-f14a-4c4d-a994-cf084e712a66/call-read_genome_tsv/execution/script"

BTW, I got your email but I cannot personally skype with you.

gmgitx commented 6 years ago

I see, thanks!

I just put this command line in the shell, as in step 8 of the doc. I ran it in /mypath/atac-seq-pipeline/ after source activate encode-atac-seq-pipeline: java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm /home/mengguo/local/bin/cromwell-34.jar run atac.wdl -i /mypath1/input.json -o /mypath2/atac-seq-pipeline/workflow_opts/slurm.json

For "run the following sbatch command and see what happens", it happens

sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
(encode-atac-seq-pipeline) Tue Sep 11 18:58:59 2018

slurm.json: { "default_runtime_attributes" : { "slurm_account": "mengguo" } }

leepc12 commented 6 years ago

@gmgitx: Your error says Invalid account or account/partition combination specified. Please post an example sbatch command or shell script template you use for submitting your own job to SLURM.
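
(As a hedged aside, not specific to this pipeline: on most SLURM clusters you can list available partitions and your account/partition associations with standard SLURM commands, for example:)

sinfo -s                                            # summary of partitions and their state
sacctmgr show associations user=$USER format=Account,Partition   # account/partition combinations allowed for your user

(Whether sacctmgr returns useful output depends on how the cluster's accounting database is configured.)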

gmgitx commented 6 years ago

I got the right account/partition from our IT after you mentioned it, so that error is gone. But it still doesn't work with the ENCSR356KRQ data provided in the documentation. This is the command I put in (I didn't add sbatch): java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm /home/mengguo/local/bin/cromwell-34.jar run atac.wdl -i /mypath1/ENCSR356KRQ_subsampled.json -o /mypath2/atac-seq-pipeline/workflow_opts/slurm.json

Part of the warnings and errors:

[2018-09-14 17:11:22,17] [warn] This actor factory is deprecated. Please use cromwell.backend.google.pipelines.v1alpha2.PipelinesApiLifecycleActorFactory for PAPI v1 or cromwell.backend.google.pipelines.v2alpha1.PipelinesApiLifecycleActorFactory for PAPI v2
[2018-09-14 17:11:22,17] [warn] Couldn't find a suitable DSN, defaulting to a Noop one.
...
[2018-09-14 17:11:23,73] [warn] SingleWorkflowRunnerActor: received unexpected message: Done in state RunningSwraData
...
[2018-09-14 17:13:09,52] [info] MaterializeWorkflowDescriptorActor [00851093]: Call-to-Backend assignments: atac.overlap_pr -> slurm, atac.spr -> slurm, atac.qc_report -> slurm, atac.reproducibility_idr -> slurm, atac.reproducibility_overlap -> slurm, atac.pool_ta -> slurm, atac.macs2_pr2 -> slurm, atac.xcor -> slurm, atac.ataqc -> slurm, atac.overlap_ppr -> slurm, atac.filter -> slurm, atac.idr_ppr -> slurm, atac.idr_pr -> slurm, atac.bam2ta -> slurm, atac.overlap -> slurm, atac.bowtie2 -> slurm, atac.macs2_ppr1 -> slurm, atac.pool_ta_pr2 -> slurm, atac.read_genome_tsv -> slurm, atac.trim_adapter -> slurm, atac.macs2_ppr2 -> slurm, atac.macs2_pr1 -> slurm, atac.idr -> slurm, atac.pool_ta_pr1 -> slurm, atac.macs2 -> slurm, atac.macs2_pooled -> slurm
[2018-09-14 17:13:09,74] [warn] slurm [00851093]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-14 17:13:09,74] [warn] slurm [00851093]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-14 17:13:09,74] [warn] slurm [00851093]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-14 17:13:09,75] [warn] slurm [00851093]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-14 17:13:09,75] [warn] slurm [00851093]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-14 17:13:09,75] [warn] slurm [00851093]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-14 17:13:09,75] [warn] slurm [00851093]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-14 17:13:09,75] [warn] slurm [00851093]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-14 17:13:09,75] [warn] slurm [00851093]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-14 17:13:09,75] [warn] slurm [00851093]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-14 17:13:09,75] [warn] slurm [00851093]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-14 17:13:09,75] [warn] slurm [00851093]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-14 17:13:09,75] [warn] slurm [00851093]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-14 17:13:09,75] [warn] slurm [00851093]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-14 17:13:09,75] [warn] slurm [00851093]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-14 17:13:09,76] [warn] slurm [00851093]: Key/s [preemptible, disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-14 17:13:09,76] [warn] slurm [00851093]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-14 17:13:09,76] [warn] slurm [00851093]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-14 17:13:09,76] [warn] slurm [00851093]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-09-14 17:13:09,76] [warn] slurm [00851093]: Key/s [disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
...
[warn] DispatchedConfigAsyncJobExecutionActor [e0efd905atac.read_genome_tsv:NA:1]: Unrecognized runtime attribute keys: disks

...
[2018-09-14 20:35:09,32] [warn] Localization via hard link has failed: /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/018d7f41-9161-4d4d-8a85-30b884e414c1/call-trim_adapter/shard-1/inputs/1398708413/ENCFF193RRC.subsampled.400.fastq.gz -> /project2/yangili1/mengguo/ASP/atac-seq-pipeline/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair1/ENCFF193RRC.subsampled.400.fastq.gz
[2018-09-14 20:35:09,32] [warn] Localization via copy has failed: /project2/yangili1/mengguo/ASP/atac-seq-pipeline/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair1/ENCFF193RRC.subsampled.400.fastq.gz
[2018-09-14 20:35:09,32] [warn] Localization via hard link has failed: /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/018d7f41-9161-4d4d-8a85-30b884e414c1/call-trim_adapter/shard-1/inputs/1398708414/ENCFF886FSC.subsampled.400.fastq.gz -> /project2/yangili1/mengguo/ASP/atac-seq-pipeline/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair2/ENCFF886FSC.subsampled.400.fastq.gz
[2018-09-14 20:35:09,33] [warn] Localization via copy has failed: /project2/yangili1/mengguo/ASP/atac-seq-pipeline/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair2/ENCFF886FSC.subsampled.400.fastq.gz
[2018-09-14 20:35:09,33] [warn] Localization via hard link has failed: /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/018d7f41-9161-4d4d-8a85-30b884e414c1/call-trim_adapter/shard-1/inputs/1398708413/ENCFF366DFI.subsampled.400.fastq.gz -> /project2/yangili1/mengguo/ASP/atac-seq-pipeline/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair1/ENCFF366DFI.subsampled.400.fastq.gz
[2018-09-14 20:35:09,33] [warn] Localization via copy has failed: /project2/yangili1/mengguo/ASP/atac-seq-pipeline/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair1/ENCFF366DFI.subsampled.400.fastq.gz
[2018-09-14 20:35:09,33] [warn] Localization via hard link has failed: /project2/yangili1/mengguo/ASP/atac-seq-pipeline/cromwell-executions/atac/018d7f41-9161-4d4d-8a85-30b884e414c1/call-trim_adapter/shard-1/inputs/1398708414/ENCFF573UXK.subsampled.400.fastq.gz -> /project2/yangili1/mengguo/ASP/atac-seq-pipeline/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair2/ENCFF573UXK.subsampled.400.fastq.gz
[2018-09-14 20:35:09,33] [warn] Localization via copy has failed: /project2/yangili1/mengguo/ASP/atac-seq-pipeline/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair2/ENCFF573UXK.subsampled.400.fastq.gz
[2018-09-14 20:35:09,34] [error] DispatchedConfigAsyncJobExecutionActor [018d7f41atac.trim_adapter:1:1]: Error attempting to Execute
java.lang.Exception: Failed command instantiation

...
[2018-09-14 17:00:23,88] [error] DispatchedConfigAsyncJobExecutionActor [2386aadbatac.trim_adapter:1:1]: Error attempting to Execute
java.lang.Exception: Failed command instantiation
        at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand(StandardAsyncExecutionActor.scala:537)
        at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand$(StandardAsyncExecutionActor.scala:472)
        at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.instantiatedCommand$lzycompute(ConfigAsyncJobExecutionActor.scala:208)
        at cromwell.backend.impl.sfs.config.DispatchedConfigAsyncJobExecutionActor.instantiatedCommand(ConfigAsyncJobExecutionActor.scala:208)
leepc12 commented 6 years ago

I didn't mean a pipeline command line. I just wanted to see an example sbatch command line that you usually use. Is there a wiki page for your cluster?

What is your sbatch command line to submit the following HelloWorld shell script hello_world.sh?

#!/bin/bash
echo Hello world
echo Sleep 60
gmgitx commented 6 years ago

Sorry for my misunderstanding.

The guide for our cluster: https://github.com/jdblischak/giladlab-midway-guide

Here is the sbatch command: sbatch hello_world.sh, which gives me back a file slurm-[number].out

leepc12 commented 6 years ago

Are you sure that sbatch hello_world.sh works without any extra parameters? If so, remove account settings ("slurm_account": "mengguo") from workflow_opts/slurm.json and try again.
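
(A minimal sketch of what workflow_opts/slurm.json would contain after removing the account setting — a slurm_partition entry can be added back if the cluster requires one, as ends up happening later in this thread:)

{
    "default_runtime_attributes" : {
    }
}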

gmgitx commented 6 years ago

Yes, in the slurm-[number].out,

Hello world
Sleep 60

I removed the account setting, but it seems to give the same warnings and error report.

leepc12 commented 6 years ago

Please post a full log and also your workflow_opts/slurm.json.

gmgitx commented 6 years ago
Command:

sbatch --mem=8g --partition=broadwl run_atac.sh

run_atac.sh:

#!/bin/bash
java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm /home/name2/local/bin/cromwell-34.jar run atac.wdl -i /project2/name1/name2/DLDS/ENCSR356KRQ_subsampled.json -o /project2/name1/name2/ASP/atac-seq-pipeline/workflow_opts/slurm.json
slurm.json:
{
    "default_runtime_attributes" : {
        "slurm_partition": "broadwl"
    }
}
ENCSR356KRQ_subsampled.json

#########################result

slurm-49805362.out

leepc12 commented 6 years ago

I guess that you (or your partition) have a limited quota for resources on your cluster?

$ scontrol show partition broadwl

Do you have a privilege to use enough resources (memory>=16GB, cpu>=4, walltime>=48hr per task) on your partition?

Please run the following in the working directory where you ran the pipeline. It will make a tarball of all the log files; please upload it here. I need it for debugging:

$ find . -type f -name 'stdout' -or -name 'stderr' -or -name 'script' -or \
-name '*.qc' -or -name '*.txt' -or -name '*.log' -or -name '*.png' -or -name '*.pdf' \
| xargs tar -zcvf debug_issue_31.tar.gz
gmgitx commented 6 years ago

Thanks! Output of scontrol show partition broadwl:

PartitionName=broadwl
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=midway2-[0002-0089,0103-0124,0137-0182,0221-0230,0258-0280,0282-0301,0312-0398,0400]
   PriorityJobFactor=20 PriorityTier=20 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=8316 TotalNodes=297 SelectTypeParameters=NONE
   DefMemPerCPU=2048 MaxMemPerCPU=9223372036854775807

tar zxvf debug_issue_31.tar.gz

leepc12 commented 6 years ago

@gmgitx Your tarball does not have any file in it.

gmgitx commented 6 years ago

Thanks! After I executed that command in the working directory where I ran the pipeline, a debug_issue_31.tar.gz was left. So what does it mean if there is nothing in it, or should it contain files?

leepc12 commented 6 years ago

Please send that file debug_issue_31.tar.gz to my email.

gmgitx commented 6 years ago

I sent it.

leepc12 commented 6 years ago

I got your log, but it includes outputs from too many pipeline runs. For the latest pipeline run, I found that the first task of the pipeline worked fine, so you can keep using your partition broadwl. But the next step failed and I need to figure out why. I guess that it is being rejected by the cluster due to a resource quota.

What is the resource quota on your cluster? How many resources can your partition use? For example: maximum number of concurrent jobs, max CPUs per job, max memory per job, max walltime per job. This information will be helpful for debugging.

Can you clean up your output directories (rm -rf cromwell-execution*) and run the pipeline again? If that rm -rf does not work, make a new directory and follow the steps in the documentation again. Then post both your screen log and a new tarball (please make a new one using the same command).
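
(A sketch of that clean-up-and-rerun sequence, reusing the command and placeholder paths quoted earlier in this thread:)

cd /mypath/atac-seq-pipeline                      # working directory used for the previous runs
rm -rf cromwell-execution*                        # remove outputs of previous runs
source activate encode-atac-seq-pipeline
java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm \
  /home/mengguo/local/bin/cromwell-34.jar run atac.wdl \
  -i /mypath1/ENCSR356KRQ_subsampled.json \
  -o /mypath2/atac-seq-pipeline/workflow_opts/slurm.json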

gmgitx commented 6 years ago

Many thanks! I sent debug_issue_31.tar.gz to your email, along with slurm-49925402.out.

According to what I learned from IT, memory >= 16GB and cpu >= 4 are allowed, but walltime must be under 36 hours in total.

My partition limits: MaxCPUsPerUser 2800, MaxNodesPerUser 100, MaxJobsPerUser 100, MaxSubmitJobs 500, MaxWall 1-12:00:00

leepc12 commented 6 years ago

Default walltime for bowtie2 is 48 hours. I think this caused the problem. Please add the following to your input JSON and try again.

    "atac.bowtie2.mem_mb" : 10000,
    "atac.bowtie2.cpu" : 1,
    "atac.bowtie2.time_hr" : 12,

Also, reduce the number of concurrent jobs to 1 or 2 in backends/backend.conf. https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/backends/backend.conf#L164
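
(A hedged sketch of the setting being referred to, assuming the slurm provider block in backends/backend.conf follows Cromwell's usual HOCON layout; only the relevant key is shown:)

backend {
  providers {
    slurm {
      config {
        # limit how many jobs Cromwell submits to SLURM at the same time
        concurrent-job-limit = 2
      }
    }
  }
}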

gmgitx commented 6 years ago

Thanks for your kind help. The folder "cromwell-executions" was not created after the run. I modified things following your advice this time and ran it with sbatch ./example.sbatch:

### example.sbatch

#!/bin/bash
#SBATCH --job-name=example_sbatch
#SBATCH --output=example_sbatch.out
#SBATCH --error=example_sbatch.err
#SBATCH --time=36:00:00
#SBATCH --partition=broadwl
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --mem-per-cpu=20
source activate encode-atac-seq-pipeline
bash run_atac1.sh
source deactivate

### run_atac1.sh

#!/bin/bash
java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm /home/name2/local/bin/cromwell-34.jar run atac.wdl -i /project2/name1/name2/DLDS/ENCSR356KRQ_subsampled.json -o /project2/name1/name2/ASP/atac-seq-pipeline/workflow_opts/slurm.json

sbatch_report

Sorry about the error. Also, I sent debug_issue_31.tar.gz to your email.

Anyway, thank you again.

leepc12 commented 6 years ago

Is your sbatch_report trimmed?

gmgitx commented 6 years ago

I combined the two files (example_sbatch.err and example_sbatch.out) together; there was no other processing.

leepc12 commented 6 years ago

Please take a look at the ###example_sbatch.out part.

A log file in your tarball says that some of the sub-tasks (read_genome_tsv, trim_adapter) finished successfully, but they are not shown in your sbatch_report. I think it is indeed trimmed. It only shows some initialization stages of the pipeline.

gmgitx commented 6 years ago

Yes, you are right; here is example_sbatch.out. But I think that if the sub-tasks had finished successfully, the folder "cromwell-executions" should have been created, and it was not.

### example_sbatch.out

[2018-09-25 18:19:47,44] [info] Running with database db.url = jdbc:hsqldb:mem:47a741e8-0324-4c29-a170-f4dd54d61b24;shutdown=false;hsqldb.tx=mvcc
[2018-09-25 18:19:55,46] [info] Running migration RenameWorkflowOptionsInMetadata with a read batch size of 100000 and a write batch size of 100000
[2018-09-25 18:19:55,47] [info] [RenameWorkflowOptionsInMetadata] 100%
[2018-09-25 18:19:55,56] [info] Running with database db.url = jdbc:hsqldb:mem:124b08c3-9730-416e-a798-3cf91acbf493;shutdown=false;hsqldb.tx=mvcc
[2018-09-25 18:19:55,88] [warn] This actor factory is deprecated. Please use cromwell.backend.google.pipelines.v1alpha2.PipelinesApiLifecycleActorFactory for PAPI v1 or cromwell.backend.google.pipelines.v2alpha1.PipelinesApiLifecycleActorFactory for PAPI v2
[2018-09-25 18:19:55,92] [warn] Couldn't find a suitable DSN, defaulting to a Noop one.
[2018-09-25 18:19:55,92] [info] Using noop to send events.
[2018-09-25 18:19:56,19] [info] Slf4jLogger started
[2018-09-25 18:19:56,37] [info] Workflow heartbeat configuration:
{
  "cromwellId" : "cromid-a9ac2b1",
  "heartbeatInterval" : "2 minutes",
  "ttl" : "10 minutes",
  "writeBatchSize" : 10000,
  "writeThreshold" : 10000
}
[2018-09-25 18:19:56,40] [info] Metadata summary refreshing every 2 seconds.
[2018-09-25 18:19:56,43] [info] CallCacheWriteActor configured to flush with batch size 100 and process rate 3 seconds.
[2018-09-25 18:19:56,43] [info] WriteMetadataActor configured to flush with batch size 200 and process rate 5 seconds.
[2018-09-25 18:19:56,43] [info] KvWriteActor configured to flush with batch size 200 and process rate 5 seconds.
[2018-09-25 18:19:57,18] [info] JobExecutionTokenDispenser - Distribution rate: 50 per 1 seconds.
[2018-09-25 18:19:57,20] [info] SingleWorkflowRunnerActor: Version 34
[2018-09-25 18:19:57,20] [info] JES batch polling interval is 33333 milliseconds
[2018-09-25 18:19:57,20] [info] JES batch polling interval is 33333 milliseconds
[2018-09-25 18:19:57,20] [info] JES batch polling interval is 33333 milliseconds
[2018-09-25 18:19:57,20] [info] PAPIQueryManager Running with 3 workers
[2018-09-25 18:19:57,21] [info] SingleWorkflowRunnerActor: Submitting workflow
[2018-09-25 18:19:57,25] [info] Unspecified type (Unspecified version) workflow 40567000-f7d2-491b-b255-44cdcec9a54b submitted
[2018-09-25 18:19:57,30] [info] SingleWorkflowRunnerActor: Workflow submitted 40567000-f7d2-491b-b255-44cdcec9a54b
[2018-09-25 18:19:57,31] [info] 1 new workflows fetched
[2018-09-25 18:19:57,31] [info] WorkflowManagerActor Starting workflow 40567000-f7d2-491b-b255-44cdcec9a54b
[2018-09-25 18:19:57,31] [warn] SingleWorkflowRunnerActor: received unexpected message: Done in state RunningSwraData
[2018-09-25 18:19:57,31] [info] WorkflowManagerActor Successfully started WorkflowActor-40567000-f7d2-491b-b255-44cdcec9a54b
[2018-09-25 18:19:57,32] [info] Retrieved 1 workflows from the WorkflowStoreActor
[2018-09-25 18:19:57,32] [info] WorkflowStoreHeartbeatWriteActor configured to flush with batch size 10000 and process rate 2 minutes.
[2018-09-25 18:19:57,37] [info] MaterializeWorkflowDescriptorActor [40567000]: Parsing workflow as WDL draft-2
leepc12 commented 6 years ago

Can you upload your modified input JSON here?

gmgitx commented 6 years ago

Sure, thanks.

#### .../ENCSR356KRQ_subsampled.json

{
    "atac.pipeline_type" : "atac",
    "atac.genome_tsv" : "/project2/name1/name2/DLDS/process_data/hg19db/hg19.tsv",
    "atac.fastqs" : [
        [
            ["/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep1/pair1/ENCFF341MYG.subsampled.400.fastq.gz",
             "/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep1/pair2/ENCFF248EJF.subsampled.400.fastq.gz"],
            ["/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep1/pair1/ENCFF106QGY.subsampled.400.fastq.gz",
             "/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep1/pair2/ENCFF368TYI.subsampled.400.fastq.gz"]
        ],
        [
            ["/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair1/ENCFF641SFZ.subsampled.400.fastq.gz",
             "/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair2/ENCFF031ARQ.subsampled.400.fastq.gz"],
            ["/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair1/ENCFF751XTV.subsampled.400.fastq.gz",
             "/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair2/ENCFF590SYZ.subsampled.400.fastq.gz"],
            ["/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair1/ENCFF927LSG.subsampled.400.fastq.gz",
             "/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair2/ENCFF734PEQ.subsampled.400.fastq.gz"],
            ["/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair1/ENCFF859BDM.subsampled.400.fastq.gz",
             "/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair2/ENCFF007USV.subsampled.400.fastq.gz"],
            ["/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair1/ENCFF193RRC.subsampled.400.fastq.gz",
             "/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair2/ENCFF886FSC.subsampled.400.fastq.gz"],
            ["/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair1/ENCFF366DFI.subsampled.400.fastq.gz",
             "/project2/name1/name2/DLDS/test_sample/ENCSR356KRQ/fastq_subsampled/rep2/pair2/ENCFF573UXK.subsampled.400.fastq.gz"]
        ]
    ],

    "atac.paired_end" : true,
    "atac.multimapping" : 4,

    "atac.trim_adapter.auto_detect_adapter" : true,
    "atac.trim_adapter.cpu" : 1,

    "atac.bowtie2.mem_mb" : 10000,
    "atac.bowtie2.cpu" : 1,
    "atac.bowtie2.mem_hr" : 12,

    "atac.filter.cpu" : 1,
    "atac.filter.mem_mb" : 12000,

    "atac.macs2_mem_mb" : 16000,

    "atac.smooth_win" : 73,
    "atac.enable_idr" : true,
    "atac.idr_thresh" : 0.05,

    "atac.qc_report.name" : "ENCSR356KRQ (subsampled 1/400 reads)",
    "atac.qc_report.desc" : "ATAC-seq on primary keratinocytes in day 0.0 of differentiation"
}

#### ...atac-seq-pipeline/workflow_opts/slurm.json

{
    "default_runtime_attributes" : {
        "slurm_partition": "broadwl"
    }
}
leepc12 commented 6 years ago

I think this is a resource quota/limit problem on your cluster. Please play with the resource settings in your input JSON. You may need to revert to the last partially successful configuration (for some tasks) and then change the resource settings.

https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/docs/input.md#resource

Resource settings for one of your successful tasks (trim_adapter) were 2 cpu, 12000 mem_mb, 24 time_hr.
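
(Expressed as input-JSON keys, following the per-task pattern already used elsewhere in this thread, those trim_adapter settings would be roughly:)

    "atac.trim_adapter.cpu" : 2,
    "atac.trim_adapter.mem_mb" : 12000,
    "atac.trim_adapter.time_hr" : 24,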

gmgitx commented 6 years ago

Thanks! Although I only have the trim results so far, I'll continue to adjust the resource settings. But regarding the output of trim_adapter:

.../atac-seq-pipeline/cromwell-executions/atac/06bf6b3b-164f-4917-9507-d90a58a428e4/call-trim_adapter/shard-0/execution
merge_fastqs_R1_ENCFF341MYG.subsampled.400.trim.merged.fastq.gz
merge_fastqs_R2_ENCFF248EJF.subsampled.400.trim.merged.fastq.gz
.../atac-seq-pipeline/cromwell-executions/atac/06bf6b3b-164f-4917-9507-d90a58a428e4/call-trim_adapter/shard-1/execution
merge_fastqs_R1_ENCFF641SFZ.subsampled.400.trim.merged.fastq.gz
merge_fastqs_R2_ENCFF031ARQ.subsampled.400.trim.merged.fastq.gz

Could you confirm whether it is right, or whether something went wrong, that I only get trimmed results for the first two files of each rep of ENCSR356KRQ?

leepc12 commented 6 years ago

Yes, these fastqs (two for each replicate) look fine.

leepc12 commented 5 years ago

Closing this issue due to long inactivity.