broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
993 stars 360 forks source link

Defined() does not work on undefined keys in object. #3093

Closed rhpvorderman closed 6 years ago

rhpvorderman commented 6 years ago

In the following code snippet:

    scatter(centrifuge in centrifugeList){
        call centrifugeDownload {
            input:
              centrifugeOutput= centrifuge.outputDir,
              domain=centrifuge.domain,
              database=if defined(centrifuge.database) then centrifuge.database else "refseq"
              }
        }
    }

The centrifugeList is a list of dictionaries. The resulting centrifuge object may or may not have a key database.

Expected behaviour:

database defaults to "refseq" if no database key is present in the dictionary. It will use the database key if it exists.

Observed behaviour:

java.lang.RuntimeException: Evaluating if defined(centrifuge.database) then centrifuge.database else "refseq" failed: Could not find key database in WdlObject
        at cromwell.engine.workflow.lifecycle.execution.keys.ExpressionKey.$anonfun$processRunnable$2(ExpressionKey.scala:36)
        at cromwell.engine.workflow.lifecycle.execution.keys.ExpressionKey.$anonfun$processRunnable$2$adapted(ExpressionKey.scala:31)
        at scala.Function1.$anonfun$andThen$1(Function1.scala:52)
        at cats.data.Validated.fold(Validated.scala:14)
        at cats.data.Validated.bimap(Validated.scala:109)
        at cats.data.Validated.map(Validated.scala:152)
        at cromwell.engine.workflow.lifecycle.execution.keys.ExpressionKey.processRunnable(ExpressionKey.scala:31)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.$anonfun$processRunnableTaskCallInputExpression$4(WorkflowExecutionActor.scala:452)
        at scala.util.Either.flatMap(Either.scala:338)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.$anonfun$processRunnableTaskCallInputExpression$2(WorkflowExecutionActor.scala:449)
        at scala.util.Either.flatMap(Either.scala:338)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.$anonfun$processRunnableTaskCallInputExpression$1(WorkflowExecutionActor.scala:448)
        at scala.util.Either.flatMap(Either.scala:338)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.processRunnableTaskCallInputExpression(WorkflowExecutionActor.scala:447)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.$anonfun$startRunnableNodes$4(WorkflowExecutionActor.scala:421)
        at cats.instances.ListInstances$$anon$1.$anonfun$traverse$2(list.scala:65)
        at cats.instances.ListInstances$$anon$1.loop$2(list.scala:58)
        at cats.instances.ListInstances$$anon$1.$anonfun$foldRight$2(list.scala:60)
        at cats.Eval$Call$.cats$Eval$Call$$loop(Eval.scala:267)
        at cats.Eval$Call.value(Eval.scala:257)
        at cats.instances.ListInstances$$anon$1.traverse(list.scala:64)
        at cats.instances.ListInstances$$anon$1.traverse(list.scala:12)
        at cats.Traverse$Ops.traverse(Traverse.scala:19)
        at cats.Traverse$Ops.traverse$(Traverse.scala:19)
        at cats.Traverse$ToTraverseOps$$anon$3.traverse(Traverse.scala:19)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.cromwell$engine$workflow$lifecycle$execution$WorkflowExecutionActor$$startRunnableNodes(WorkflowExecutionActor.scala:416)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor$$anonfun$5.applyOrElse(WorkflowExecutionActor.scala:148)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor$$anonfun$5.applyOrElse(WorkflowExecutionActor.scala:146)
        at scala.PartialFunction$OrElse.apply(PartialFunction.scala:168)
        at akka.actor.FSM.processEvent(FSM.scala:668)
        at akka.actor.FSM.processEvent$(FSM.scala:662)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.akka$actor$LoggingFSM$$super$processEvent(WorkflowExecutionActor.scala:41)
        at akka.actor.LoggingFSM.processEvent(FSM.scala:801)
        at akka.actor.LoggingFSM.processEvent$(FSM.scala:783)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.processEvent(WorkflowExecutionActor.scala:41)
        at akka.actor.FSM.akka$actor$FSM$$processMsg(FSM.scala:659)
        at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:653)
        at akka.actor.Actor.aroundReceive(Actor.scala:514)
        at akka.actor.Actor.aroundReceive$(Actor.scala:512)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.akka$actor$Timers$$super$aroundReceive(WorkflowExecutionActor.scala:41)
        at akka.actor.Timers.aroundReceive(Timers.scala:40)
        at akka.actor.Timers.aroundReceive$(Timers.scala:36)
        at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.aroundReceive(WorkflowExecutionActor.scala:41)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:527)
        at akka.actor.ActorCell.invoke(ActorCell.scala:496)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
        at akka.dispatch.Mailbox.run(Mailbox.scala:224)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
cjllanwarne commented 6 years ago

@rhpvorderman great comment, I don't think there's an especially nice workaround to this right now.

This is caused because defined operates on optional values (eg File?) whereas object member accesses either returns a value, or fail the workflow. We could fix this in a couple of ways:

And with my apologies, I'm now going to immediately redirect you elsewhere...

Since any of the above suggestions would require a change to the WDL spec, I suggest you open this as a new issue in that repo. Once the change is accepted there I'll be able to make a new issue here to implement the change in Cromwell.

cjllanwarne commented 6 years ago

One other comment, I'd suggest you check out (and maybe vote on) the proposed addition of structs in this PR in openWDL: https://github.com/openwdl/wdl/pull/160

If the PR passes, it's possible your problem might be fixed without needing further spec changes. You'll be able to define a Centrifuge struct like this:

struct Centrifuge {
  String? database
  # ... other fields
}

If you had this then the defined method would be able to key off the database field (which would always have a value, either set to a string, or set to a null).

This isn't part of the WDL spec yet, but it's very close, so watch out for it!

rhpvorderman commented 6 years ago

Thank you @cjllanwarne . A Struct would be indeed a nicer way to fix this as you can define all the items. An object can then be used for simpler things. Extra methods just add clutter. I will check it out.

EDIT: on second thought, best to leave this issue open until structs are actually there.

rhpvorderman commented 6 years ago

Structs are there.