broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
988 stars 357 forks source link

Allow String? for 'docker' RuntimeAttribute #1832

Open cjllanwarne opened 7 years ago

cjllanwarne commented 7 years ago

In other words, we should validate docker as an optional string so that we can do things like this:

task foo {
  String? dockerName
  command { ... }
  runtime {
    docker: dockerName
  }
  output { ... }
}

The expected behaviour here is:

kcibul commented 7 years ago

Each backend could decide how to interpret the optional nature. For example the JES backend could decide that it's required and fail. The SFS backend could decide it's optional and just run without docker.

Specifically, the SFS backend (SGE) should implement this optional docker behavior

geoffjentry commented 7 years ago

While useful this is IMO more of the complecting of RAs that make me think they need a redo.

kcibul commented 7 years ago

you said that in the original thread... that we said "let's do something tactical to solve the problem now" while the complecting continues ;)

LeeTL1220 commented 7 years ago

@kcibul regarding issue #1804 .... Would my wdl and json look as follows?


workflow yo {
 String msg
 String? docker_image

 call task1 {
  input:
    msg=msg,
    docker_image=docker_image
 }
}

# Run a message in an arbitrary docker container (e.g. "broadinstitute/eval-gatk-protected:crsp_validation_latest")
task task1 {
 String msg
 String ? docker_image

 command {
  echo ${msg}
 } 

 runtime {
  docker: "${docker_image}"
  memory: "1GB"
 }
}

When I want a docker image:

{
 "yo.msg": "foo"
 "yo.docker_image": "broadinstitute/eval-gatk-protected:crsp_validation_latest"
}

No docker image:

{
 "yo.msg": "foo"
}
LeeTL1220 commented 7 years ago

@kcibul why not support empty string as null, for docker, in the backend code? Are you worried about discerning an error where the user wants docker, but accidentally specifies empty string?

If my example wdl/json above is correct, I'd rather not have to delete json entries, since this is a bit more difficult to script around, but I can be convinced.

geoffjentry commented 7 years ago

@leetl1220 let's not :)

geoffjentry commented 7 years ago

@kcibul something something squeaky wheel

LeeTL1220 commented 7 years ago

@geoffjentry I'm confused... Regardless, if it is a disproportionate amount of work, then I am convinced.

Can someone confirm that my wdl/json is correct?

geoffjentry commented 7 years ago

@leetl1220 I'd rather not introduce magic values and billion dollar mistakes into wdl

LeeTL1220 commented 7 years ago

@geoffjentry Geez... such drama... fine then...

kcibul commented 7 years ago

why do our tickets turn in to long discussions ;) @leetl1220 and @geoffjentry let's talk after strat board, decide something and move on

LeeTL1220 commented 7 years ago

@kcibul No need. I'll +1 this if someone can confirm that the above wdl/json (that I posted) is more-or-less correct.

cjllanwarne commented 7 years ago

@LeeTL1220 Yes, that's the options file you would use. Perhaps alternatively using a javascript null to signify an active choice of no docker:

{
 "yo.msg": "foo"
 "yo.docker_image": null
}
LeeTL1220 commented 7 years ago

Looks good to me.

On Tue, Jan 10, 2017 at 12:26 PM, Chris Llanwarne notifications@github.com wrote:

@LeeTL1220 https://github.com/LeeTL1220 Yes, that's the options file you would use. Perhaps alternatively using a javascript null to signify an active choice of no docker:

{ "yo.msg": "foo" "yo.docker_image": null }

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/broadinstitute/cromwell/issues/1832#issuecomment-271639848, or mute the thread https://github.com/notifications/unsubscribe-auth/ACDXk-uTQW6vH1UcC6V5NfvYNm3aPoeyks5rQ79BgaJpZM4LfjE0 .

-- Lee Lichtenstein Broad Institute 75 Ames Street, Room 7003EB Cambridge, MA 02142 617 714 8632

LeeTL1220 commented 7 years ago

(I prefer null, but can do whichever)

On Tue, Jan 10, 2017 at 12:27 PM, Lee Lichtenstein < lichtens@broadinstitute.org> wrote:

Looks good to me.

On Tue, Jan 10, 2017 at 12:26 PM, Chris Llanwarne < notifications@github.com> wrote:

@LeeTL1220 https://github.com/LeeTL1220 Yes, that's the options file you would use. Perhaps alternatively using a javascript null to signify an active choice of no docker:

{ "yo.msg": "foo" "yo.docker_image": null }

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/broadinstitute/cromwell/issues/1832#issuecomment-271639848, or mute the thread https://github.com/notifications/unsubscribe-auth/ACDXk-uTQW6vH1UcC6V5NfvYNm3aPoeyks5rQ79BgaJpZM4LfjE0 .

-- Lee Lichtenstein Broad Institute 75 Ames Street, Room 7003EB Cambridge, MA 02142 617 714 8632 <(617)%20714-8632>

-- Lee Lichtenstein Broad Institute 75 Ames Street, Room 7003EB Cambridge, MA 02142 617 714 8632

katevoss commented 7 years ago

After discussing with @LeeTL1220 and redteam this is lower priority, as when using SGE the docker stanza is ignored and this is an acceptable workaround.

cjllanwarne commented 7 years ago

Pinging this ticket with another requester from the forums: http://gatkforums.broadinstitute.org/wdl/discussion/9936/is-there-a-way-to-tell-cromwell-to-ignore-the-runtime-section-of-the-tasks-in-a-wdl-script#latest

katevoss commented 7 years ago

Adding this to the Runtime Attributes improvement spec

As a workflow runner on SGE, I want be able to make the Docker attribute to be optional, so that I can avoid rewriting my WDL when I don't use Docker.

geoffjentry commented 6 years ago

@cjllanwarne If this is a path you want to pursue you should put on your commodore cap and do it on the other side of the fence :)

mcovarr commented 6 years ago

Commodore WDL

LeeTL1220 commented 6 years ago

@ruchim We are getting a lot of questions about being able to run local/SGE without docker at the Spain workshop.... Not just Spanish people, but from several countries.

LeeTL1220 commented 6 years ago

@ruchim I should clarify. There are a lot of people at this workshop (if not all of them) that cannot use the cloud, due to regulations, clinical requirements, etc. Or they need to be able to run WDL on their local on-prem compute cluster for testing on small cohorts, etc. This is a common configuration that prohibits docker. We really do not want these users to be forced to change the WDL that we (DSP methods) write and test. In order to stay backend-agnostic, can we implement a null option for docker as described in this issue (and #1804 )?

geoffjentry commented 6 years ago

Or they could use something like nodocker or singularity

LeeTL1220 commented 6 years ago

Singularity would require changes to the cromwell configuration file, correct? Since the docker command would change. Probably not hard, but would need documentation.

kshakir commented 6 years ago

This is a common configuration that prohibits docker.

"Local" and SGE/HPC are two separate issues.

SGE (and all HPC) backends can already run without docker. When setting up the backend, just don't add a submit-docker config variable nor a docker runtime attribute to the backend's configuration. Docs for new local/HPC backends are documented under the title "SGE".

Separately, there is the issue that cromwell is pre-loaded with a default "Local" backend. This "Local" backend is docker enabled. The simplest workaround today is to create another backend "Local-NoDocker" or similar. Or if one wants to just change the existing "Local" backend they can use a config like:

include required(classpath("application"))
backend.providers.Local.config.runtime-attributes=""
backend.providers.Local.config.submit-docker=null
eweitz commented 5 years ago

I encountered a need for this improvement while developing WDLs intended for FireCloud. Cromwell support for optional Docker runtimes would enable me to write FireCloud WDLs with quicker development iterations. It would also enable faster and more resilient automated testing (i.e., unit testing) of such WDLs.

My current approach to developing WDLs intended for FireCloud is to add Cromwell and my WDL (foo.wdl) into the Docker image that contains my workflow dependencies (e.g. Python and R code). Then, from my local machine, I execute in my Docker container an equivalent version of my FireCloud WDL with a command like docker exec $containerId java -jar cromwell-36.1.jar run foo_test.wdl --inputs test_inputs.json --options options.json.

Without a way to override the runtime attribute (or ignore its docker key) in foo.wdl, I resort to commenting out the attribute and copying the content to foo_test.wdl. This enables fast development and unit testing, but requires manually syncing foo.wdl and foo_test.wdl. That, of course, has poor maintainability -- my approach is a kludge.

Adam (@aednichols) and I investigated better ways to do this, but found none. See Slack for more details about my issue. In summary, as an engineer using Cromwell to develop and test FireCloud WDLs, support for optional Docker runtimes as proposed here strikes me as the best option for my use case.

aednichols commented 5 years ago

Another point that Eric and I came across is that a "Docker in Docker" solution - i.e. installing Docker inside the Docker container where he's running Cromwell - is not good either because it necessitates pushing and re-pulling the Docker image he's iterating on, which makes for annoyingly long cycle times and can't work with bad or no Internet.