broadinstitute / cromwell-tools

A collection of Python clients and accessory scripts for interacting with the Cromwell
https://cromwell-tools.readthedocs.io
BSD 3-Clause "New" or "Revised" License
22 stars 9 forks source link

ENH: add validate capability to cromwell-tools #19

Closed ambrosejcarr closed 6 years ago

ambrosejcarr commented 6 years ago

PR Summary

This is a work-in-progress PR that enables cromwell-tools to trigger womtool validate by localizing the wdl and all listed tasks in dependencies.json to a temporary directory and calling womtool validate from that directory.

I also optimized the imports (stdlib > external libs > cromwell_tools)

Major questions for this work-in-progress:

  1. Did I incorporate existing download commands adequately? I the case of womtool we need to dump the files to a directory, so it didn't make sense for me to slurp local files into memory. As a result, I wrote a new localize method that uses the download_http method for http files.
  2. How to deal with womtool dependency for test purposes? I hard coded a location which is obviously not good, and is why the tests fail currently.

PR Checklist:

rexwangcc commented 6 years ago

@ambrosejcarr This looks good to me! For the major questions:

  1. I think _localize_file is necessary for this validation use case.
  2. When it comes to "How to deal with womtool dependency for test purposes", since it's around 70MB, it may not be a good choice to host it on Github either.

Some possible choices for 2 would be:

ambrosejcarr commented 6 years ago

Of the options, I like (2, docker) the best. On one hand I don't like adding docker dependencies, but if people are using cromwell, I think it's safe to assume they're at least comfortable learning about containers.

(3) will make the tests take much longer, while (1) could potentially cause us a lot of egress charges if people start using this repo or we run a lot of tests.

What do you think?

rexwangcc commented 6 years ago

Yeah, I think (2) docker would be the best among all of the 3 options, whereas requires a little bit more refactoring to your current code. Besides, I'm open to any other alternatives.

lbergelson commented 6 years ago

We use git-lfs in gatk to store large binaries in git. It works well but adds some hassle.

ambrosejcarr commented 6 years ago

Comments addressed, but depends on #28 to get the tests to pass on travis.

ambrosejcarr commented 6 years ago

Ok, this one's ready to be re-reviewed. I kinda mangled the docker a bit. Would love it if this generated some conversation about #26