chanzuckerberg / miniwdl

Workflow Description Language developer tools & local runner
MIT License
175 stars 54 forks source link

Generate WDL from a parsed tree #216

Open dinvlad opened 5 years ago

dinvlad commented 5 years ago

Hi Mike,

One external feature we're looking to implement in the upcoming DSP hackathon is collection of runtime monitoring data from individual tasks in WDL. The metrics would be reported from each individual task call on PAPIv2, using a custom monitoring_image that records time-series of cpu/memory/disk utilization to BigQuery. There's a (somewhat outdated) proposal for this in https://github.com/broadinstitute/cromwell-task-monitor-bbq (I believe in our current thinking, it could be implemented w/o modifying Cromwell).

This data will then be used to build regression models that predict the actual amount of resources required for a given task, given its inputs. To collect training data however, we may need to do a "sweep"/"ladder" on the runtime parameters. If we could modify the runtime parameters inside the WDL without having to supply them as task inputs (which would require involvement of the workflow developer), this would provide a fully automated way to optimize a given workflow, without having to supply anything other than a collection of actual workflow inputs.

So in summary, it would be helpful to replace runtime attributes for a given task, and render the workflow tree back into a .wdl file, without any other modifications to the workflow itself. Do you think this feature could be accomplished by MiniWDL? Or do you have any alternatives we could use to achieve that?

EDIT: And as a bonus, we could also make this part of LSP feedback to the user. So if their workflow ran on PAPIv2, we could provide "auto-suggestions" on the calculated amount of resources for a given task (like suggest replacing memory: '2G' with memory: ceil(2.34 * size(in_bam, 'G')).

Thanks a lot!

CC @rexwangcc @mohawkTrail

mlin commented 5 years ago

@dinvlad we'd definitely like miniwdl to include AST serialization & pretty-printing, for use cases like this and also for style enforcement (like gofmt or python black). I don't have a specific timeline in mind right now, but it clearly would add a lot of utility. Thank you for describing the specific use case which is always really helpful to have in mind.

cc @DavyCats @ksebby @jdidion who've all discussed this at various points

ksebby commented 5 years ago

Thanks for tagging me @mlin. Yes we are thinking about working on adding serialization functionality for WDL to miniwdl and are actually including that in a proposal. Would be happy to discuss and collaborate with anyone that is interested.

illusional commented 4 years ago

Throwing a +1 out there for serialisation from an AST tree. Doing some WDL generation, and currently using string interpolation to do it, but would love to build an AST tree and be able to serialize it.

illusional commented 4 years ago

Hey @mlin, do you have anything in mind on pretty printing the WDL AST? I'm a noob at this sort of stuff, but my basic understanding is you'd need to walk the AST, and essentially build the elements again (potentially with textboxes).

I'd be happy to take a crack at this (with some guidance), but I'm not really sure where to start.

Sources:

mlin commented 4 years ago

@illusional still no timeframe for this on my end, which is regrettable (but also means the field remains wide open for community contributions). @DavyCats added some great stringification functions on the WDL.Expr module to pretty-print individual expressions, which has already been useful in spots like the wdlviz example, but full WDL source production is a larger undertaking of course.

It may be helpful to define the desired scope and functionality a bit more. Clicking through to your Janis link, I infer you might be interested in generating an AST "from scratch" -- I think this might need significant new API design, as miniwdl's AST is currently lacking all the constructors and mutator methods one would want in order to do that cleanly (in contrast to the somewhat-messy construction process in the syntax parser not meant for public consumption).

(A more limited version of this feature might not require a fully mutable AST, but rather focus on reformatting & pretty-printing existing WDL source code, like a WDL equivalent of gofmt or Python black.)

illusional commented 4 years ago

Thanks for taking the time to reply. Your inference was absolutely correct, I was hoping to build an AST from scratch (other code to build it) and then pretty print it. I think there's definitely still a place for a WDL formatter (like black), especially as the field for writing WDL matures.

multimeric commented 4 years ago

I could be wrong, but I think part of the limiting factor here is that Lark (the parser library) doesn't actually support code generation. So, even if we could map from miniwdl types into lark's AST format, we couldn't use Lark to then generate WDL. So it's blocked on multiple fronts.

jdidion commented 4 years ago

@mlin @illusional I have already starting working on a simple implementation of this using Jinja2 templates and very opinionated formatting (like black). The only problem I've encountered is that the AST does not retain comments, so the reformatted code will have comments stripped. I should have something for you to try out later this week.

mlin commented 4 years ago

:+1: comment retention is also needed to enable suppression of miniwdl check warnings on individual lines

jdidion commented 4 years ago

I have a prototype here: https://github.com/jdidion/wdlkit

After I finished this I found out DNAnexus already has a toolkit that's being developed in Scala (https://github.com/dnanexus-rnd/wdlTools). Therefore, I plan to port what I have to that project and discontinue the wdlkit project. @mlin if you're happy with the syntax formatter in wdlkit I'm happy to donate it to miniwdl. Alternatively, if anyone wants to take ownership of wdlkit and keep developing it as a separate package, I'm happy to hand over the keys.

mlin commented 4 years ago

Just-released v0.7.3 finally includes source code comments in the miniwdl AST, so we're one step closer here although wdlTools also looks to be coming along nicely.

jdidion commented 4 years ago

Nice! Yes I have a full-fledged formatter working in wdlTools (rather than just being template-based), although it still needs a lot of testing.

multimeric commented 4 years ago

I could be wrong, but I think part of the limiting factor here is that Lark (the parser library) doesn't actually support code generation.

I'm happy to find out that I'm wrong about this. Turns out Lark has a Reconstructor class that takes a parser and an AST and returns the original code. I found this to be a good example: https://github.com/lark-parser/lark/blob/master/examples/reconstruct_json.py.

So in theory this could be used not only to generate code, but also to implement formatting without needing something like Jinja to format the WDL (as wdlkit is doing). I'm not sure how it handles comments and whitespace, though. I'll look into this when I get a chance.