common-workflow-lab / wdl-cwl-translator

A translator from WDL to CWL v1.2
Apache License 2.0
26 stars 8 forks source link

Some WDL Maps are not handled #77

Open dinithiW opened 3 years ago

dinithiW commented 3 years ago

Currently, the WDL input compound type Map[] is not handled by the translator. (https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#mapp-y)

A workflow using Maps is given in this workflow (https://github.com/biowdl/tasks/blob/bc1bacf11498d2d30b85591cfccdcf71ef0966a5/star.wdl#L144)

edit by @mr-c , we now correctly translate non-input variables of type Map as demonstrated by #189 but I can't find any examples of real-world WDL workflows with inputs of type Map ; currently https://github.com/broadinstitute/warp/blob/b09880a71e3d3e42fa4b544d03aea23c0a246efc/pipelines/broad/dna_seq/germline/joint_genotyping/by_chromosome/JointGenotypingByChromosomePartOne.wdl#L46 does not translate, but that is probably more to due with the use of the Set type

RaOneG commented 3 years ago

I am looking at this, and I am not exactly sure what I should do. Is it just to create a translation from WDL to CWL like here or am I missing something obvious?

RaOneG commented 3 years ago

Oh I should create a class for map, is that it?

mr-c commented 3 years ago

@RaOneG Pardon my delay

In this instance,

     Map[String, String] samOutputNames = {"BAM SortedByCoordinate": "sortedByCoord.out.bam"}

is not in the WDLinputs section, it is a private declaration.

So to convert this example we would need to figure out how to represent the only other place samOutputNames is used. Here it is in the outputs section, to calculate the name of the output file:

   File bamFile = outFileNamePrefix + "Aligned." +  samOutputNames[outSAMtype]

So the strategy for Map used outside the inputs section (and not used as the type of one of the outputs) is to

  1. Convert that private declaration to Javascript. In this example that would be

    var samOutputNames = Map([ ["BAM SortedByCoordinate", "sortedByCoord.out.bam"] ];
  2. Then continue as normal. For our example, that would be producing

    ${
    var samOutputNames = new Map([ ["BAM SortedByCoordinate", "sortedByCoord.out.bam"] ]);
    return inputs.outFileNamePrefix + "Aligned." + samOutputNames.get(inputs.outSAMtype);
    }

    for the output_glob at https://github.com/common-workflow-lab/wdl-cwl-translator/blob/3e287fbb1d17e0d6166bfe5ed7e82ffac0c007f6/wdl2cwl/main.py#L744

However, the Javascript Map type is not part of ECMAScript 5.1, which is the version of Javascript that CWL v1.x Expressions use

While in this instance we can rely on the Map like qualities of object in Javascript because all the keys in samOutputNames are strings, there may be regular uses of WDL's Map type that use other types of keys, like numbers.

So for now we can use a Javascript Object instead:

${
var samOutputNames = {"BAM SortedByCoordinate": "sortedByCoord.out.bam"};`
return inputs.outFileNamePrefix + "Aligned." + samOutputNames[inputs.outSAMtype];
}

But eventually we will have to find another solution like adapting this polyfill that brings the ECMAScript 6 Map to ECMAscript 5: https://github.com/zloirock/core-js/blob/v2.6.12/modules/es6.map.js

mr-c commented 3 years ago

A simpler polyfill for ES 6's Map is https://github.com/anonyco/Javascript-Fast-Light-Map-WeakMap-Set-And-WeakSet-JS-Polyfill/blob/07595a2acb8df921c6481024218200c441f87621/mapPolyfill.src.js (adapted for our use at https://github.com/common-workflow-lab/wdl-cwl-translator/blob/32e9c5dd463c113e63f056f933201a1a7a44567b/map.js)

RaOneG commented 3 years ago

Hi @mr-c, I am sorry but this is a little difficult for me to understand in addition to I have no prior knowledge of Javascript. Is there something I can learn/read to get some more in-depth knowledge about what you mentioned above apart from the Javascript part?

mr-c commented 3 years ago

Hello @RaOneG A very basic level of JavaScript understanding will be necessary. I personally know very little Javascript, instead I constantly do web searches to remind myself of the correct syntax. I literally Google searched for "if then else javascript" the other week!

As for my comments on this issue, you can ignore everything I wrote about "polyfill"s and ECMAScript Map.

mr-c commented 2 years ago

I asked in the OpenWDL slack for a real-world example of a Map input https://openwdl.slack.com/archives/CT0QEK1V0/p1646937620784419 ; pending a real-world usage or user request, I think this is now a low-priority issue

mr-c commented 2 years ago

@dpark01 writes

so from a high level, the reason I find myself using read_json or Map structures in general in WDL is when I'm

  • scattering on X
  • need to track/coordinate metadata on a per-X basis, either post-gather or even inside the scatter block

https://github.com/broadinstitute/viral-pipelines/blob/b4010419ac2710b73e6c70be491e43778b1625f8/pipes/WDL/tasks/tasks_ncbi_tools.wdl#L433-L519

this particular example task is an interesting one where I take a bunch of same-length Array inputs and basically zip them together and then deduplicate them based on one of the strings: that is, it's assumed that the biosamples input array may contain non-unique values, so the output of this task produces an output Array of unique biosamples and then a bunch of Maps that let the workflow code query metadata inside a scatter block for each biosample.

https://github.com/broadinstitute/viral-pipelines/blob/bd0d777617229e61dd9d659054f99e568820295b/pipes/WDL/workflows/sarscov2_illumina_full.wdl#L105-L111

in an unrelated example, here's a workflow that calls other tasks that emit Map outputs and then looks up values in that Map in the middle of a scatter block in order to know what to do differently with each sample: