Closed aofarrel closed 2 years ago
The checker has revealed a potentially large issue. I misunderstood the implications of sbg_group_segments_1's output structures.
The CWL's return JSON for sbg_group_segments_1 looks like this.
This means that the next task (assoc_combine_r) scatters on the top level of grouped_assoc_files, i.e., assoc_combine_r will be scattered into two tasks if you are running on two chromosomes. Each scattered assoc_combine_r task returns one combined file, so you get one combined file per chromosome.
This is not how the WDL currently works. In the WDL, assoc_combine_r instead scatters once per segment, so it returns one "combined" file per segment rather than one per chromosome.
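To make the difference concrete, here is a rough Python sketch of the two scatter behaviors. It is not the real pipeline: the nested list is a hypothetical stand-in for grouped_assoc_files (the file names and segment counts are invented), and combine() is only a placeholder for assoc_combine_r.

```python
# Hypothetical stand-in for grouped_assoc_files: one inner list per chromosome,
# each holding that chromosome's per-segment files. All names are invented.
grouped_assoc_files = [
    ["chr1_seg1.RData", "chr1_seg2.RData"],
    ["chr2_seg3.RData", "chr2_seg4.RData", "chr2_seg5.RData"],
]


def combine(files):
    """Placeholder for assoc_combine_r: one output per call."""
    return "combined(" + "+".join(files) + ")"


# CWL behavior as described above: scatter on the top level,
# so there is one call (and one combined file) per chromosome.
per_chromosome = [combine(group) for group in grouped_assoc_files]

# Current WDL behavior as described above: one call per segment,
# so each "combined" file only ever contains a single segment.
per_segment = [combine([f]) for group in grouped_assoc_files for f in group]

print(len(per_chromosome))  # number of chromosomes
print(len(per_segment))     # number of segments
```

With two chromosomes and five segments in this made-up input, the first approach yields two outputs and the second yields five, which is the mismatch the checker flagged.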
There seem to be three possible ways to resolve this in the WDL:
Number 1 does not sit well with me as it means previous tasks have the wrong input, but the plotting step does actually seem consistent across Terra and SB as-is, so it could be valid... Number 2 would be ideal, although it didn't work last time... Number 3 would complicate the flow even further...
This is not ready for a release as it lacks a checker workflow, but it's complicated enough that it should at least get a quick review in its current form.