Closed aofarrel closed 2 years ago
Group should be grouping per chromosome; that is its purpose. Try using read_tsv() instead of read_lines().
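For reference, a rough Python sketch of why the choice of reader matters. In WDL, read_lines() returns a flat Array[String] and read_tsv() returns an Array[Array[String]]. The two helper functions below only emulate that behaviour. The file names and the layout (one tab-separated line per chromosome) are assumptions for illustration, not necessarily what the task actually writes.

```python
# Emulation of WDL's read_lines() and read_tsv() return shapes.
# These are stand-ins written for illustration, not the WDL implementation.

def read_lines(text):
    """Like WDL read_lines(): one string per line, no further splitting."""
    return text.splitlines()


def read_tsv(text):
    """Like WDL read_tsv(): one list per line, split on tabs."""
    return [line.split("\t") for line in text.splitlines()]


# Hypothetical grouping file: one line per chromosome, segment files
# separated by tabs. The names are made up for this example.
grouped = "chr1_seg1.RData\tchr1_seg2.RData\nchr2_seg3.RData\n"

flat = read_lines(grouped)    # each chromosome's files stay fused in one string
nested = read_tsv(grouped)    # one inner list of files per chromosome

for files in nested:
    print(len(files), files)
```

With read_tsv() each inner list can be scattered over as one chromosome's group, whereas read_lines() would leave the per-chromosome files joined in a single string.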
If read_tsv() fails, we could try avoiding the issue by merging this with the subsequent task: https://github.com/DataBiosphere/analysis_pipeline_WDL/tree/assoc-agg-debugging-merge-tasks
This implements option 2 from here: https://github.com/DataBiosphere/analysis_pipeline_WDL/pull/57#issuecomment-951353842
Not only does this bring this closer to the CWL, it also fixes the bug mentioned in the comment above. Most of the RData final outputs are still not passing against the truth files, but at least the number of files is now correct.
One problem this does have is that it must localize all files from all segments in each chromosome. This might become problematic as the number of chromosomes increases. I'm not sure of a way around it...
Another problem: https://github.com/DataBiosphere/analysis_pipeline_WDL/issues/59