DataBiosphere / analysis_pipeline_WDL

Collection of WDL workflows based on the University of Washington TOPMed DCC Best Practices for GWAS. The WDL structure is based on CWLs written by the Seven Bridges development team.

Assoc Aggregate: Now With Good Scattering! #58

Closed aofarrel closed 2 years ago

aofarrel commented 2 years ago

This implements option 2 from here: https://github.com/DataBiosphere/analysis_pipeline_WDL/pull/57#issuecomment-951353842

Not only does this bring the WDL closer to the CWL, it also fixes the bug mentioned in the comment linked above. Most of the final RData outputs still do not match the truth files, but the number of output files is at least correct now.

One problem this does have is that it must localize all files from all segments in each chromosome. This might become problematic as the number of chromosomes increases. I'm not sure of a way around it...

Another problem: https://github.com/DataBiosphere/analysis_pipeline_WDL/issues/59

aofarrel commented 2 years ago

Group should be grouping per chromosome; that is its purpose. Try using read_tsv() instead of read_lines().
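
For reference, here is a rough Python sketch of what "grouping per chromosome" could look like when the result is written as one tab-separated row per chromosome. This is only an illustration under assumptions, not the code in this PR: the file-name pattern (`chr<N>` somewhere in the name), the example file names, and the helper names `group_by_chromosome` and `to_tsv` are all hypothetical.

```python
import re
from collections import defaultdict

# Assumption: each segment file name contains "chr<N>" (or chrX / chrY).
# The real pipeline's naming scheme may differ.
CHR_PATTERN = re.compile(r"chr([0-9]+|X|Y)")


def group_by_chromosome(paths):
    """Return one list of paths per chromosome, in order of first appearance."""
    groups = defaultdict(list)
    for path in paths:
        match = CHR_PATTERN.search(path)
        if match is None:
            raise ValueError(f"no chromosome found in {path!r}")
        groups[match.group(1)].append(path)
    return list(groups.values())


def to_tsv(groups):
    """Serialize the groups as one tab-separated row per chromosome."""
    return "\n".join("\t".join(group) for group in groups)


# Hypothetical segment outputs: two segments on chr1, one on chr2.
segments = [
    "assoc_chr1_seg1.RData",
    "assoc_chr1_seg2.RData",
    "assoc_chr2_seg3.RData",
]
grouped = group_by_chromosome(segments)
tsv_text = to_tsv(grouped)
```

With a layout like this, read_lines() would hand back each row as a single string (an Array[String]), while read_tsv() splits each row on tabs and returns an Array[Array[String]], which is the per-chromosome nesting we want downstream.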

aofarrel commented 2 years ago

If read_tsv() fails, we could try avoiding the issue by merging this with the subsequent task: https://github.com/DataBiosphere/analysis_pipeline_WDL/tree/assoc-agg-debugging-merge-tasks