CDPHE-bioinformatics / CDPHE-SARS-CoV-2

Workflows and scripts for the assembly and analysis of SARS-CoV-2 whole genome tiled amplicon sequencing.
https://cdphe-bioinformatics.github.io/CDPHE-SARS-CoV-2/
GNU General Public License v3.0

[REQUEST] Right size obvious tasks to alleviate resource exhaustion from CPU quota #10

Open danpolanco opened 8 months ago

danpolanco commented 8 months ago

Feature Request

We are hitting the CPU quota for our production Terra workspace. This is slowing down our pipeline and preventing us from reliably delivering dashboard updates on time.

This issue is concerned primarily with the create_version_capture_file task:

https://github.com/CDPHE-bioinformatics/CDPHE-SARS-CoV-2/blob/c988ea7924bdd323adbd099ad38fb7ca60d18d5d/workflows/SC2_illumina_pe_assembly.wdl#L597

https://github.com/CDPHE-bioinformatics/CDPHE-SARS-CoV-2/blob/c988ea7924bdd323adbd099ad38fb7ca60d18d5d/workflows/SC2_ont_assembly.wdl#L537

Solution

To fix this, we should review the container runtime settings in our WDLs. There are a few places where they need to be fixed:
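As a rough illustration of what right-sizing could look like, here is a minimal sketch of a task shaped like create_version_capture_file. The inputs, the command, the `docker_image` parameter, and the specific memory and disk values are all assumptions made for the example. They are not the real task's contents, and WDL `version 1.0` under Cromwell is assumed.

```wdl
version 1.0

# Hypothetical, trimmed-down stand-in for create_version_capture_file.
# Inputs and command are placeholders, not the task's real contents.
task create_version_capture_file {
    input {
        String project_name
        String workflow_version
        String docker_image  # assumption: image name supplied by the caller
    }

    command <<<
        # Writing a small text file is single-threaded and needs little memory.
        printf 'project_name,workflow_version\n%s,%s\n' \
            "~{project_name}" "~{workflow_version}" > version_capture.csv
    >>>

    output {
        File version_capture_file = "version_capture.csv"
    }

    runtime {
        docker: docker_image
        cpu: 1                      # one core for a single-threaded script
        memory: "1 GiB"             # illustrative value
        disks: "local-disk 10 HDD"  # illustrative value
    }
}
```

Any task that only writes a small metadata file is a candidate for this kind of minimal request, since every core it reserves counts against the workspace quota for as long as the task runs.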

Other possible changes

We should also consider reducing the number of CPUs the transfer task uses. While this will increase the transfer task's runtime, it will free up CPUs so that more tasks can be spawned at once. The trade-off makes sense to me: spawning other tasks that have long runtimes is better than speeding up file transfers that don't take long to begin with.

https://github.com/CDPHE-bioinformatics/CDPHE-SARS-CoV-2/blob/c988ea7924bdd323adbd099ad38fb7ca60d18d5d/workflows/SC2_illumina_pe_assembly.wdl#L667

https://github.com/CDPHE-bioinformatics/CDPHE-SARS-CoV-2/blob/c988ea7924bdd323adbd099ad38fb7ca60d18d5d/workflows/SC2_ont_assembly.wdl#L592

Also, if a WDL command block runs serially, we might not see a benefit to multiple CPUs in the transfer tasks anyway.
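To make the trade-off concrete, here is a hedged sketch of a transfer task that requests a single CPU. It assumes the copy is done with `gsutil`. The task name, the input names, `docker_image`, and the memory and disk values are hypothetical and are not taken from the linked workflows.

```wdl
version 1.0

# Hypothetical transfer task; names and sizes are illustrative only.
task transfer {
    input {
        Array[File] files_to_copy
        String out_dir       # assumption: a destination such as a gs:// path
        String docker_image  # assumption: an image that provides gsutil
    }

    command <<<
        # A single gsutil call with -m copies files in parallel.
        # If the block instead issues one cp call after another,
        # the copies run one at a time and extra cores sit idle.
        gsutil -m cp ~{sep=' ' files_to_copy} "~{out_dir}/"
    >>>

    runtime {
        docker: docker_image
        cpu: 1                      # reduced request; frees quota for other tasks
        memory: "2 GiB"             # illustrative value
        disks: "local-disk 50 HDD"  # illustrative value
    }
}
```

Whether the lower CPU count costs much wall-clock time depends on how the real command block issues its copies, so it would be worth timing one run before and after the change.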

Downstream effects

All other tasks should be examined as well, but those should be their own issues, since we can already accomplish a fair bit with this low-hanging fruit.

danpolanco commented 1 week ago

We can use dynamic allocation for memory and storage.

See the high memory branch of SC2 for an example.
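For readers without access to that branch, dynamic allocation in WDL generally means deriving the runtime values from input file sizes. The sketch below is an assumption about the general pattern, not a copy of what the branch does. The task name, inputs, multipliers, and thresholds are hypothetical.

```wdl
version 1.0

# Hypothetical task showing size-based memory and disk requests.
task example_assembly_step {
    input {
        File fastq_1
        File fastq_2
        String docker_image  # assumption: image name supplied by the caller
    }

    # Combined input size in GiB, evaluated before the task is scheduled.
    Float input_gb = size(fastq_1, "GiB") + size(fastq_2, "GiB")

    # Illustrative rules: 3x the input plus 10 GB of headroom for disk,
    # and a larger memory request only for large inputs.
    Int disk_size_gb = ceil(3 * input_gb) + 10
    Int memory_gb = if input_gb > 4.0 then 16 else 8

    command <<<
        # Placeholder command; the real work would go here.
        ls -lh "~{fastq_1}" "~{fastq_2}"
    >>>

    runtime {
        docker: docker_image
        cpu: 2
        memory: "~{memory_gb} GiB"
        disks: "local-disk ~{disk_size_gb} HDD"
    }
}
```

With this pattern, small samples no longer reserve the resources sized for the largest ones, which complements the fixed CPU reductions discussed earlier in the issue.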