epi2me-labs / wf-artic

ARTIC SARS-CoV-2 workflow and reporting
https://labs.epi2me.io/
Other
49 stars 36 forks source link

[Bug]: publish output process refactoring recommended #74

Closed phyden closed 1 year ago

phyden commented 1 year ago

What happened?

Hi,

we are using your workflow for a while now. Previously we ran it on a single server using the conda environment, but now we moved to a SLURM scheduled multi-server high performance compute system and docker (podman) images. However, running that many "output" processes (394 for a 96-sample sequencingrun) which initialize a container, queue in a workload management system and basically only echo some text is not efficient for our system.

So we refactored your pattern of "publish files in a separate publish step" to:

output(results.collect())
process output {
    // publish inputs to output directory
    // note that there is no label, which causes the code to be run outside any container
    publishDir "${params.out_dir}", mode: 'copy', pattern: "*"
    input:
        file ("*") 
    output:
        path ("*"), includeInputs: true
    """
    echo "Writing output files"
    """
}

Although this might be only a specific issue for us though, I believe that this pattern would also not "hurt" in any other setup for the pipeline.

br Patrick

Operating System

RHEL 8

Workflow Execution

Commandline

Workflow Execution - EPI2ME Labs Versions

No response

Workflow Execution - CLI Execution Profile

Docker (Podman)

Workflow Version

v0.3.18

Relevant log output

executor >  slurm (204)
[fe/cc2371] process > checkSampleSheet         [100%] 1 of 1 ✔
[9a/5ec469] process > pipeline:getVersions     [100%] 1 of 1 ✔
[64/fd3e52] process > pipeline:getParams       [100%] 1 of 1 ✔
[cf/4ea9ea] process > pipeline:copySchemeDir   [100%] 1 of 1 ✔
[64/8233ec] process > pipeline:preArticQC (93) [100%] 96 of 96 ✔
[a1/03775c] process > pipeline:runArtic (94)   [100%] 96 of 96 ✔
[34/f61c73] process > pipeline:combineDepth    [100%] 1 of 1 ✔
[02/207906] process > pipeline:allConsensus    [100%] 1 of 1 ✔
[9b/a7f3cd] process > pipeline:allVariants     [100%] 1 of 1 ✔
[4b/9b5f8e] process > pipeline:prep_nextclade  [100%] 1 of 1 ✔
[2b/03d905] process > pipeline:nextclade       [100%] 1 of 1 ✔
[2f/badb51] process > pipeline:pangolin        [100%] 1 of 1 ✔
[bf/ce1246] process > pipeline:report          [100%] 1 of 1 ✔
[32/a06a31] process > output                   [100%] 1 of 1 ✔
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Completed at: 02-Feb-2023 12:55:11
Duration    : 10m
CPU hours   : 4.8
Succeeded   : 204

PS: right, we also removed your telemetry process from our local version as we are never using the --report_detailed flag and can omit this time limiting bottleneck step.
mattdmem commented 1 year ago

Thanks for this @phyden and so sorry for the very late reply. The simple reason being we don't really love our current implementation either! We're actively investigating ways to improve.