broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Mutect2 WDL: Funcotate task has useless variables - no way to increase memory for Funcotate task only #7532

Open jason-cerrato opened 2 years ago

jason-cerrato commented 2 years ago

Bug/Usability Report

Affected tool(s) or class(es)

Mutect2 WDL

Affected version(s)

Description

The Mutect2 WDL's Funcotate task has an unintuitive setup for configuring its memory. Funcotate task memory is defined here: [image]

This uses the standard_runtime dictionary defined earlier: [image]

This dictionary uses a variable called machine_mem, which is calculated from the workflow's configurable small_task_mem input: [image] [image]

To allocate more memory for the Funcotate task, one has to set small_task_mem at the workflow level. This changes the amount of memory for every task that uses this dictionary, not just the Funcotate task.
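
The wiring described above can be sketched roughly as follows. This is a simplified reconstruction from the description in this report, not the actual Mutect2 WDL: the two-field Runtime struct, the workflow name Mutect2Sketch, the small_task_disk input, the default values, the GB-to-MB factor, and the placeholder command are all assumptions for illustration.

```wdl
version 1.0

# Hypothetical, trimmed-down runtime struct (assumption, not the real definition)
struct Runtime {
    Int machine_mem
    Int disk
}

workflow Mutect2Sketch {
    input {
        Int small_task_mem = 4      # GB; workflow-level input (default assumed)
        Int small_task_disk = 100   # GB; hypothetical input name and default
    }

    # One shared value, computed once from the workflow-level input
    Int machine_mem = small_task_mem * 1000
    Runtime standard_runtime = {"machine_mem": machine_mem, "disk": small_task_disk}

    # Every task that receives standard_runtime gets the same memory setting
    call Funcotate { input: runtime_params = standard_runtime }
}

task Funcotate {
    input {
        Runtime runtime_params
        Int? default_ram_mb          # declared, but never read below
        Int? default_disk_space_gb   # declared, but never read below
    }

    command <<<
        echo "placeholder for the Funcotator command"
    >>>

    runtime {
        # Memory and disk come only from the shared struct
        memory: runtime_params.machine_mem + " MB"
        disks: "local-disk " + runtime_params.disk + " HDD"
    }
}
```

In a layout like this, raising small_task_mem is the only lever, and it applies to every call that is passed standard_runtime.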

Funcotate has two input variables, default_ram_mb and default_disk_space_gb, which have no bearing on the memory and disk space configuration for the task. [image]

This confuses users who see these variables on the method configuration page, enter values, and then find that their Funcotate task does not use them. [image]

Steps to reproduce

Set the input variables default_ram_mb and default_disk_space_gb for a run of the Mutect2 workflow to values that differ from the amounts defined by small_task_mem and disk_space.

Expected behavior

Defining the input variables default_ram_mb and default_disk_space_gb allows you to specify your preferred memory and disk space configuration for the Funcotate task.

Actual behavior

These variables do not define the runtime configuration for the task. Memory is defined by a workflow-level input that isn't clearly connected to Funcotate.

Suggestion

Use the default_ram_mb and default_disk_space_gb variables that already exist in the task so that modifying them actually changes the configuration of the task VM.
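
One possible shape for this, as a minimal sketch rather than a tested patch: let the task-level inputs take precedence when set, and fall back to the shared runtime values otherwise. The two-field Runtime struct, the local names machine_mem_mb and disk_gb, and the placeholder command are assumptions for illustration; select_first is the standard WDL function that returns the first defined value in an array.

```wdl
version 1.0

# Hypothetical, trimmed-down runtime struct (assumption, not the real definition)
struct Runtime {
    Int machine_mem
    Int disk
}

task Funcotate {
    input {
        Runtime runtime_params
        Int? default_ram_mb          # optional per-task memory override, in MB
        Int? default_disk_space_gb   # optional per-task disk override, in GB
    }

    # Task-level values win when provided; otherwise use the shared defaults
    Int machine_mem_mb = select_first([default_ram_mb, runtime_params.machine_mem])
    Int disk_gb = select_first([default_disk_space_gb, runtime_params.disk])

    command <<<
        echo "placeholder for the Funcotator command"
    >>>

    runtime {
        memory: machine_mem_mb + " MB"
        disks: "local-disk " + disk_gb + " HDD"
    }
}
```

With this pattern, leaving both inputs unset keeps the current behavior, so existing method configurations would not change. If the task command also passes a JVM heap size derived from the shared runtime, that value would need to follow the override as well.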

droazen commented 2 years ago

@davidbenjamin @fleharty

fleharty commented 2 years ago

@davidbenjamin Can you take this? I'm not going to have any time to work on this.

phylyc commented 2 years ago

I cleaned up the Mutect2 WDL and added multi-sample support. I also optimized resource usage and exposed the memory parameters: https://github.com/phylyc/gatk4-somatic-snvs-indels