aryarm / varCA

Use an ensemble of variant callers to call variants from ATAC-seq data
MIT License

only install the necessary variant caller dependencies #19

Open aryarm opened 3 years ago

aryarm commented 3 years ago

Our pipeline requires users to install every variant caller at runtime, even the ones they never use. For example, DELLY is not used by the pipeline by default, but Snakemake still installs it the first time the pipeline is executed.

Is there a way to improve this behavior so that only the required dependencies are installed at runtime? Currently, the answer is no.

Why? Well, there are only two steps in the Snakemake pipeline that execute the variant callers in the ensemble: the prepare_caller rule and the run_caller rule. Both steps must be general enough to work for any variant caller, so the inputs and outputs of those rules adapt to each caller through a single wildcard. If we wanted the dependencies of a rule to change too, we would need its conda directive to depend on the caller wildcard. But Snakemake currently offers no way of doing this; you can't provide a lambda function to conda like you can for input and params.
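
To make the limitation concrete, here is a minimal plain-Python sketch of the kind of function Snakemake accepts for input. The `callers/` path layout, the function names, and the `SimpleNamespace` stand-in for the wildcards object are assumptions for illustration only; `envs/{caller}.yml` mirrors the naming used later in this thread.

```python
from types import SimpleNamespace


def caller_script(wildcards):
    """Mimic an input function: derive a path from the caller wildcard.

    The 'callers/' directory is a hypothetical layout, not necessarily
    the one this pipeline uses.
    """
    return f"callers/{wildcards.caller}"


def caller_env(wildcards):
    """The lookup we would like to apply to the conda environment as well."""
    return f"envs/{wildcards.caller}.yml"


# Stand-in for the wildcards object that Snakemake passes to such functions.
wc = SimpleNamespace(caller="delly")
print(caller_script(wc))
print(caller_env(wc))
```

The first function is the pattern that already works for input; the second is the same lookup applied to the environment file, which is the part that has no hook in the rule definition.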

I really only see one solution to this issue, then: I submit a pull request (or feature request) to Snakemake that adds the functionality we need. I can't think of anything else short of a major refactor.

aryarm commented 3 years ago

Ok, well apparently this is now possible in Snakemake v6, allowing us to do option 2 in #30! See the Rule Inheritance section of the Snakemake documentation, specifically this part:

use rule a as b with:
    output:
        "test2.out"

Presumably, we could wrap this in a for-loop and use it to just change the conda directive. So we could do something like this:

for caller in callers:
    use rule run_caller as "run_"+caller with:
        conda: f"envs/{caller}.yml"
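
Before restructuring the Snakefile, it may help to check which rules and environments such a loop would produce. The following is a plain-Python sketch, not Snakemake code: `plan_caller_rules` and the caller names are hypothetical, and whether Snakemake accepts a computed rule name in `use rule` is untested here.

```python
def plan_caller_rules(callers, base_rule="run_caller"):
    """Map each derived rule name to the rule it inherits from and its env file.

    Hypothetical helper: it only mirrors the naming scheme in the loop above
    (run_<caller> and envs/<caller>.yml) and does not touch Snakemake.
    """
    plan = {}
    # dict.fromkeys drops duplicate callers while keeping their order
    for caller in dict.fromkeys(callers):
        plan[f"run_{caller}"] = {
            "inherits": base_rule,
            "conda": f"envs/{caller}.yml",
        }
    return plan


# Hypothetical list of enabled callers; DELLY is left out on purpose.
enabled = ["gatk", "varscan", "gatk"]
for rule_name, spec in plan_caller_rules(enabled).items():
    print(rule_name, spec["inherits"], spec["conda"])
```

Only the callers in the list get a derived rule, so only their environment files would ever be referenced.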

Barring any unforeseen challenges, I should be able to resolve this issue in a few weeks. It might still require quite a bit of code restructuring and testing.