bihealth / seasnap-pipeline

SeA-SnaP: (Se)q (A)nalysis (Sna)kemake (P)ipeline

Move R code away from snakemake files #6

Open eudesbarbosa opened 3 years ago

eudesbarbosa commented 3 years ago

Issue: R code is kept inside the Snakemake files and takes variables directly from them. For example, see rule filter in the DE_pipeline Snakemake file:

rule filter:
    """ filter experiment & genes before normalisation, based on counts and/or GTF property"""
    input:
        ...
    run:
        config_file = pph.file_path(step="pipeline_report", extension="yaml", contrast="all")
        script = textwrap.dedent(r"""
        #----- import packages
        library(DESeq2)
        library(AnnotationDbi)

        {R_SESSION_INFO}

        conf.f   <- "{config_file}"

        #----- load config
        config <- yaml::yaml.load_file(conf.f)

        #----- read counts in dds format
        dds <- readRDS("{input.rds}")
        sample_df <- as.data.frame(SummarizedExperiment::colData(dds))
        rownames(sample_df) <- as.character(sample_df$label)
        ...

Possible solution: Isolate all R code in dedicated files; their location could be accessed via the config, for instance. Structure the code so that the Snakemake variables can be passed in as arguments (silly example). Once the code is isolated, it should be easier to establish unit tests for all of it.
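To make the proposal a bit more concrete, here is a minimal sketch of how a rule could call an isolated R script instead of embedding R in a Python string. Everything named here is an assumption for illustration only: the `r_script_dir` config key, the `filter.R` file name, the argument names, and the `build_rscript_command` helper do not exist in the repository.

```python
import shlex
from pathlib import PurePosixPath


def build_rscript_command(script_dir, script_name, **params):
    """Build the argv list for running a standalone R script.

    Each keyword argument becomes a `--name=value` option, so the R code
    no longer needs Snakemake to format values into its source text.
    """
    script = PurePosixPath(script_dir) / script_name
    cmd = ["Rscript", "--vanilla", str(script)]
    # Sort the options so the generated command is deterministic.
    for key, value in sorted(params.items()):
        cmd.append(f"--{key.replace('_', '-')}={value}")
    return cmd


# Hypothetical config entry pointing at the folder with the R scripts.
config = {"r_script_dir": "r_scripts"}

cmd = build_rscript_command(
    config["r_script_dir"],
    "filter.R",  # hypothetical script holding the body of rule filter
    config_file="pipeline_report.yaml",
    input_rds="counts.rds",
)
print(shlex.join(cmd))
```

Inside the rule, the resulting list could be handed to `subprocess.run(cmd, check=True)`. On the R side, the script would read its options with `commandArgs(trailingOnly = TRUE)`, which also means it can be run, and unit tested, without Snakemake at all.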