WayScience / CytoSnake

Orchestrating high-dimensional cell morphology data processing pipelines
https://cytosnake.readthedocs.io
Creative Commons Attribution 4.0 International

Allow whole config loading to scripts instead of partial loading. #81

Open axiomcura opened 1 year ago

axiomcura commented 1 year ago

This issue describes the problems with partially loading configuration into scripts. Partial loading requires extra variables to be created, both in the rule and in the script, so that each piece of configuration can be passed separately:

# aggregate module
rule aggregate:
    input:
        sql_files=get_data_path(
            input_type=config["aggregate_configs"]["params"]["input_data"],
            use_converted=DATA_CONFIGS["use_converted_plate_data"],
        ),
        barcodes=BARCODES,
        metadata=METADATA_DIR,
    output:
        aggregate_profile=get_data_path(input_type="aggregated"),
        cell_counts=get_data_path(input_type="cell_counts"),
    log:
        "logs/aggregate_{basename}.log",
    conda:
        "../envs/cytominer_env.yaml"
    params:
        single_cell_config=config["single_cell_configs"],
        aggregate_config=config["aggregate_configs"],
    script:
        "../scripts/aggregate_cells.py"

To access the configs with partial loading, the script must declare a separate variable for each one:

# aggregate script
aggregate_output = str(snakemake.output["aggregate_profile"])
single_cells_config = snakemake.params["single_cell_config"]

This becomes a problem when more complex scripts require more configurations to be loaded. It increases the number of variables not only in the module but also in the scripts, which makes it difficult to understand how the configs are being used.
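For comparison, here is a minimal sketch of what whole-config access could look like inside a script. Snakemake exposes the full workflow config to scripts as `snakemake.config`, so no per-config `params` entries are needed. The helper `get_module_params` is hypothetical, the stand-in `snakemake` object exists only so the snippet runs outside of a workflow, and the assumption that `single_cell_configs` has a `params` block (along with all the values shown) is a placeholder rather than the project's real settings.

```python
from types import SimpleNamespace


def get_module_params(config: dict, module: str) -> dict:
    """Return the "params" block of one module from the full workflow config.

    Hypothetical helper, not part of CytoSnake or Snakemake.
    """
    try:
        return config[module]["params"]
    except KeyError as err:
        raise KeyError(f"Missing config section or params block: {module}") from err


# Inside a real Snakemake script the `snakemake` object is injected
# automatically. This stand-in mimics only its `config` attribute, and the
# keys under "params" are placeholder values.
snakemake = SimpleNamespace(
    config={
        "single_cell_configs": {"params": {"strata": ["Metadata_Well"]}},
        "aggregate_configs": {"params": {"input_data": "plate_data"}},
    }
)

# One lookup per module, with no extra entries in the rule's `params` block.
single_cell_params = get_module_params(snakemake.config, "single_cell_configs")
aggregate_params = get_module_params(snakemake.config, "aggregate_configs")
```

With this pattern the rule only needs to declare inputs and outputs, and the script decides which config sections it reads.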

This issue will be part of #41

axiomcura commented 1 year ago

Update

It seems that some module config paths are passed to scripts, requiring the scripts to read and parse the configuration files themselves. Snakemake already does this natively, so parsing YAML files inside scripts is redundant. Here's an example:

annotate.smk

rule annotate:
    """
    Generates an annotated profile with the given metadata and stores it
    in the `results/` directory.

    Utilizes pycytominer's annotate module:
    https://github.com/cytomining/pycytominer/blob/master/pycytominer/annotate.py

    :input profiles: single-cell or aggregate profiles.
    :input barcode: file containing unique barcodes that maps to a specific plate.
    :input metadata: metadata file associated with single-cell morphology dataset.

    :config: workflow config pointing to annotate configs.

    :output annotated: annotated profile.
    """
    input:
        profile=get_data_path(
            input_type=config["annotate_configs"]["params"]["input_data"],
            use_converted=DATA_CONFIGS["use_converted_plate_data"],
        ),
        barcodes=BARCODES,
        metadata=METADATA_DIR,
    output:
        get_data_path(input_type="annotated"),
    conda:
        "../envs/cytominer_env.yaml"
    log:
        "logs/annotate_{basename}.log",
    params:
        annotate_config=config["config_paths"]["annotate"],
    script:
        "../scripts/__annotate.py"

annotate.py

# loading in annotate configs
    logging.info(f"Loading Annotation configuration from: {config}")

    annotate_path_obj = pathlib.Path(config)
    if not annotate_path_obj.is_file():
        e_msg = "Unable to find Annotation configuration file"
        logging.error(e_msg)
        raise FileNotFoundError(e_msg)

    annotate_config_path = annotate_path_obj.absolute()
    with open(annotate_config_path, "r") as yaml_contents:
        annotate_configs = yaml.safe_load(yaml_contents)["annotate_configs"]["params"]
        logging.info("Annotation configuration loaded")

This code can easily be removed by declaring the config paths at the workflow level with the configfile directive:

# cp_process.smk workflow
configfile: "path/to/general_config.yaml"
configfile: "path/to/workflow_config.yaml"

This will remove the redundant parsing code that exists in all scripts, making them much easier to read.
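To illustrate why two configfile declarations are enough, here is a rough sketch of the idea that both files end up in one `config` dictionary. It is a plain-Python illustration of a recursive merge, not Snakemake's actual implementation. The function `merge_configs`, the `data_configs` key, and all the values are assumptions made up for the example.

```python
import copy


def merge_configs(base: dict, update: dict) -> dict:
    """Recursively merge `update` into a copy of `base`; later values win.

    Hypothetical helper used only to illustrate the merging idea.
    """
    merged = copy.deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides hold a mapping, so merge them key by key.
            merged[key] = merge_configs(merged[key], value)
        else:
            # Otherwise the later file's value replaces the earlier one.
            merged[key] = copy.deepcopy(value)
    return merged


# Placeholder contents standing in for the two YAML files.
general_config = {"data_configs": {"use_converted_plate_data": True}}
workflow_config = {"annotate_configs": {"params": {"input_data": "aggregated"}}}

# After both files are loaded, everything is reachable from a single dict.
config = merge_configs(general_config, workflow_config)
annotate_params = config["annotate_configs"]["params"]
```

Under this view, a script no longer needs the path to any YAML file, only the name of the section it wants to read.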