Open axiomcura opened 1 year ago
It seems that some module config paths are used as inputs, requiring scripts to read and parse configuration files themselves. Snakemake already does this inherently, so parsing YAML files inside scripts is redundant. Here's an example:
annotate.smk
rule annotate:
    """
    Generates an annotated profile with the given metadata and stores it
    in the `results/` directory.
    Utilizes pycytominer's annotate module:
    https://github.com/cytomining/pycytominer/blob/master/pycytominer/annotate.py
    :input profiles: single-cell or aggregate profiles.
    :input barcode: file containing unique barcodes that map to a specific plate.
    :input metadata: metadata file associated with single-cell morphology dataset.
    :config: workflow config pointing to annotate configs.
    :output annotated: annotated profile.
    """
    input:
        profile=get_data_path(
            input_type=config["annotate_configs"]["params"]["input_data"],
            use_converted=DATA_CONFIGS["use_converted_plate_data"],
        ),
        barcodes=BARCODES,
        metadata=METADATA_DIR,
    output:
        get_data_path(input_type="annotated"),
    conda:
        "../envs/cytominer_env.yaml"
    log:
        "logs/annotate_{basename}.log",
    params:
        annotate_config=config["config_paths"]["annotate"],
    script:
        "../scripts/__annotate.py"
annotate.py
# loading in annotate configs
logging.info(f"Loading Annotation configuration from: {config}")
annotate_path_obj = pathlib.Path(config)
if not annotate_path_obj.is_file():
    e_msg = "Unable to find Annotation configuration file"
    logging.error(e_msg)
    raise FileNotFoundError(e_msg)
annotate_config_path = annotate_path_obj.absolute()
with open(annotate_config_path, "r") as yaml_contents:
    annotate_configs = yaml.safe_load(yaml_contents)["annotate_configs"]["params"]
logging.info("Annotation configuration loaded")
This code can easily be removed by declaring the config paths at the workflow level with the `configfile` directive:
# cp_process.smk workflow
configfile: "path/to/general_config.yaml"
configfile: "path/to/workflow_config.yaml"
This will remove the redundant code that exists within all scripts, making them much easier to read.
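As a rough sketch of what the script side could look like once the configs are loaded at the workflow level: scripts run through Snakemake's `script:` directive receive a `snakemake` object whose `config` attribute holds the already-parsed configuration, so no path checking or YAML parsing is needed. The helper name `get_annotate_params` and the value `"aggregated"` are hypothetical and only there for illustration; the key names follow the rule shown above.

```python
from typing import Any, Mapping


def get_annotate_params(workflow_config: Mapping[str, Any]) -> Mapping[str, Any]:
    """Return the annotate parameters from an already-parsed workflow config.

    Hypothetical helper: it assumes the annotate settings live under
    config["annotate_configs"]["params"], as in the rule above.
    """
    try:
        return workflow_config["annotate_configs"]["params"]
    except KeyError as err:
        raise KeyError(f"Missing key in workflow config: {err}") from err


# Inside a script executed by Snakemake, the injected `snakemake` object
# carries the parsed config, so the call would be:
#     annotate_params = get_annotate_params(snakemake.config)

# Stand-in dictionary (made-up value) so this sketch runs on its own.
example_config = {"annotate_configs": {"params": {"input_data": "aggregated"}}}
annotate_params = get_annotate_params(example_config)
print(annotate_params["input_data"])
```

With this approach the rule no longer needs to pass a config path through `params`, and the script no longer needs its own file-existence check.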
This issue also shows the problems with partially loading configuration into scripts. Partial loading requires extra variables to be created so that each piece of configuration can be loaded into a script separately: to access the configs, one must declare a separate variable for each of them.
This becomes worse when more complex scripts require more configurations to be loaded. It increases the number of variables not only within the module but also within the scripts, making it difficult to understand how the configs are being used.
This issue will be part of #41.