Right now we're including rules in PON.smk and QC.smk separately from the ones defined in the snakemake_rules dict in rules.py, and the way the rules are included in the SMKs is a bit messy, with code like this:
if config["analysis"]["analysis_workflow"] == "balsamic":
    rules_to_include = [rule for rule in rules_to_include if "umi" not in rule]
Some rules are also included even though they are not used, such as dragen_dna.rule.
I think it would be nice to clean this up a bit and create a function that extracts the rules based on the analysis tags in the sample config. The example below is a placeholder.
Suggested solution
from typing import Dict, List

class SnakemakeRules:
    """Class to extract relevant rules for provided tags."""

    def __init__(self, snakemake_rules_dict: Dict[str, Dict[str, List[str]]]):
        self.snakemake_rules_dict = snakemake_rules_dict

    def get_rules_by_tags(self, sequencing_type, analysis_type, workflow) -> List[str]:
        rules_to_include = []
        # Placeholder: only keep rules whose "include_in" section lists the given
        # sequencing_type, analysis_type and workflow
        return rules_to_include
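As a rough sketch of how the placeholder method could be filled in: the block below keeps a rule only when each of the three tags appears in the matching "include_in" list. The enum classes are stand-ins so the example runs on its own, and their string values are assumptions. Returning the rule paths rather than the rule names, and treating a missing "include_in" section as "no match", are also assumptions, not decisions made in this issue.

```python
from enum import Enum
from typing import Any, Dict, List


# Stand-in enums so the sketch is self-contained; the string values are assumed.
class SequencingType(str, Enum):
    WGS = "wgs"
    TARGETED = "targeted"


class AnalysisType(str, Enum):
    SINGLE = "single"
    PAIRED = "paired"
    PON = "pon"


class AnalysisWorkflow(str, Enum):
    BALSAMIC = "balsamic"
    BALSAMIC_QC = "balsamic-qc"
    BALSAMIC_UMI = "balsamic-umi"


class SnakemakeRules:
    """Extract the rule paths that match the provided tags."""

    def __init__(self, snakemake_rules_dict: Dict[str, Dict[str, Any]]):
        self.snakemake_rules_dict = snakemake_rules_dict

    def get_rules_by_tags(self, sequencing_type, analysis_type, workflow) -> List[str]:
        """Return the paths of rules whose include_in lists all three tags."""
        requested = {
            "sequencing_type": sequencing_type,
            "analysis_type": analysis_type,
            "workflow": workflow,
        }
        rules_to_include = []
        for rule in self.snakemake_rules_dict.values():
            include_in = rule["include_in"]
            # A missing section counts as "no match" (assumption).
            if all(tag in include_in.get(section, []) for section, tag in requested.items()):
                rules_to_include.append(rule["path"])
        return rules_to_include


# Two entries in the shape of the example dict in this issue.
snakemake_rules_dict = {
    "fastp": {
        "path": "snakemake_rules/quality_control/fastp.rule",
        "include_in": {
            "sequencing_type": [SequencingType.WGS, SequencingType.TARGETED],
            "analysis_type": [AnalysisType.SINGLE, AnalysisType.PAIRED, AnalysisType.PON],
            "workflow": [AnalysisWorkflow.BALSAMIC, AnalysisWorkflow.BALSAMIC_QC, AnalysisWorkflow.BALSAMIC_UMI],
        },
    },
    "fastqc": {
        "path": "snakemake_rules/quality_control/fastqc.rule",
        "include_in": {
            "sequencing_type": [SequencingType.WGS, SequencingType.TARGETED],
            "analysis_type": [AnalysisType.SINGLE, AnalysisType.PAIRED],
            "workflow": [AnalysisWorkflow.BALSAMIC, AnalysisWorkflow.BALSAMIC_QC, AnalysisWorkflow.BALSAMIC_UMI],
        },
    },
}

rules = SnakemakeRules(snakemake_rules_dict)
pon_rules = rules.get_rules_by_tags(
    SequencingType.TARGETED, AnalysisType.PON, AnalysisWorkflow.BALSAMIC
)
paired_rules = rules.get_rules_by_tags(
    SequencingType.WGS, AnalysisType.PAIRED, AnalysisWorkflow.BALSAMIC_QC
)
```

With tags declared per rule like this, the PON and QC workflows could request their rules through the same call, and substring filters such as the "umi" check would no longer be needed.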
snakemake_rules_dict: Dict = {
    "concatenate": {
        "path": "snakemake_rules/concatenation/concatenation.rule",
        "include_in": {
            "sequencing_type": [SequencingType.WGS, SequencingType.TARGETED],
            "analysis_type": [AnalysisType.SINGLE],
            "workflow": [WorkflowSolution.DRAGEN]
        }
    },
    "fastp": {
        "path": "snakemake_rules/quality_control/fastp.rule",
        "include_in": {
            "sequencing_type": [SequencingType.WGS, SequencingType.TARGETED],
            "analysis_type": [AnalysisType.SINGLE, AnalysisType.PAIRED, AnalysisType.PON],
            "workflow": [AnalysisWorkflow.BALSAMIC, AnalysisWorkflow.BALSAMIC_QC, AnalysisWorkflow.BALSAMIC_UMI]
        }
    },
    "fastqc": {
        "path": "snakemake_rules/quality_control/fastqc.rule",
        "include_in": {
            "sequencing_type": [SequencingType.WGS, SequencingType.TARGETED],
            "analysis_type": [AnalysisType.SINGLE, AnalysisType.PAIRED],
            "workflow": [AnalysisWorkflow.BALSAMIC, AnalysisWorkflow.BALSAMIC_QC, AnalysisWorkflow.BALSAMIC_UMI]
        }
    },
}
This can be closed when:
Describe what needs to be done for this issue to be closed
Blocked by
If there are any blocking issues/prs/things in this or other repos. Please link to them.