For some reason the `get_chembl_compound_target` rule runs every time the pipeline is triggered via the `all` rule (i.e., when `snakemake` is called without a rule argument).
This is very inefficient, because the rule often performs API look-ups for all 50k compounds in our database before any of the other code executes. Often when I trigger the `all` rule, I simply want to start a DB write after correcting a bug in the pipeline.
We need some way of modularizing the pipeline that prevents this rule from re-running if it has already run recently.
So far, touching the files listed in a rule's `output` parameter seems to be a useful way to stop unexpected rule execution order.
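As a rough illustration of the "skip if it ran recently" idea, here is a minimal plain-Python sketch built around a touched sentinel file. The function names (`ran_recently`, `run_if_stale`), the sentinel path, and the 24-hour window are all hypothetical and not taken from the pipeline. It only approximates the timestamp check and is not how Snakemake itself decides what to re-run.

```python
import time
from pathlib import Path
from typing import Callable


def ran_recently(sentinel: Path, max_age_hours: float = 24.0) -> bool:
    """Return True if the sentinel file exists and was touched within max_age_hours."""
    if not sentinel.exists():
        return False
    age_seconds = time.time() - sentinel.stat().st_mtime
    return age_seconds < max_age_hours * 3600


def run_if_stale(sentinel: Path, task: Callable[[], None],
                 max_age_hours: float = 24.0) -> bool:
    """Run task only if the sentinel is missing or too old, then touch the sentinel.

    Returns True if the task ran and False if it was skipped.
    """
    if ran_recently(sentinel, max_age_hours):
        return False
    task()  # e.g. the expensive ChEMBL API look-ups (assumed, not shown here)
    sentinel.parent.mkdir(parents=True, exist_ok=True)
    sentinel.touch()  # record the time of the successful run
    return True
```

A call such as `run_if_stale(Path("results/chembl_targets.done"), query_chembl)` (both names are placeholders) would then skip the look-ups when the sentinel is less than a day old. Inside a Snakefile, a similar effect might be achieved by declaring the sentinel as the rule's output, for instance with Snakemake's `touch()` output flag.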
More elegant solutions may exist.