bhklab / PharmacoDI_snakemake_pipeline

A Snakemake pipeline to automate the scripts for creating all PharmacoDB database tables and allow easy deployment on a range of platforms.
MIT License

`get_chembl_compound_targets` rule runs as part of `all` rule #6

Open ChristopherEeles opened 3 years ago


For some reason, the `get_chembl_compound_targets` rule runs every time the pipeline is triggered via the `all` rule (i.e., when `snakemake` is called without a target rule).

This is super inefficient, as it will often do API look-ups on all 50k compounds in our database before executing any of the other code. Usually when I trigger the `all` rule, I simply want to restart a DB write after fixing a bug in the pipeline.

We need to find a way to modularize the pipeline so that this rule does not re-run if it has already been run recently.
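One way to express "has been run recently" is to check the modification time of a sentinel file that the rule touches on success. The helper below is a minimal sketch under that assumption; `ran_recently`, the sentinel path, and the age threshold are all hypothetical and not part of this pipeline or of Snakemake's API. Because a Snakefile accepts plain Python, a check like this could be used at the top level, for example to decide whether the ChEMBL output is listed among the `all` rule's inputs.

```python
import os
import time


def ran_recently(sentinel_path: str, max_age_hours: float) -> bool:
    """Return True if the sentinel exists and was modified within max_age_hours.

    Hypothetical helper: assumes the rule touches `sentinel_path` only when
    it completes successfully (e.g. via a touch() output).
    """
    try:
        mtime = os.path.getmtime(sentinel_path)
    except FileNotFoundError:
        # No sentinel yet: the rule has never completed (or it was deleted).
        return False
    age_seconds = time.time() - mtime
    return age_seconds <= max_age_hours * 3600
```

For example, `ran_recently("chembl_targets.done", 24)` (hypothetical file name) would be `False` on a fresh checkout and `True` for a day after the rule last succeeded. Note that this only controls whether the target is requested; if a downstream rule still depends on the ChEMBL output, Snakemake's own up-to-date check decides whether it re-runs.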

So far, using `touch()` on a sentinel file in the `output` directive seems to be a useful way to stop weird rule-execution-order behaviours. E.g.:

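```python
rule my_rule:
    input:
        'my_file.csv'
    output:
        # touch() creates (or updates the timestamp of) this empty sentinel
        # file after the job completes successfully.
        touch('my_rule_ran.done')
```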

More elegant solutions may exist.
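To reason about why a touched sentinel helps, it can be useful to model timestamp-based rerun logic explicitly. The function below is a simplified, hypothetical model (not Snakemake's actual implementation): a job is out of date if any output is missing or any input is newer than the oldest output. Under this model, `my_rule` stays up to date as long as `my_file.csv` is not modified after `my_rule_ran.done` was touched. Newer Snakemake releases can also trigger reruns for reasons other than timestamps (see the `--rerun-triggers` option), which this sketch ignores.

```python
import os
from typing import Iterable


def needs_rerun(inputs: Iterable[str], outputs: Iterable[str]) -> bool:
    """Simplified mtime-only model of when a rule's job is out of date.

    Assumes every input file exists. Returns True if there are no outputs,
    if any output is missing, or if any input is newer than the oldest output.
    """
    inputs = list(inputs)
    outputs = list(outputs)
    if not outputs or any(not os.path.exists(o) for o in outputs):
        return True
    oldest_output = min(os.path.getmtime(o) for o in outputs)
    return any(os.path.getmtime(i) > oldest_output for i in inputs)
```

In this model, deleting or never creating the sentinel forces a rerun, while editing unrelated files does not.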