NBISweden / aMeta

Ancient microbiome snakemake workflow
MIT License

checkpoint rules produce a different number of outputs depending on target type #64

Closed percyfal closed 2 years ago

percyfal commented 2 years ago

Checkpoint rules behave somewhat erratically and seem to rerun targets when there should be no need. For instance, there are log files showing rule Make_Node_List being rerun even though its output file already exists. The rerun is triggered by the checkpoint action itself, presumably because the checkpoint's output is a directory. Changing the output to a file (e.g. dir/.done) eliminates this behaviour.
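One plausible mechanism behind the directory-versus-file difference is that a directory's modification time changes whenever an entry is added to it, while a flag file inside it keeps its own timestamp. The sketch below only demonstrates that filesystem behaviour. Whether Snakemake's rerun decision is actually driven by it is an assumption and is not confirmed in this thread. The names checkpoint_out and extra.txt are hypothetical.

```python
import os
import tempfile
import time


def mtimes_after_adding_file():
    """Return (dir_before, dir_after, flag_before, flag_after) modification times."""
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = os.path.join(tmp, "checkpoint_out")  # hypothetical output directory
        os.mkdir(out_dir)
        flag = os.path.join(out_dir, ".done")  # flag file, as in dir/.done
        open(flag, "w").close()

        # Backdate both by an hour so that any later update is unambiguous.
        old = time.time() - 3600
        os.utime(flag, (old, old))
        os.utime(out_dir, (old, old))
        dir_before = os.stat(out_dir).st_mtime
        flag_before = os.stat(flag).st_mtime

        # A later job writes a new file into the same directory.
        with open(os.path.join(out_dir, "extra.txt"), "w") as fh:
            fh.write("new entry\n")

        dir_after = os.stat(out_dir).st_mtime
        flag_after = os.stat(flag).st_mtime
        return dir_before, dir_after, flag_before, flag_after


dir_before, dir_after, flag_before, flag_after = mtimes_after_adding_file()
print("directory mtime changed:", dir_after > dir_before)
print("flag file mtime changed:", flag_after != flag_before)
```

If timestamps are what matters, a downstream rule that depends on the directory would see it as newer after any write into it, whereas a rule that depends on the flag file would not.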

As an example, here is a summary of the rules and the number of times each is scheduled to run, taken from a recent test run on the authentication step where the input to Make_Node_List (the output of checkpoint Extract_TaxIDs) is a directory:

Job stats:
job                         count    min threads    max threads
------------------------  -------  -------------  -------------
Authentication_Plots           17              1              1
Breadth_Of_Coverage             3              1              1
Deamination                    17              1              1
Make_Node_List                  5              1              1
Malt_Extract                    5              4              4
PMD_scores                     15              1              1
Post_Processing                33              4              4
Read_Length_Distribution       10              1              1
aggregate                       2              1              1
all                             1              1              1
total                         108              1              4

Compare this with a run where the input is a hidden file in that directory:

Job stats:
job                         count    min threads    max threads
------------------------  -------  -------------  -------------
Authentication_Plots           16              1              1
Deamination                    16              1              1
PMD_scores                     14              1              1
Post_Processing                32              4              4
Read_Length_Distribution        9              1              1
aggregate                       2              1              1
all                             1              1              1
total                          90              1              4

In the latter case, Make_Node_List is not triggered to rerun, which is the expected behaviour.
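To make the difference between the two runs explicit, the counts from the two Job stats tables can be subtracted rule by rule. This is only a small arithmetic sketch: the dictionaries are copied from the tables above, and the helper name extra_jobs is hypothetical.

```python
# Job counts copied from the two "Job stats" tables above.
directory_input = {
    "Authentication_Plots": 17,
    "Breadth_Of_Coverage": 3,
    "Deamination": 17,
    "Make_Node_List": 5,
    "Malt_Extract": 5,
    "PMD_scores": 15,
    "Post_Processing": 33,
    "Read_Length_Distribution": 10,
    "aggregate": 2,
    "all": 1,
}
flag_file_input = {
    "Authentication_Plots": 16,
    "Deamination": 16,
    "PMD_scores": 14,
    "Post_Processing": 32,
    "Read_Length_Distribution": 9,
    "aggregate": 2,
    "all": 1,
}


def extra_jobs(with_dir, with_flag):
    """Return {rule: extra job count} for rules whose counts differ between runs."""
    rules = set(with_dir) | set(with_flag)
    diff = {r: with_dir.get(r, 0) - with_flag.get(r, 0) for r in rules}
    return {r: n for r, n in sorted(diff.items()) if n != 0}


extra = extra_jobs(directory_input, flag_file_input)
for rule, n in extra.items():
    print(f"{rule}: +{n}")
print("total extra jobs:", sum(extra.values()))  # 108 - 90 = 18
```

Besides the 5 Make_Node_List jobs, the directory-input run also schedules 5 Malt_Extract and 3 Breadth_Of_Coverage jobs that are absent from the second run, plus one additional job for each of five other rules.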

percyfal commented 2 years ago

Fixed via #59