NBISweden / aMeta

Ancient microbiome snakemake workflow
MIT License

cannot extract refid #75

Closed: LeandroRitter closed this issue 1 year ago

LeandroRitter commented 1 year ago

When executing runtest.sh of the workflow from the .test directory, I got two warnings:

No such file results/AUTHENTICATION/bar/632/bar.trimmed.rma6_MaltExtract_output/default/readDist/bar.trimmed.rma6_additionalNodeEntries.txt; cannot extract refid

No such file results/AUTHENTICATION/foo/632/foo.trimmed.rma6_MaltExtract_output/default/readDist/foo.trimmed.rma6_additionalNodeEntries.txt; cannot extract refid

They are not critical, but it would still be good to understand why they appear. The thing is that the "*additionalNodeEntries.txt" files actually exist, so I am not sure why this warning is shown. Otherwise everything else works without problems.

percyfal commented 1 year ago

The function get_ref_id is called from the aggregation utility function _aggregate_utils early in the workflow to generate target file names. Because the function is also called from a checkpoint rule, it is rerun once the necessary input files exist. On the first iteration, the file you point to above does not exist yet, which is why the warning is issued. Maybe we should simply change the warning into a debug message, as I can't see how this could happen at a later stage when the file actually exists!

To actually see this, modify https://github.com/NBISweden/ancient-microbiome-smk/blob/main/workflow/rules/common.smk#L202 to:

    else:
        sys.exit()
    return res

NB: you need to remove all results before rerunning the workflow to see that the file actually doesn't exist when the warning is shown.
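
The two-pass behaviour described above can be reproduced outside Snakemake with a minimal sketch. Everything here is an assumption for illustration: `get_ref_id_sketch` is a hypothetical stand-in and not the workflow's actual `get_ref_id`, and the file layout (reference id as the first token of the first line) is made up, so the real `additionalNodeEntries.txt` format may differ.

```python
import logging
import os
import tempfile

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("sketch")


def get_ref_id_sketch(path):
    """Hypothetical stand-in for get_ref_id; not the workflow's actual code."""
    if not os.path.exists(path):
        # First pass: the checkpoint has not produced the file yet.
        log.warning("No such file %s; cannot extract refid", path)
        return None
    with open(path) as fh:
        # Assumption: the id is the first token of the first line.
        return fh.readline().split()[0]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.trimmed.rma6_additionalNodeEntries.txt")
    first_pass = get_ref_id_sketch(path)   # file missing: warns, returns None
    with open(path, "w") as fh:            # stands in for the checkpoint finishing
        fh.write("REFID_PLACEHOLDER extra columns\n")
    second_pass = get_ref_id_sketch(path)  # file present: the id is returned
```

The first call corresponds to the early evaluation that builds target file names, and the second to the re-evaluation after the checkpoint has run, which is why the warning can appear even though the file exists by the time you look for it.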

ZoePochon commented 1 year ago

When you run the pipeline on many samples, this warning takes up most of the log output.

percyfal commented 1 year ago

Yes, I can imagine. I think changing the message level is the right way to go until we figure out how to deduce the calling context.
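
As a rough illustration of what changing the message level would do, here is a sketch using Python's standard `logging` module. The logger name, the helper `run_with_level`, and the message text are hypothetical, and the workflow itself may well use Snakemake's own logger rather than a plain `logging` logger.

```python
import io
import logging


def run_with_level(level):
    """Emit the 'cannot extract refid' message at DEBUG and capture the output."""
    stream = io.StringIO()
    log = logging.getLogger("refid_sketch")  # hypothetical logger name
    log.handlers = [logging.StreamHandler(stream)]
    log.propagate = False                    # keep output out of the root logger
    log.setLevel(level)
    for sample in ("foo", "bar"):
        # Demoted from warning to debug, as suggested in this thread.
        log.debug("No such file for sample %s; cannot extract refid", sample)
    return stream.getvalue()


quiet = run_with_level(logging.INFO)     # normal run: message is suppressed
verbose = run_with_level(logging.DEBUG)  # debugging run: message is shown
```

With the logger at INFO the per-sample messages no longer appear, while they remain available when the level is lowered to DEBUG for troubleshooting.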