NBISweden / GenErode

GitHub repository for GenErode, a Snakemake pipeline for the analysis of whole-genome sequencing data from historical and modern samples to study patterns of genome erosion.
GNU General Public License v3.0

rule 7_mlrho dependent on final 3.1 bam.bai temporary file #48

Closed lored322 closed 5 months ago

lored322 commented 1 year ago

Rule 7_mlRho requests .bam.bai files as input even though it only uses the .bam files. However, in both the 3.1 and 3.3 bam processing steps, these .bam.bai files are marked as temporary and deleted at the end of the pipeline run. Thus, if any files are missing from the expected output of 3.1 (i.e. sorted bams), the pipeline will remap all affected samples from the beginning.

This differs from the 4_genotyping rule, which depends only on the final bam file from 3.1/3.2/3.3, so the absence of the .bam.bai file does not trigger remapping of any samples.
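
The behaviour described above can be illustrated with a toy model of backward dependency resolution. This is only a sketch and not Snakemake's actual scheduler: the rule names and file names are hypothetical, and it assumes that the intermediate sorted bam and the index were both deleted as temporary files after a completed run.

```python
# Toy model of backward dependency resolution; NOT Snakemake's real scheduler.
# Each (hypothetical) rule maps its name to (inputs, outputs).
RULES = {
    "map_reads": (["sample.fastq"], ["sample.sorted.bam"]),
    "realign_indels": (
        ["sample.sorted.bam"],
        ["sample.realn.bam", "sample.realn.bam.bai"],
    ),
}


def rules_to_run(targets, existing):
    """Return the rules that must run so that every target file exists."""
    needed = []

    def visit(path):
        if path in existing:
            return  # file is on disk, nothing to do
        for name, (inputs, outputs) in RULES.items():
            if path in outputs:
                # The producing rule needs all of its own inputs first.
                for inp in inputs:
                    visit(inp)
                if name not in needed:
                    needed.append(name)
                return
        raise FileNotFoundError(f"no rule produces {path}")

    for target in targets:
        visit(target)
    return needed


# State after a finished run: the raw reads and the final bam remain,
# while the temporary sorted bam and the .bam.bai index were deleted.
on_disk = {"sample.fastq", "sample.realn.bam"}

# A rule that only asks for the final bam triggers nothing.
genotyping_jobs = rules_to_run(["sample.realn.bam"], on_disk)

# A rule that also asks for the deleted index pulls in the whole chain.
mlrho_jobs = rules_to_run(["sample.realn.bam", "sample.realn.bam.bai"], on_disk)

print(genotyping_jobs)  # []
print(mlrho_jobs)       # ['map_reads', 'realign_indels']
```

In this model, requesting the missing index forces the rule that produces it to run again, and that rule in turn needs its own deleted input, so the chain goes back to mapping.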

One option would be to make the .bam.bai files non-temporary; another would be to remove the code that requests them in 7_mlRho.smk (which works).

verku commented 1 year ago

Thank you for reporting this issue! We will include one of the solutions in the next pipeline version.

verku commented 1 year ago

If you can't wait until then, you can solve this by removing the temp() flag yourself from the following lines of code:

https://github.com/NBISweden/GenErode/blob/4fc1faad59e0020f915d87e2a8c4e4ff25aa9d35/workflow/rules/3.1_bam_rmdup_realign_indels.smk#L549

This line turns into: index="results/{dataset}/mapping/" + REF_NAME + "/{sample}.merged.rmdup.merged.realn.bam.bai",

https://github.com/NBISweden/GenErode/blob/4fc1faad59e0020f915d87e2a8c4e4ff25aa9d35/workflow/rules/3.3_bam_subsampling.smk#L65

And this line turns into: index="results/{dataset}/mapping/" + REF_NAME + "/{sample}.merged.rmdup.merged.{processed}.mapped_q30.subs_dp{DP}.bam.bai",
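
For anyone who prefers to script the change, here is a minimal sketch of stripping a `temp()` wrapper from a single output line. It assumes the original line has the form `index=temp(<expression>),` with no nested parentheses inside the wrapper; that form is an assumption, since only the edited result is quoted above. The helper name `strip_temp` is hypothetical and not part of GenErode or Snakemake.

```python
import re

# Matches `temp(...)` around an expression that contains no parentheses itself.
TEMP_CALL = re.compile(r"\btemp\(([^()]*)\)")


def strip_temp(line: str) -> str:
    """Replace `temp(<expr>)` with `<expr>`; leave other lines unchanged."""
    return TEMP_CALL.sub(r"\1", line)


# Assumed form of the original line in 3.1_bam_rmdup_realign_indels.smk.
before = (
    'index=temp("results/{dataset}/mapping/" + REF_NAME + '
    '"/{sample}.merged.rmdup.merged.realn.bam.bai"),'
)
after = strip_temp(before)
print(after)
```

Editing the two lines by hand is just as good; check the result against the lines quoted above before rerunning the pipeline.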