NBISweden / aMeta

Ancient microbiome snakemake workflow
MIT License

Breadth of coverage needs to depend on MaltExtract output being generated #170

Open ZoePochon opened 5 months ago

ZoePochon commented 5 months ago

For the Breadth_Of_Coverage rule to work, MaltExtract must run first, because it generates the file results/AUTHENTICATION/{wildcards.sample}/{wildcards.taxid}/MaltExtract_output/default/readDist/{wildcards.sample}.trimmed.rma6_additionalNodeEntries.txt. The function get_ref_id in common.smk reads this file to extract the ID of the best reference genome, which Breadth_Of_Coverage then uses to pull the matching alignments out of the MALT SAM file.

Concrete example: when Breadth_Of_Coverage happens to start before the MaltExtract rule has finished, the reference ID is rendered as None in every command, and the job fails with this output:

[Mon Jun 24 14:32:30 2024]
Job 0: Breadth_Of_Coverage: COMPUTING BREADTH OF COVERAGE, EXTRACTING REFERENCE SEQUENCE FOR VISUALIZING ALIGNMENTS WITH IGV
Reason: Forced execution

echo None > results/AUTHENTICATION/ldo252-b1e1l1p1/44449/name_list.txt;
zgrep None results/MALT/ldo252-b1e1l1p1.trimmed.sam.gz | uniq > results/AUTHENTICATION/ldo252-b1e1l1p1/44449/44449.sam;
samtools view -bS results/AUTHENTICATION/ldo252-b1e1l1p1/44449/44449.sam > results/AUTHENTICATION/ldo252-b1e1l1p1/44449/None.bam;
samtools sort results/AUTHENTICATION/ldo252-b1e1l1p1/44449/None.bam > results/AUTHENTICATION/ldo252-b1e1l1p1/44449/sorted.bam;
samtools index results/AUTHENTICATION/ldo252-b1e1l1p1/44449/sorted.bam;
samtools depth -a results/AUTHENTICATION/ldo252-b1e1l1p1/44449/sorted.bam > results/AUTHENTICATION/ldo252-b1e1l1p1/44449/breadth_of_coverage;
grep -w -f results/AUTHENTICATION/ldo252-b1e1l1p1/44449/name_list.txt /proj/archaeogenetics/private/NBIS_Demo/DBDIR_KrakenUniq_Full_NT/library/nt/library.fna.fai | awk '{printf("%s:1-%s\n", $1, $2)}' > results/AUTHENTICATION/ldo252-b1e1l1p1/44449/name_list.txt.regions;
samtools faidx /proj/archaeogenetics/private/NBIS_Demo/DBDIR_KrakenUniq_Full_NT/library/nt/library.fna -r results/AUTHENTICATION/ldo252-b1e1l1p1/44449/name_list.txt.regions -o results/AUTHENTICATION/ldo252-b1e1l1p1/44449/None.fasta
Activating conda environment: .snakemake/conda/cf2e80308fd9937b2d095483bd836fcd
[Mon Jun 24 14:38:37 2024]
Error in rule Breadth_Of_Coverage:
    jobid: 0
    input: results/MALT/ldo252-b1e1l1p1.trimmed.sam.gz, /proj/archaeogenetics/private/NBIS_Demo/DBDIR_KrakenUniq_Full_NT/library/nt/library.fna, /proj/archaeogenetics/private/NBIS_Demo/DBDIR_KrakenUniq_Full_NT/library/nt/library.fna.fai
    output: results/AUTHENTICATION/ldo252-b1e1l1p1/44449/name_list.txt, results/AUTHENTICATION/ldo252-b1e1l1p1/44449/sorted.bam, results/AUTHENTICATION/ldo252-b1e1l1p1/44449/breadth_of_coverage, results/AUTHENTICATION/ldo252-b1e1l1p1/44449/44449.sam
    log: logs/BREADTH_OF_COVERAGE/ldo252-b1e1l1p1_44449.log (check log file(s) for error details)
    conda-env: /crex/proj/archaeogenetics/nobackup/private/pochonz/Gobas/aMeta/aMeta_supplementary_runs/aMeta_Borreliatrial/.snakemake/conda/cf2e80308fd9937b2d095483bd836fcd
    shell:
        echo None > results/AUTHENTICATION/ldo252-b1e1l1p1/44449/name_list.txt;
        zgrep None results/MALT/ldo252-b1e1l1p1.trimmed.sam.gz | uniq > results/AUTHENTICATION/ldo252-b1e1l1p1/44449/44449.sam;
        samtools view -bS results/AUTHENTICATION/ldo252-b1e1l1p1/44449/44449.sam > results/AUTHENTICATION/ldo252-b1e1l1p1/44449/None.bam;
        samtools sort results/AUTHENTICATION/ldo252-b1e1l1p1/44449/None.bam > results/AUTHENTICATION/ldo252-b1e1l1p1/44449/sorted.bam;
        samtools index results/AUTHENTICATION/ldo252-b1e1l1p1/44449/sorted.bam;
        samtools depth -a results/AUTHENTICATION/ldo252-b1e1l1p1/44449/sorted.bam > results/AUTHENTICATION/ldo252-b1e1l1p1/44449/breadth_of_coverage;
        grep -w -f results/AUTHENTICATION/ldo252-b1e1l1p1/44449/name_list.txt /proj/archaeogenetics/private/NBIS_Demo/DBDIR_KrakenUniq_Full_NT/library/nt/library.fna.fai | awk '{printf("%s:1-%s\n", $1, $2)}' > results/AUTHENTICATION/ldo252-b1e1l1p1/44449/name_list.txt.regions;
        samtools faidx /proj/archaeogenetics/private/NBIS_Demo/DBDIR_KrakenUniq_Full_NT/library/nt/library.fna -r results/AUTHENTICATION/ldo252-b1e1l1p1/44449/name_list.txt.regions -o results/AUTHENTICATION/ldo252-b1e1l1p1/44449/None.fasta
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job Breadth_Of_Coverage since they might be corrupted:
results/AUTHENTICATION/ldo252-b1e1l1p1/44449/name_list.txt, results/AUTHENTICATION/ldo252-b1e1l1p1/44449/44449.sam
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

An easy fix would be to add results/AUTHENTICATION/{wildcards.sample}/{wildcards.taxid}/MaltExtract_output/default/readDist/{wildcards.sample}.trimmed.rma6_additionalNodeEntries.txt to the input files of the Breadth_Of_Coverage rule, so that Snakemake only schedules it after MaltExtract has produced that file.
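A minimal sketch of what the proposed input could look like, assuming the path pattern quoted in the issue. Inside an `input:` directive Snakemake fills `{sample}` and `{taxid}` from the job's wildcards itself, so the pattern is written without the `wildcards.` prefix used elsewhere in this issue. The input key `node_entries` and the helper `maltextract_node_entries` are hypothetical names for illustration, not part of aMeta; the rule change itself is shown only as a comment, with the existing inputs elided.

```python
from types import SimpleNamespace

# Output of the MaltExtract rule that get_ref_id() in common.smk reads
# (path pattern taken from the issue). In an `input:` directive Snakemake
# substitutes {sample} and {taxid} from the job's wildcards.
MALTEXTRACT_NODE_ENTRIES = (
    "results/AUTHENTICATION/{sample}/{taxid}/MaltExtract_output/default/readDist/"
    "{sample}.trimmed.rma6_additionalNodeEntries.txt"
)


def maltextract_node_entries(wildcards):
    """Hypothetical input function returning the same path for one job.

    Snakemake calls input functions with the job's wildcards object; only
    its .sample and .taxid attributes are used here.
    """
    return MALTEXTRACT_NODE_ENTRIES.format(
        sample=wildcards.sample, taxid=wildcards.taxid
    )


# Sketch of the proposed rule change (existing inputs elided, key name assumed):
#
# rule Breadth_Of_Coverage:
#     input:
#         # ... existing inputs: MALT SAM, library.fna, library.fna.fai ...
#         node_entries=MALTEXTRACT_NODE_ENTRIES,  # or: maltextract_node_entries
#     ...

# Expand the pattern for the sample and taxid of the failing job in the log.
job = SimpleNamespace(sample="ldo252-b1e1l1p1", taxid="44449")
print(maltextract_node_entries(job))
```

Declaring the file as an input lets Snakemake order the two jobs through the DAG instead of relying on timing. The plain pattern string is the more idiomatic form; the input function only makes the expansion explicit. Whether this alone is enough may depend on where the rule calls get_ref_id (for example in `params`), which this sketch does not show.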

LeandroRitter commented 4 months ago

Yes @ZoePochon, I agree, I believe you are right. I will add this fix to the next PR I am preparing.

ZoePochon commented 4 months ago

Great! Thanks @LeandroRitter