cbirdlab / dDocentHPC

hard fork of dDocent, edited to run without interactive user input

Adding PCR and optical duplicate removal to pipeline #10

Open mkent001 opened 4 years ago

mkent001 commented 4 years ago

Optical duplicates are duplicate reads created when the sequencer's optical sensor misidentifies a single amplification cluster as multiple clusters. We have been using samtools markdup to identify and remove both optical and PCR duplicates. So far I have been running the scripts below as independent loops, in the order shown. The second-to-last script (markdups.sh) is the final step in removing the duplicates. The last script (index.sh) creates new .bai index files, which may not be necessary in the pipeline.

collate.sh

#!/bin/bash

enable_lmod
module load container_env ddocent

# Group reads by name so that fixmate sees both mates of a pair together
for i in *RG.bam; do crun samtools collate -o "$i.namecollate.bam" "$i"; done
mv *.namecollate.bam collate/

fixmate.sh

#!/bin/bash

enable_lmod
module load container_env ddocent

# Add mate score tags (-m), which markdup needs later
for i in *namecollate.bam; do crun samtools fixmate -m "$i" "$i.fixmate.bam"; done
mv *.fixmate.bam fixmate/

sort.sh

#!/bin/bash

enable_lmod
module load container_env ddocent

# Sort by coordinate; markdup expects position-sorted input
for i in *fixmate.bam; do crun samtools sort -o "$i.positionsort.bam" "$i"; done

markdups.sh

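#!/bin/bash

enable_lmod
module load container_env ddocent

# -r removes duplicates, -d 100 sets the optical duplicate distance, -l 850 sets the maximum read length
for i in *positionsort.bam; do crun samtools markdup -r -d 100 -l 850 "$i" "$i.markdup.bam"; done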
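To see how many of the duplicates are optical versus PCR before committing to -r, one option is to run markdup without -r and count the duplicate type tags. As I understand the samtools documentation, when -d is set duplicates are tagged dt:Z:SQ (optical) or dt:Z:LB (all others). A minimal sketch, assuming those tags are present; count_dup_types is a hypothetical helper name, not part of samtools or dDocentHPC:

```shell
#!/bin/bash
# Hypothetical helper: read SAM text on stdin and count duplicate-flagged
# reads (FLAG bit 0x400) by their dt tag. Assumes markdup was run with -d
# and WITHOUT -r, so the duplicates are still in the file.
count_dup_types() {
  awk -F'\t' '
    !/^@/ && int($2 / 1024) % 2 == 1 {      # skip headers, keep duplicates
      type = "untagged"
      for (i = 12; i <= NF; i++) {          # optional fields start at column 12
        if ($i == "dt:Z:SQ") type = "optical"
        else if ($i == "dt:Z:LB") type = "pcr"
      }
      n[type]++
    }
    END { printf "optical=%d pcr=%d untagged=%d\n", n["optical"], n["pcr"], n["untagged"] }
  '
}

# Usage (file name is an example only):
#   crun samtools view -h sample.markdup.bam | count_dup_types
```

The counts are per read, not per pair, so paired-end data will show each duplicate pair twice.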

index.sh

#!/bin/bash

enable_lmod
module load container_env ddocent

# Write a new BAI index for each deduplicated BAM
for i in *markdup.bam; do crun samtools index -b "$i" "$i.bai"; done

This is the example order that samtools provides:

samtools collate -o namecollate.bam input.bam
samtools fixmate -m namecollate.bam fixmate.bam
samtools sort -o positionsort.bam fixmate.bam
samtools markdup positionsort.bam markdup.bam
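If this were folded into the pipeline as one step, the separate loops could probably be collapsed into a single loop per BAM file. A rough sketch, not tested inside dDocentHPC: it assumes enable_lmod and module load container_env ddocent have already been run, and dedup_one, run, SAMTOOLS and DRY_RUN are hypothetical names introduced here. The output names drop the original .bam suffix, which differs from the scripts above.

```shell
#!/bin/bash
# Sketch only: chain collate -> fixmate -> sort -> markdup -> index per BAM.
# SAMTOOLS and DRY_RUN are hypothetical knobs, not dDocentHPC settings.
SAMTOOLS=${SAMTOOLS:-"crun samtools"}
DRY_RUN=${DRY_RUN:-0}

run() {
  # In dry-run mode print the command, otherwise execute it.
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

dedup_one() {
  local bam=$1
  local base=${bam%.bam}
  # Each step runs only if the previous one succeeded.
  run $SAMTOOLS collate -o "$base.namecollate.bam" "$bam" &&
  run $SAMTOOLS fixmate -m "$base.namecollate.bam" "$base.fixmate.bam" &&
  run $SAMTOOLS sort -o "$base.positionsort.bam" "$base.fixmate.bam" &&
  run $SAMTOOLS markdup -r -d 100 -l 850 "$base.positionsort.bam" "$base.markdup.bam" &&
  run $SAMTOOLS index -b "$base.markdup.bam" "$base.markdup.bam.bai"
}

for bam in *RG.bam; do
  [ -e "$bam" ] || continue   # nothing matched the glob
  dedup_one "$bam"
done
```

Setting DRY_RUN=1 prints the five commands for each file instead of running them, which is a cheap way to check the file naming before submitting a job. The intermediate files are left in place here; removing them after indexing would be a separate decision.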