FredHutch / SEACR

SEACR: Sparse Enrichment Analysis for CUT&RUN
GNU General Public License v2.0
100 stars 43 forks source link

Bedgraph Creation from BAM #39

Open dstrib opened 4 years ago

dstrib commented 4 years ago

Hi! In your README.me you suggest using a series of four steps to create a bedgraph file for input to SEACR.

bedtools bamtobed -bedpe -i $sample.bam > $sample.bed
awk '$1==$4 && $6-$2 < 1000 {print $0}' $sample.bed > $sample.clean.bed
cut -f 1,2,6 $sample.clean.bed | sort -k1,1 -k2,2n -k3,3n > $sample.fragments.bed
bedtools genomecov -bg -i $sample.fragments.bed -g my.genome > $sample.fragments.bedgraph

Does the single bedtools command not result in a reflection of pairs, as you specify?

bedtools genomecov -bg -ibam $sample.bam > $sample.fragments.bedgraph

Thanks!

mpmeers commented 4 years ago

Hi,

I tried testing this briefly and it appears that your method does not work unless the -pc flag is also included (as described in Issue #18), since otherwise it only maps reads rather than reads plus the intervening fragments between read pairs. Therefore, the proper code would be as follows:

bedtools genomecov -bg -pc -ibam $sample.bam > $sample.fragments.bedgraph

This should work as long as the bam file is "clean" in the sense that there are no non-coherent read pairs (e.g. pairs mapping to different chromosomes or disparate loci on the same chromosome) in the bam file. The contribution of non-coherent read pairs to the bedgraph will cause SEACR to fail, and It's for that reason that I typically advocate for using the series of processing steps in the README. However, it's likely you could achieve the same results using filtering based on quality flags in samtools (e.g. samtools view -q to filter based on MAPQ or -f/-F to filter on read pair coherence). Hope that helps.

Mike