FredHutch / SEACR

SEACR: Sparse Enrichment Analysis for CUT&RUN
GNU General Public License v2.0

SEACR missing lots of obvious peaks #87

Open Lillianwu0314 opened 2 years ago

Lillianwu0314 commented 2 years ago

Hi all,

[Image: IGV H3K27me3 HOXC cluster]
Here's an IGV screenshot of the HOXC gene cluster:
- Blue: H3K27me3 (top) and IgG (bottom) bedgraphs
- Green: peaks called with MACS2 using different p- and q-value cut-offs (I used my BAM files as input with -f BAMPE)
- Red: peaks called with SEACR in relaxed (top) and stringent (bottom) mode
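For context on what the red tracks are built from: SEACR takes bedgraph signal as input and, according to its published description, groups contiguous runs of nonzero signal into "signal blocks", scores each block by its total signal, and keeps blocks that pass a threshold calibrated on the global signal distribution of the IgG control. The sketch below illustrates only the block-building step plus a fixed cutoff. The function names, tuple layout and threshold value are hypothetical; this is not SEACR's code and does not reproduce how SEACR chooses its relaxed or stringent threshold.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, int, int, float]  # (chrom, start, end, value), 0-based half-open
Block = Tuple[str, int, int, float]   # (chrom, start, end, total_signal)


def signal_blocks(records: List[Record]) -> List[Block]:
    """Merge abutting nonzero bedgraph intervals into blocks and sum their signal.

    Total signal is value * interval width (area under the bedgraph curve).
    Assumes records are sorted by chrom, then start.
    """
    blocks: List[Block] = []
    current: Optional[list] = None  # [chrom, start, end, total] of the open block
    for chrom, start, end, value in records:
        if value <= 0:
            # A zero-valued interval closes the open block.
            if current is not None:
                blocks.append(tuple(current))
                current = None
            continue
        area = value * (end - start)
        if current is not None and current[0] == chrom and current[2] == start:
            # Interval directly abuts the open block: extend it.
            current[2] = end
            current[3] += area
        else:
            # New chromosome or a coordinate gap: start a new block.
            if current is not None:
                blocks.append(tuple(current))
            current = [chrom, start, end, area]
    if current is not None:
        blocks.append(tuple(current))
    return blocks


def blocks_above(blocks: List[Block], threshold: float) -> List[Block]:
    """Keep blocks whose total signal exceeds a single global (hypothetical) cutoff."""
    return [b for b in blocks if b[3] > threshold]
```

With one genome-wide cutoff, anything that shifts the overall distribution of block totals (for example, a different reference assembly or mapping strategy) can move the threshold and change which blocks survive.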

What could be causing this? I'm double-checking all my scripts in case I've made a silly mistake.

Thanks for taking the time to read this. Lillian

Lillianwu0314 commented 2 years ago

Hi all,

Adding to this, I also ran the same analysis on reads aligned to the hg38 assembly, and the peaks called with SEACR make a lot more sense. Here's a screenshot of the HOXC gene cluster with peaks called by SEACR in relaxed (top) and stringent (bottom) mode:
[Image: IGV H3K27me3 HOXC cluster, hg38]

Thanks, Lillian

mpmeers commented 2 years ago

Hello,

It's possible that the T2T assembly causes some differences in the global background estimation: large pileups at difficult-to-map repeated regions might be represented differently now that there are presumably many more available reference sites to which ambiguous reads could map. My understanding is that the bowtie2 default (in contrast with -k or -a mode) is to assign a read semi-randomly to a single map location when there are multiple possible map locations with the same alignment score (described in more detail here).

Are you using bowtie2 here, and if so, is it using the default map position reporting? Does removing duplicates make a difference? This is something I will probably need to take a closer look at, since it could be a general behavior with the T2T builds, which are unlike anything we've used before.
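The point about more available map sites can be illustrated with a toy simulation. It assumes, hypothetically, that every read from a repeat aligns equally well to each copy and that single-report (default) mode picks one copy at random per read. It is not bowtie2's actual selection logic, and `assign_multimappers` is an illustrative name.

```python
import random
from collections import Counter
from typing import Dict


def assign_multimappers(n_reads: int, n_loci: int, seed: int = 0) -> Dict[int, int]:
    """Toy model: each read aligns equally well to n_loci copies of a repeat.

    One copy is chosen uniformly at random per read (single-report mode);
    returns the read count at every copy, including copies that got none.
    """
    rng = random.Random(seed)
    counts = Counter(rng.randrange(n_loci) for _ in range(n_reads))
    return {locus: counts.get(locus, 0) for locus in range(n_loci)}


# Same number of reads, spread over more equivalent copies.
for n_loci in (2, 10, 50):
    per_locus = assign_multimappers(n_reads=1000, n_loci=n_loci)
    print(n_loci, "copies -> max reads at any one copy:", max(per_locus.values()))
```

As the number of equivalent copies grows, the same reads are spread thinner, so the pileup at any single copy drops. In a reference with more resolved repeat copies, this can change both local pileups and the background distribution that a global threshold is calibrated on.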

Mike

Lillianwu0314 commented 2 years ago

Hi Mike,

Thanks for the quick reply and explanation. I did try removing duplicates from the IgG (duplication rate ~50%), but not from the H3K27me3 (duplication rate 13%), and then running SEACR on the result. Unfortunately, it doesn't seem to improve peak calling.
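For reference, a duplication rate like the ~50% quoted for the IgG is the fraction of read pairs that repeat another pair's fragment. This sketch assumes duplicates are defined only by the (chromosome, start, end) of the paired-end fragment; tools such as Picard MarkDuplicates also take read orientation into account and choose which copy to keep by base quality, so this is a simplification with hypothetical function names.

```python
from typing import Iterable, List, Tuple

Fragment = Tuple[str, int, int]  # (chrom, fragment start, fragment end)


def duplication_rate(fragments: Iterable[Fragment]) -> float:
    """Fraction of fragments whose exact coordinates were already seen."""
    total = 0
    unique = set()
    for frag in fragments:
        total += 1
        unique.add(frag)
    if total == 0:
        return 0.0
    return 1.0 - len(unique) / total


def deduplicate(fragments: Iterable[Fragment]) -> List[Fragment]:
    """Keep the first occurrence of each coordinate set, preserving input order."""
    seen = set()
    kept: List[Fragment] = []
    for frag in fragments:
        if frag not in seen:
            seen.add(frag)
            kept.append(frag)
    return kept
```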


I did use Bowtie2 with the default settings. I'll look into -k and -a and see whether they're appropriate for what I'm looking at. I've also considered using --non-deterministic so that identical reads don't get aligned to the same place when there are multiple equally good alignments.
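The bowtie2 manual describes the default as re-seeding a pseudo-random generator for each read from the read name, nucleotide sequence, quality string and the --seed value, so reads identical in all of these get the same alignment even when several are equally good; --non-deterministic seeds from the current time instead. The toy model below mimics that idea only. The SHA-256 hashing scheme and `pick_locus` are hypothetical and do not reproduce bowtie2's internal generator.

```python
import hashlib
import random
import time


def pick_locus(name: str, seq: str, qual: str, n_loci: int,
               seed: int = 0, non_deterministic: bool = False) -> int:
    """Pick one of n_loci equally good alignments for a read (toy model).

    Deterministic mode derives the generator seed from the read's name,
    sequence, qualities and a global seed, so identical reads always get
    the same pick. Non-deterministic mode seeds from the clock instead.
    """
    if non_deterministic:
        rng_seed = time.time_ns()
    else:
        key = f"{name}\t{seq}\t{qual}\t{seed}".encode()
        # Stable across runs, unlike Python's per-process salted hash().
        rng_seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return random.Random(rng_seed).randrange(n_loci)
```

Because the read name is part of the default seed, reads that share a sequence but have different names (the usual case, including PCR duplicates) can already land on different copies in default mode.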

Thanks, Lillian

clabanillas commented 6 months ago

Hello,

I've been having the same issue for a while now. I'm also working with T2T-aligned reads. Many peaks are missed, and peak calling is not consistent between replicates.
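One way to put a number on replicate inconsistency is the fraction of one replicate's peaks that overlap any peak in the other, computed in both directions. This sketch assumes peaks are (chrom, start, end) tuples in 0-based half-open BED coordinates and counts a 1 bp overlap as a match; the function names are hypothetical, and bedtools intersect does the same job directly on BED files.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def _merged_by_chrom(peaks: List[Peak]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Sort and merge overlapping peaks per chromosome into parallel start/end lists."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts: List[int] = []
        ends: List[int] = []
        for start, end in intervals:
            if ends and start < ends[-1]:
                # Overlaps the previous merged interval: extend it.
                ends[-1] = max(ends[-1], end)
            else:
                starts.append(start)
                ends.append(end)
        index[chrom] = (starts, ends)
    return index


def fraction_overlapping(query: List[Peak], reference: List[Peak]) -> float:
    """Fraction of query peaks overlapping at least one reference peak by >= 1 bp."""
    if not query:
        return 0.0
    index = _merged_by_chrom(reference)
    hits = 0
    for chrom, start, end in query:
        if chrom not in index:
            continue
        starts, ends = index[chrom]
        # Merged intervals are disjoint, so ends are sorted: find the first
        # reference interval that ends after this peak starts.
        i = bisect_right(ends, start)
        if i < len(starts) and starts[i] < end:
            hits += 1
    return hits / len(query)
```

A low value in both directions suggests the replicates disagree broadly, while a strong asymmetry usually means one replicate simply has far fewer peaks.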

Has there been an update on this? Thanks a lot, Claudia