big replicate discrepancy for peak calling when using negative control

icanwinwyz commented 2 years ago

Dear SEACR developers,

Thanks for developing such a great tool for CUT&Tag data!

My experiment has two replicates for histone K27me3, with a no-antibody sample as the negative control. I used the command below for peak calling:

SEACR_1.3.sh K27me3_Rep1.bedgraph K27me3_neg.bedgraph norm relaxed K27me3

However, the peak counts differed hugely between the two replicates: Rep1 yielded 8376 peaks and Rep2 yielded 742055 peaks. When I turned off normalization by replacing "norm" with "non", as in the command below:

SEACR_1.3.sh K27me3_Rep1.bedgraph K27me3_neg.bedgraph non relaxed K27me3

Rep1 yielded 8198 peaks and Rep2 yielded 8311 peaks. This doesn't make sense to me because (1) I expected more than 50K peaks based on a pilot study, and (2) the library size is 10.5M for Rep1, 12.8M for Rep2, and only 150K for the negative control. How could a negative control with such low sequencing depth normalize out so many reads from the target samples?
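For anyone who wants to check the depth gap described above directly on the bedGraph files, here is a minimal sketch. It is not part of SEACR. The function name bedgraph_total is hypothetical, and it assumes a plain 4-column bedGraph (chrom, start, end, value) with no track or header lines. It sums value times interval width, which gives a rough total-signal figure to compare between a target file and the control.

```shell
# Hypothetical helper (not part of SEACR): total signal in a bedGraph,
# computed as the sum of value * interval width over all rows.
# Assumes 4 columns (chrom, start, end, value) and no header lines.
bedgraph_total() {
  awk 'BEGIN { total = 0 }
       { total += ($3 - $2) * $4 }
       END { printf "%.0f\n", total }' "$1"
}

# Tiny made-up bedGraph so the sketch runs on its own.
demo=$(mktemp)
printf 'chr1\t0\t100\t2\nchr1\t100\t150\t4\n' > "$demo"
bedgraph_total "$demo"   # 100*2 + 50*4 = 400
rm -f "$demo"
```

Running bedgraph_total on K27me3_Rep1.bedgraph and on K27me3_neg.bedgraph and comparing the two totals would show how far apart the target and control signal are before any normalization is applied.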

Then I ran the command with the empirical cutoff (0.00001) and no negative control:

SEACR_1.3.sh K27me3_Rep1.bedgraph 0.00001 non relaxed K27me3

I got decent peak numbers: 133849 for Rep1 and 154101 for Rep2.

So my questions are: why does the negative control cause such a big discrepancy between replicates in norm relaxed mode? And why does a negative control with such low sequencing depth cause so many peaks to go uncalled in non relaxed mode?

Thanks for your time, and please feel free to let me know if you need further information.

clabanillas commented 5 months ago

Hi Joe,

Have you received an update on this issue? I have the same problem.

dhurjhotisaha commented 4 months ago

Hi Joe, I am facing similar issues with my CUT&RUN samples. Kindly share if you have any update.

FredHutch / SEACR

big replicate discrepancy for peak calling when using negative control #89