Open leafyeszjl opened 4 years ago
Hi,
To what read depth were the two IgG control samples sequenced? The fact that the S42 experimental sample yields very few peaks suggests there may be an issue with the S42 sample itself, and I have found that there can be issues with SEACR when the control sample read depth is much lower than the target sample read depth, for instance.
Mike
Hi Mike,
Thank you very much. The number of read pairs in the S42 sample was indeed very low. Do you know what read depth (number of read pairs) is appropriate for peak calling?
Sample | S11 | S38 | S39 | S42 | S5 | S6 |
---|---|---|---|---|---|---|
Clean reads pair | 2521993 | 197705953 | 7745098 | 4360 | 8416998 | 6707812 |
Mapped reads pair | 2367278 | 184952503 | 7042634 | 3995 | 8032654 | 6505496 |
Mapping rate(%) | 93.87 | 93.55 | 90.93 | 91.63 | 95.43 | 96.98 |
Duplicate reads | 929968 | 244465240 | 2043174 | 292 | 1469808 | 1289332 |
Duplicate rate(%) | 19.64 | 66.09 | 14.51 | 3.65 | 9.15 | 9.91 |
Jialin
Hi Mike, do you think it is necessary to downsample all samples to the same number of bases or reads before calling peaks, so that the samples can be compared? Jialin
Hi Mike,
I am sorry to trouble you so many times. I have switched to a different CUT&Tag dataset for this analysis. In this experiment, L9 is the IgG sample, and L1, L2, L3, and L4 differ only in cell input (100, 1,000, 10,000, and 100,000 cells, respectively). All other experimental conditions are the same. The alignment results are in Table 1, the number of peaks called by SEACR is in Table 2, and the TSS heatmap is shown in Fig 1. I am surprised that the number of peaks for L3 is so high. Why? Thank you for your reply.
(1) Table 1: Alignment results
Sample | L1 | L2 | L3 | L4 | L9 |
---|---|---|---|---|---|
Clean reads pair | 23352501 | 14016997 | 8588937 | 12006095 | 7104387 |
Mapped reads pair | 21165340 | 12074669 | 7180872 | 11589712 | 6476417 |
Mapping rate(%) | 90.63 | 86.14 | 83.61 | 96.53 | 91.16 |
Duplicate reads | 40234544 | 22294654 | 10839854 | 6596992 | 11982356 |
Duplicate rate(%) | 95.05 | 92.32 | 75.48 | 28.46 | 92.51 |
chrMT reads | 570956 | 401178 | 397724 | 431596 | 1452286 |
chrMT rate(%) | 1.35 | 1.66 | 2.77 | 1.86 | 11.21 |
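As a quick check on how the duplicate rates in Table 1 relate to the counts, the sketch below assumes that "Duplicate reads" counts individual reads (two per mapped pair). Under that assumption, which is inferred from the numbers and not stated in the thread, the rate is duplicate reads divided by twice the mapped read pairs. The function name is hypothetical.

```python
def per_read_duplicate_rate(duplicate_reads: int, mapped_pairs: int) -> float:
    """Duplicate rate in percent, ASSUMING duplicates are counted per
    individual read, so the denominator is 2 reads per mapped pair."""
    return 100.0 * duplicate_reads / (2 * mapped_pairs)


# (duplicate reads, mapped read pairs, reported duplicate rate) from Table 1
table1 = {
    "L1": (40234544, 21165340, 95.05),
    "L2": (22294654, 12074669, 92.32),
    "L3": (10839854, 7180872, 75.48),
    "L4": (6596992, 11589712, 28.46),
    "L9": (11982356, 6476417, 92.51),
}

if __name__ == "__main__":
    for sample, (dups, pairs, reported) in table1.items():
        computed = per_read_duplicate_rate(dups, pairs)
        print(f"{sample}: computed {computed:.2f}%, reported {reported:.2f}%")
```

If the computed and reported values agree, the table is describing read-level duplicates, which is not the same quantity as the number of duplicate paired-end fragments.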
(2) Table 2: Number of peaks called by SEACR, with the command (shown here for L4)
SEACR_1.3.sh L4_fragments.bedgraph L9_fragments.bedgraph norm stringent
Sample | peaks_number |
---|---|
L1 | 4167 |
L2 | 4255 |
L3 | 255226 |
L4 | 1698 |
(3) Fig 1: TSS heatmap
jialin
Hi Jialin,
It's hard for me to know exactly why it may have called so many peaks without having a look at the data myself, but in general SEACR gives the best results when the IgG is from the same conditions as the target experiment (same cell type, cell number, similar read depth, etc.), since the IgG is meant to provide an estimate of the natural background under the same conditions in which the experiment was conducted. This is relevant to your first example with S42, since that sample (and I presume the S42 IgG you used) was dramatically undersequenced. As for the second example, the extremely high duplicate rate across most samples makes it difficult to know whether your issue is with peak calling or with underlying data quality. I'm also a little confused by your numbers, since you're reporting duplicate reads, but the number of duplicate paired-end fragments (i.e. where both reads in the pair are identical to those of another pair) is the information that's more relevant to SEACR's function. It may be worth trying to filter out duplicate fragments (not duplicate individual reads) and rerunning it, but it's tough for me to recommend anything else without seeing a sample of the data.
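A minimal sketch of what filtering duplicate fragments (as opposed to individual reads) could look like. It assumes a hypothetical tab-separated, BED-like file with one line per paired-end fragment (chrom, start, end), taken before the conversion to bedgraph. It treats two fragments as duplicates when all three coordinates match. The function names are illustrative and are not part of SEACR.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Fragment = Tuple[str, int, int]  # (chrom, start, end)


def parse_fragments(lines: Iterable[str]) -> List[Fragment]:
    """Parse tab-separated BED-like lines into (chrom, start, end) tuples.
    Blank lines and '#' comment lines are skipped; extra columns are ignored."""
    fragments = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, start, end = line.split("\t")[:3]
        fragments.append((chrom, int(start), int(end)))
    return fragments


def dedup_fragments(fragments: List[Fragment]) -> Tuple[List[Fragment], float]:
    """Return the unique fragments (first-seen order) and the fraction of
    fragments that were duplicates of an earlier fragment."""
    counts = Counter(fragments)  # keys keep insertion order
    unique = list(counts)
    total = len(fragments)
    rate = (total - len(unique)) / total if total else 0.0
    return unique, rate


if __name__ == "__main__":
    # Toy input; real use would read the per-fragment BED file line by line.
    toy = ["chr1\t100\t250", "chr1\t100\t250", "chr1\t100\t260", "chr2\t5\t80"]
    unique, rate = dedup_fragments(parse_fragments(toy))
    for chrom, start, end in unique:
        print(f"{chrom}\t{start}\t{end}")
    print(f"# fragment-level duplicate rate: {rate:.2%}")
```

The deduplicated fragments would then be converted to a bedgraph as before and passed to SEACR again. Whether removing duplicates is appropriate depends on the experiment, so this is only a starting point for the rerun suggested above.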
Mike
(2) Table 2: MACS2 test results