Closed FionaMoon closed 10 months ago
Hello,

Typically higher quality replicates will have more tested windows. The tested windows are optimized to yield the most enriched windows per replicate given multiple hypothesis test correction. If you want to be more permissive with your hit regions you could inspect windows that are detected by individual replicates.

Did you have any other questions?

Evan

On Dec 25, 2023, at 1:05 AM, LY @.***> wrote:

> Hi,
> Thank you for developing such a fantastic tool! Skipper has been incredibly helpful for my research in RBPs.
> I've successfully run Skipper on my eCLIP data which has 3 replicates and I still have some questions about the tested windows of skipper.
> Across all experimental groups, Rep1 shows the highest number of "Tested windows", even when its read counts (both eCLIP and input) fluctuate. I don't understand why this happens. I wonder whether the lower "Tested windows" in Rep2 and Rep3 will significantly affect peak enrichment.
>
> $ cat 1_enrichment_reproducibility.tsv
> 1 Enriched 1303
> 1 Not enriched 76178
> 2 Enriched 340
> 2 Not enriched 24426
> 3 Enriched 279
> 3 Not enriched 16452
>
> $ cat 2_enrichment_reproducibility.tsv
> 1 Enriched 62941
> 1 Not enriched 81729
> 2 Enriched 61459
> 2 Not enriched 38334
Hi Evan,
I'm confused about a discrepancy in the Enrichment_reproducibility output, which I hope you can help clarify.
The number of tested_windows and enriched_windows for each replicate in enrichment_reproducibility.tsv doesn't match the corresponding counts from the individual replicate files (IP_1.tested_windows.tsv.gz and IP_2.enriched_windows.tsv.gz, etc.).
Specific examples of the mismatches are provided in the data below.
Relevant Data:
$ cat enrichment_reproducibility.tsv
# Replicates Status # Tested windows
1 Enriched 1237
1 Not enriched 26011
2 Enriched 335
2 Not enriched 6520
3 Enriched 171
3 Not enriched 4946
## my replicate1
$ zcat IP_1.tested_windows.tsv.gz | wc -l
31017
$ zcat IP_1.enriched_windows.tsv.gz | wc -l
471
## my replicate2
$ zcat IP_2.tested_windows.tsv.gz | wc -l
17397
$ zcat IP_2.enriched_windows.tsv.gz | wc -l
847
## my replicate3
$ zcat IP_3.tested_windows.tsv.gz | wc -l
11732
$ zcat IP_3.enriched_windows.tsv.gz | wc -l
1498
Could you please explain the potential reasons for this inconsistency?
For total windows you have 31017 + 17397 + 11732 - 3 [header] = 60143 in total from the three replicates.
In the output counts you have 1237 + 26011 + 335*2 + 6520*2 + 171*3 + 4946*3 = 56309 in total.
The enrichment_reproducibility.tsv file removes blacklisted windows whereas the tested_windows.tsv files contain all windows (for better or worse), so I suspect that the missing ~3.8k windows (60143 - 56309 = 3834) out of 60k are in the blacklisted BED file.
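For anyone who wants to repeat this check on their own run, the arithmetic above can be reproduced with a short Python sketch. It rests on two assumptions: each tested_windows file has exactly one header line, and a window counted in the row for k replicates appears once in each of k per-replicate files (hence the multipliers). The numbers are copied from this thread and the variable names are only illustrative.

```python
# Line counts reported by `zcat IP_N.tested_windows.tsv.gz | wc -l` in this thread.
# Assumption: each count includes exactly one header line.
wc_line_counts = {"IP_1": 31017, "IP_2": 17397, "IP_3": 11732}
per_replicate_total = sum(count - 1 for count in wc_line_counts.values())

# Rows of enrichment_reproducibility.tsv: (number of replicates, status) -> windows.
summary_rows = {
    (1, "Enriched"): 1237,
    (1, "Not enriched"): 26011,
    (2, "Enriched"): 335,
    (2, "Not enriched"): 6520,
    (3, "Enriched"): 171,
    (3, "Not enriched"): 4946,
}

# Assumption: a window seen in k replicates contributes k lines across the
# per-replicate files, so each summary row is weighted by its replicate count.
summary_weighted_total = sum(k * windows for (k, _status), windows in summary_rows.items())

unaccounted = per_replicate_total - summary_weighted_total

print(f"per-replicate files: {per_replicate_total}")
print(f"summary (weighted):  {summary_weighted_total}")
print(f"unaccounted windows: {unaccounted}")
```

With the numbers from this thread, the three values come out as 60143, 56309, and 3834. The sketch only quantifies the gap; whether those windows really fall in the blacklist would still need to be checked against the blacklist BED file itself.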
Thank you!