YeoLab / skipper

Skip the peaks and expose RNA-binding in CLIP data

Question about Tested windows. #20

Closed FionaMoon closed 10 months ago

FionaMoon commented 10 months ago

Hi,

Thank you for developing such a fantastic tool! Skipper has been incredibly helpful for my research in RBPs.

I've successfully run Skipper on my eCLIP data, which has three replicates, but I still have some questions about the tested windows.

Across all experimental groups, Rep1 shows the highest number of "Tested windows", even when its read counts (both eCLIP and input) fluctuate. I don't understand why this happens. I wonder whether the lower "Tested windows" in Rep2 and Rep3 will significantly affect peak enrichment.

cat 1_enrichment_reproducibility.tsv
# Replicates    Status  # Tested windows
1   Enriched    1303
1   Not enriched    76178
2   Enriched    340
2   Not enriched    24426
3   Enriched    279
3   Not enriched    16452
cat 2_enrichment_reproducibility.tsv
# Replicates    Status  # Tested windows
1   Enriched    62941
1   Not enriched    81729
2   Enriched    61459
2   Not enriched    38334
augustboyle commented 10 months ago

Hello,

Typically, higher-quality replicates will have more tested windows. The tested windows are optimized to yield the most enriched windows per replicate given multiple hypothesis test correction. If you want to be more permissive with your hit regions, you could inspect windows that are detected by individual replicates.

Did you have any other questions?

Evan
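
To illustrate what "optimized to yield the most enriched windows given multiple hypothesis test correction" can mean in practice, here is a minimal Python sketch. It is not Skipper's actual code: the read-count filter, the Benjamini-Hochberg correction, the 0.05 level, the function names, and the toy data are all assumptions made for illustration. The idea shown is that testing fewer, better-covered windows reduces the correction burden, so the size of the tested set can be tuned to maximize the number of windows that remain significant.

```python
from typing import Sequence, Tuple


def bh_rejections(pvalues: Sequence[float], alpha: float = 0.05) -> int:
    """Count hypotheses rejected by the Benjamini-Hochberg procedure at level alpha."""
    m = len(pvalues)
    rejected = 0
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= alpha * rank / m:
            rejected = rank  # BH rejects every hypothesis up to the largest passing rank
    return rejected


def best_coverage_threshold(
    windows: Sequence[Tuple[int, float]], alpha: float = 0.05
) -> Tuple[int, int, int]:
    """Scan minimum read-count thresholds and keep the one with the most BH rejections.

    windows: (read_count, p_value) pairs (hypothetical input format).
    Returns (threshold, n_tested, n_enriched); ties go to the lowest threshold.
    """
    best = None
    for threshold in sorted({count for count, _ in windows}):
        tested = [p for count, p in windows if count >= threshold]
        enriched = bh_rejections(tested, alpha)
        if best is None or enriched > best[2]:
            best = (threshold, len(tested), enriched)
    return best


# Toy data: low-coverage windows carry uninformative p-values.
toy_windows = [
    (1, 0.9), (1, 0.6), (2, 0.5), (2, 0.7), (3, 0.8),
    (5, 0.04), (8, 0.03), (10, 0.001), (20, 0.0005), (30, 0.02),
]
print(best_coverage_threshold(toy_windows))  # (3, 6, 5)
```

In this toy example, testing all 10 windows yields 2 enriched windows, while restricting the test to the 6 windows with at least 3 reads yields 5. Under that kind of scheme, a replicate with deeper or cleaner signal would end up with a larger tested set.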

FionaMoon commented 10 months ago

Hi Evan,

I'm confused about the discrepancy in the output of Enrichment_reproducibility which I hope you can help clarify.

The number of tested_windows and enriched_windows for each replicate in enrichment_reproducibility.tsv doesn't match the corresponding counts from the individual replicate files (IP_1.tested_windows.tsv.gz and IP_2.enriched_windows.tsv.gz, etc.). Specific examples of the mismatches are provided in the data below.

Relevant Data:

$ cat enrichment_reproducibility.tsv
# Replicates    Status  # Tested windows
1   Enriched    1237
1   Not enriched    26011
2   Enriched    335
2   Not enriched    6520
3   Enriched    171
3   Not enriched    4946

## my replicate1
$ zcat IP_1.tested_windows.tsv.gz | wc -l
31017
$ zcat IP_1.enriched_windows.tsv.gz | wc -l
471

## my replicate2
$ zcat IP_2.tested_windows.tsv.gz | wc -l
17397
$ zcat IP_2.enriched_windows.tsv.gz | wc -l
847

## my replicate3
$ zcat IP_3.tested_windows.tsv.gz | wc -l
11732
$ zcat IP_3.enriched_windows.tsv.gz | wc -l
1498

Could you please explain the potential reasons for this inconsistency?

augustboyle commented 10 months ago

For total windows you have 31017 + 17397 + 11732 - 3 [header] = 60143 in total from the three replicates.

In the output counts you have 1237 + 26011 + 335*2 + 6520*2 + 171*3 + 4946*3 = 56309 in total.

The enrichment_reproducibility.tsv file removes blacklisted windows, whereas the tested_windows.tsv files contain all windows (for better or worse), so I suspect that the missing ~3.8k windows out of 60k are in the blacklisted BED file.
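
The reconciliation above can be reproduced with a short Python sketch. The numbers are copied from this thread; the variable names are only illustrative. It assumes, as the arithmetic above does, that each per-replicate file has one header line and that a window listed under k replicates in enrichment_reproducibility.tsv appears in k of the per-replicate tested_windows files.

```python
# Line counts reported by `zcat IP_N.tested_windows.tsv.gz | wc -l` in this thread.
tested_line_counts = {"IP_1": 31017, "IP_2": 17397, "IP_3": 11732}

# Rows of enrichment_reproducibility.tsv:
# number of replicates -> (enriched windows, not enriched windows)
reproducibility = {1: (1237, 26011), 2: (335, 6520), 3: (171, 4946)}

# Subtract one header line per file to get the number of tested windows.
tested_total = sum(n - 1 for n in tested_line_counts.values())

# A window tested in k replicates is counted k times across the per-replicate files.
summary_total = sum(
    k * (enriched + not_enriched)
    for k, (enriched, not_enriched) in reproducibility.items()
)

unaccounted = tested_total - summary_total
print(tested_total, summary_total, unaccounted)  # 60143 56309 3834
```

The difference of 3834 is the number of replicate-level window entries that the blacklist explanation would need to account for.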

FionaMoon commented 10 months ago

Thank you!