How to filter the output of skipper?

FionaMoon commented 11 months ago

Hi Skipper developer,

I ran Skipper on my dataset, which has 3 eCLIP replicates and 3 inputs.

I finally got annotated peak calling results in the reproducible_enriched_windows folder.

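The result looks like this (screenshot: https://github.com/YeoLab/skipper/assets/69412570/85dcaa8b-13ed-42ee-9675-580efb9d50fe):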

I wonder which column can be used to filter the result. I want to get high-confidence RBP targets.

Thank you in advance!

augustboyle commented 11 months ago

Hello,

It looks like you have several highly enriched hits with no reads in the input and high read counts in the CLIP.

The table is sorted by average adjusted log2 fold change, which is a good place to start. The hits you have listed have values around 6, meaning a 64-fold increase.

I’d say tenfold enrichment (3.3 log2 fold change) is a strong signal, but reproducible hits are fairly reliable overall.
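As a quick check of the arithmetic above, here is a minimal sketch that converts between log2 fold change and linear fold enrichment. The function names are hypothetical helpers for illustration, not part of Skipper.

```python
import math

def log2fc_to_fold(log2fc: float) -> float:
    """Convert a log2 fold change to a linear fold enrichment."""
    return 2.0 ** log2fc

def fold_to_log2fc(fold: float) -> float:
    """Convert a linear fold enrichment to a log2 fold change."""
    return math.log2(fold)

print(log2fc_to_fold(6))              # 64.0
print(round(fold_to_log2fc(10), 2))   # 3.32
```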

You can also filter by p-value, but once q-values are below 10^-3 or so, as they are in your case, I wouldn’t say there’s much advantage.

Sometimes the very top hits are outliers due to mappability or other technical factors, so I might not focus on those; however, protein-coding target transcripts usually don’t have that problem.

Evan
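The filtering advice in this reply could be sketched roughly as follows, using only the Python standard library. The column names `l2fc_mean` and `q_value` and the toy rows are assumptions made up for illustration; substitute the actual column names from your reproducible_enriched_windows table. The default thresholds (3.3 log2 fold change, q-value at most 10^-3) are taken from the reply as starting points, not fixed recommendations.

```python
import csv
import io

def filter_windows(rows, l2fc_col, q_col, min_l2fc=3.3, max_q=1e-3):
    """Keep rows whose log2 fold change and q-value pass both thresholds."""
    kept = []
    for row in rows:
        if float(row[l2fc_col]) >= min_l2fc and float(row[q_col]) <= max_q:
            kept.append(row)
    return kept

# Toy table with made-up values and hypothetical column names.
toy = io.StringIO(
    "gene_name\tl2fc_mean\tq_value\n"
    "GENE_A\t6.1\t1e-8\n"
    "GENE_B\t2.0\t1e-5\n"
    "GENE_C\t4.0\t0.02\n"
)
rows = list(csv.DictReader(toy, delimiter="\t"))
hits = filter_windows(rows, l2fc_col="l2fc_mean", q_col="q_value")
print([r["gene_name"] for r in hits])
```

To run this on a real table, replace the `io.StringIO` object with an open file handle and pass the real column names.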


FionaMoon commented 11 months ago

Thank you!

FionaMoon commented 11 months ago

Hi,

I have another question about the column "enrichment_n" in my results.

I have 3 replicates, but most rows have enrichment_n=2.

Does this mean the target gene is only enriched in 2 of my replicates?

augustboyle commented 11 months ago

Yes, that is correct. CLIP replicate quality can vary substantially, so perhaps your third replicate has few enriched windows globally. Alternatively, the replicates are picking up different sites. In any case, the enrichment odds ratio will average all three replicates.
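To see how many rows are supported by two versus three replicates, one could tally the `enrichment_n` column, as in the sketch below. It assumes the table has already been read into a list of dicts (for example with `csv.DictReader`); the `name` column and the toy rows are made up for illustration.

```python
from collections import Counter

def tally_enrichment_n(rows, col="enrichment_n"):
    """Count how many rows are enriched in 1, 2, 3, ... replicates."""
    return Counter(int(row[col]) for row in rows)

def require_min_replicates(rows, min_n, col="enrichment_n"):
    """Keep only rows enriched in at least min_n replicates."""
    return [row for row in rows if int(row[col]) >= min_n]

# Toy rows with made-up values.
toy_rows = [
    {"name": "w1", "enrichment_n": "2"},
    {"name": "w2", "enrichment_n": "3"},
    {"name": "w3", "enrichment_n": "2"},
]
counts = tally_enrichment_n(toy_rows)
print(sorted(counts.items()))  # [(2, 2), (3, 1)]
all_three = require_min_replicates(toy_rows, min_n=3)
print([r["name"] for r in all_three])  # ['w2']
```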


FionaMoon commented 11 months ago

Thank you for your reply!

Which files in the Skipper results can be used to identify replicates that are of poor quality or have fewer enriched windows?

I've checked "enrichment_reproducibility.tsv" in the "output/enrichment_reproducibility" folder. It seems that rep3 has very few enriched windows, even though the sequencing QC is OK.

This may be caused by experimental error, in which case rep3 should be removed from the downstream analysis. Is that correct?

augustboyle commented 11 months ago

Yeah, replicate 3 only has 9 enriched windows, so that is not helping your analysis much. You could inspect them in a genome browser and try to decide if they are trustworthy. Bad replicates do seem to have some signal sometimes. In this case the hits are so few that removing it entirely might be the best option.

Evan


evanb-exai commented 10 months ago

I realized I didn't read this very carefully. That summary is for enrichment reproducibility and summarizes the number of replicates with enriched windows ('# replicates', not 'replicate #'). It is likely that one of the replicates has fewer hits, but it does not specify which replicate that is - it could be any of the three. To view the number of enriched windows per replicate, you can count the number of lines in each enriched_windows file.
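Counting lines per enriched_windows file, as suggested above, might look like the following sketch. The glob pattern, the handling of gzip-compressed files, and the assumption that each file starts with one header line are guesses about the output layout, so adjust them to match your files. The demo at the end writes two small made-up files to a temporary directory.

```python
import gzip
import tempfile
from pathlib import Path

def count_data_lines(path, has_header=True):
    """Count non-empty lines in a plain or gzip-compressed text file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n = sum(1 for line in handle if line.strip())
    # Assumption: the first line is a header and is not an enriched window.
    return max(n - 1, 0) if has_header else n

def count_per_file(directory, pattern="*enriched_windows*"):
    """Map each matching file name to its number of data lines."""
    return {
        p.name: count_data_lines(p)
        for p in sorted(Path(directory).glob(pattern))
    }

# Demo on made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    plain = Path(tmp) / "rep1.enriched_windows.tsv"
    plain.write_text("chr\tstart\nchr1\t10\nchr1\t20\n")
    with gzip.open(Path(tmp) / "rep2.enriched_windows.tsv.gz", "wt") as fh:
        fh.write("chr\tstart\nchr2\t5\n")
    demo_counts = count_per_file(tmp)
print(demo_counts)
```

On real output, call `count_per_file` with the directory that holds your per-replicate enriched_windows files; a replicate with far fewer lines than the others is the one to inspect more closely.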

YeoLab / skipper

How to filter the output of skipper? #18