YeoLab / skipper

Skip the peaks and expose RNA-binding in CLIP data

RBP bound sequence in repetitive elements (rRNA) #37

Closed: ychy1 closed this issue 1 week ago

ychy1 commented 1 week ago

Hi skipper team,

Thank you for sharing this powerful tool! Is there a way to find enriched binding sites within large repetitive elements such as ribosomal RNAs (rRNAs)? The output folder "reproducible_enriched_re" seems to contain only repeat names, not specific sites.

Thanks!

augustboyle commented 1 week ago

If I understand correctly, you want to know the position in the rRNA where the binding signal is coming from?

I have worked on this, and it is somewhat nuanced. Any individual element (here, an rRNA copy) has two sets of coordinates: its genomic coordinates and its coordinates in the alignment to the reference sequence. Sometimes the two correspond only loosely; other times the sequences are identical and the mapping is not actually difficult.

You can use the RepeatMasker file to try to assign reference sequence alignment coordinates to the end(s) of the eCLIP reads. This requires the BAM files, which should be available on RBP-Ark via Globus: https://rbp-ark.com/tutorial/
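As a minimal sketch of the RepeatMasker route, assuming the standard whitespace-separated `.out` table (1-based, inclusive coordinates; strand `+` or `C` for complement), the helpers below parse one hit and map a single genomic position (for example, whichever read end you treat as the crosslink site) onto the repeat's reference coordinates. The names `RepeatHit`, `parse_rmsk_out_line` and `genomic_to_consensus` are hypothetical, not part of skipper, and the mapping is a linear interpolation that ignores indels, so it is only approximate where the hit has insertions or deletions relative to the reference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepeatHit:
    """One hit from a RepeatMasker .out table (1-based, inclusive coordinates)."""
    chrom: str
    g_start: int   # genomic begin
    g_end: int     # genomic end
    strand: str    # "+" or "C" (complement)
    name: str      # matching repeat name
    r_start: int   # lowest reference (consensus) position covered by the hit
    r_end: int     # highest reference (consensus) position covered by the hit


def parse_rmsk_out_line(line: str) -> RepeatHit:
    """Parse one data line (not a header line) of a RepeatMasker .out table.

    Columns: score, %div, %del, %ins, query, begin, end, (left), strand,
    repeat, class/family, then three repeat-position columns, which are
    begin, end, (left) on '+' hits and (left), end, begin on 'C' hits.
    """
    f = line.split()
    strand = f[8]
    if strand == "+":
        r_start, r_end = int(f[11]), int(f[12])
    else:
        r_end, r_start = int(f[12]), int(f[13])
    return RepeatHit(f[4], int(f[5]), int(f[6]), strand, f[9], r_start, r_end)


def genomic_to_consensus(hit: RepeatHit, pos: int) -> Optional[int]:
    """Approximate reference position for 1-based genomic position `pos`.

    Uses linear interpolation across the hit (indels are ignored).
    Returns None if `pos` lies outside the hit.
    """
    if not (hit.g_start <= pos <= hit.g_end):
        return None
    g_len = hit.g_end - hit.g_start
    r_len = hit.r_end - hit.r_start
    frac = 0.0 if g_len == 0 else (pos - hit.g_start) / g_len
    if hit.strand == "+":
        return hit.r_start + round(frac * r_len)
    # Complement hit: the genomic start aligns to the highest reference position.
    return hit.r_end - round(frac * r_len)
```

Where an exact per-base mapping matters, you would need the alignment itself rather than just the endpoints (RepeatMasker can write alignments with its `-a` option).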

Alternatively, you can download rRNA reference sequences and align reads directly to them. That makes coverage easier to interpret, but reads won't necessarily map correctly because the sequence differs across rRNA loci.
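For the direct-alignment route, once reads are aligned to an rRNA reference, a per-base coverage profile can be built with a difference array. This sketch assumes 0-based, half-open alignment spans (the convention pysam uses for `AlignedSegment.reference_start` and `reference_end`); `coverage_profile` is a hypothetical helper, not part of skipper.

```python
from itertools import accumulate
from typing import Iterable, List, Tuple


def coverage_profile(ref_length: int,
                     intervals: Iterable[Tuple[int, int]]) -> List[int]:
    """Per-base read coverage over a reference of length `ref_length`.

    `intervals` are 0-based, half-open (start, end) alignment spans;
    spans are clipped to [0, ref_length).
    """
    diff = [0] * (ref_length + 1)
    for start, end in intervals:
        start, end = max(0, start), min(ref_length, end)
        if start < end:
            diff[start] += 1  # coverage rises where a read starts
            diff[end] -= 1    # and falls just past where it ends
    # Running sum of the difference array gives coverage at each base.
    return list(accumulate(diff[:-1]))
```

Passing one-base spans for a single read end (instead of the full alignment span) would give a crosslink-site-style profile rather than read coverage.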

I don't have any publicly available tools for this, and the best approach depends on the analysis. For most repetitive elements, either there isn't much information beyond the per-element enrichment, or (as with snRNAs and tRNAs) the copies are so similar that you can simply count raw reads overlapping the elements, grouped by element name. rRNAs are probably the most complicated case :/
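For elements whose copies are nearly identical, the per-name counting described above can be sketched as follows: find reads overlapping any annotated copy and tally them by element name. This assumes reads and elements are given as 0-based, half-open intervals; the function name and the tuple layouts are hypothetical, and this is an illustration rather than how skipper itself computes enrichment.

```python
from bisect import bisect_left
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Element = Tuple[str, int, int, str]  # (chrom, start, end, name), 0-based half-open
Read = Tuple[str, int, int]          # (chrom, start, end), 0-based half-open


def count_reads_per_element_name(elements: Iterable[Element],
                                 reads: Iterable[Read]) -> Counter:
    """Count reads overlapping annotated elements, grouped by element name.

    A read overlapping several copies with the same name counts once for
    that name; a read overlapping differently named elements counts once
    for each name.
    """
    by_chrom: Dict[str, List[Tuple[int, int, str]]] = defaultdict(list)
    for chrom, start, end, name in elements:
        by_chrom[chrom].append((start, end, name))
    starts: Dict[str, List[int]] = {}
    max_len: Dict[str, int] = {}
    for chrom, elems in by_chrom.items():
        elems.sort()
        starts[chrom] = [s for s, _, _ in elems]
        max_len[chrom] = max(e - s for s, e, _ in elems)

    counts: Counter = Counter()
    for chrom, r_start, r_end in reads:
        if chrom not in by_chrom:
            continue
        elems = by_chrom[chrom]
        # Any overlapping element must start in [r_start - max_len, r_end).
        i = bisect_left(starts[chrom], r_start - max_len[chrom])
        names = set()
        while i < len(elems) and elems[i][0] < r_end:
            s, e, name = elems[i]
            if e > r_start:  # half-open overlap test
                names.add(name)
            i += 1
        counts.update(names)
    return counts
```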

ychy1 commented 1 week ago

Yes that's exactly what I wanted to know. Thank you so much for the detailed explanation!