BIMSBbioinfo / RCAS

R package for the RNA Centric Annotation System (RCAS)

suggestions #11

Open DenizBartsch opened 4 years ago

DenizBartsch commented 4 years ago

I really love this tool and the output it gives, even for users like me.

From the RBP researcher point of view I have small suggestions:

However, this is a really nice tool and I hope my suggestions are useful!

Best,

Deniz

borauyar commented 4 years ago

Hi Deniz. Thank you for the nice suggestions. I will add the 'normalisation' and 'secondary structure prediction' to my to-do list.

Your third request is already possible with the current RCAS.

For feature-specific motif analysis, see the examples section of the RCAS::discoverFeatureSpecificMotifs documentation. Just type ?RCAS::discoverFeatureSpecificMotifs in the R console (for example in RStudio) and it will show you the example usage.

Feature-specific GO term analysis is also possible. First, find the list of genes/transcripts whose feature overlaps match what you are interested in, then pass that list to the GO term analysis functions.

You can get the list of genes like this:

dt <- getTargetedGenesTable(queryRegions, txdbFeatures)

Then you can subset this table for the features you are interested in. Let's say we want to find transcripts that are targeted at their 3'UTRs (they overlap peaks at 3'UTRs) but have no peaks overlapping their promoters:

tx <- dt[promoters == 0][threeUTRs > 0]$tx_name

Then pass this list of transcript IDs to RCAS::findEnrichedFunctions, which runs gprofiler2 in the backend:

RCAS::findEnrichedFunctions(targetGenes = tx, species = 'hsapiens')

You could also copy this list and paste it into functional enrichment web services such as gprofiler (https://biit.cs.ut.ee/gprofiler/gost) or enrichr (https://amp.pharm.mssm.edu/Enrichr/).
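To make the subsetting step concrete, here is a minimal base R sketch of the same filter on a toy table. The table is a hypothetical stand-in for the output of getTargetedGenesTable: the column names (tx_name, promoters, threeUTRs) are the ones used above, while the transcript IDs and counts are invented. The chained dt[...][...] form above relies on data.table syntax; the base R version below should give the same transcripts on a plain data.frame.

```r
# Toy stand-in for the table returned by getTargetedGenesTable();
# transcript IDs and overlap counts are invented for illustration.
dt <- data.frame(
  tx_name   = c("tx1", "tx2", "tx3", "tx4"),
  promoters = c(0, 1, 0, 0),
  threeUTRs = c(2, 3, 0, 1),
  stringsAsFactors = FALSE
)

# Keep transcripts with at least one peak in the 3'UTR and none in the promoter.
keep <- dt$promoters == 0 & dt$threeUTRs > 0
tx <- dt$tx_name[keep]

print(tx)
```

The resulting character vector is what would be passed on as targetGenes, or pasted into one of the web services.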

DenizBartsch commented 4 years ago

Hi Bora,

I hope you are doing fine. I am still doing a lot of analysis and came up with a question and an idea, which you might be interested in, or at least you can tell me whether it makes sense or not.

So first the question: We did eCLIP experiments in our lab in biological triplicates. I have analyzed all replicates with the help of RCAS and it looks fantastic. Now my boss wants to make plots of those peaks that overlap between all replicates, which we would call reproducible peaks. I read about this, and the closest I can get is bedtools intersect. I basically intersected all BED files and received an intersected BED file, which shows for each peak how many replicates share it. I filter this file for the peaks that are present in all replicates and intersect it again with each replicate to get a file containing only the common peaks (so that I again have the enrichment and p-value information). Sadly, after doing so the number of peaks in each file varies by around 10-20, and I do not really know why. For plots like a correlation of all replicates and a density plot I need the same number of peaks in each file, so that is the reason for all that struggle. Do you have any suggestions for that? Also, in terms of RCAS, I was wondering if you are planning to implement replicate overlap. At least for me it is still an issue to convincingly show how well independent replicates perform.

Coming to my idea: My project is based on RBPs mediating translational control in embryogenesis. I strongly believe that no single RBP does this job in different cell types, but rather that the interplay of groups of RBPs determines mRNA translation and localization. I would be really interested in globally assessing whether mRNAs that are bound by multiple RBPs can be categorized. So basically, by determining common RBPs that bind groups of mRNAs (either by predicting it from motifs or by overlapping CLIP data), one could determine the localization or translation rate of these mRNAs. There is the idea that mRNAs coding for proteins involved in the same process or pathway are thereby efficiently translated and localized. But I think there is not much data analyzed in that regard. Since I do not know many computational people working on these aspects, I was wondering if you are interested in working on something like that.
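One possible cause of the differing counts, offered as a guess rather than a diagnosis: a single shared interval can overlap one peak in one replicate and two peaks in another, so intersecting back against each replicate returns a different number of rows per file. A way around this is to define the regions once and summarise every replicate over that same set. The base R sketch below illustrates the idea on invented toy data; it is not an RCAS or bedtools feature, the helper names merge_regions and region_score are hypothetical, and taking the maximum score per region is just one choice of summary.

```r
# Hypothetical toy peaks for three replicates (BED-like, half-open intervals);
# coordinates and scores are invented for illustration.
reps <- list(
  rep1 = data.frame(chrom = "chr1", start = c(100, 500, 900),
                    end = c(200, 600, 950), score = c(5, 3, 2),
                    stringsAsFactors = FALSE),
  rep2 = data.frame(chrom = "chr1", start = c(150, 520, 560),
                    end = c(250, 540, 620), score = c(4, 6, 1),
                    stringsAsFactors = FALSE),
  rep3 = data.frame(chrom = "chr1", start = c(120, 510),
                    end = c(180, 610), score = c(7, 2),
                    stringsAsFactors = FALSE)
)

# Merge overlapping intervals (pooled from all replicates) into union regions.
merge_regions <- function(df) {
  df <- df[order(df$chrom, df$start), ]
  chrom <- character(0); start <- numeric(0); end <- numeric(0)
  for (i in seq_len(nrow(df))) {
    n <- length(start)
    if (n > 0 && chrom[n] == df$chrom[i] && df$start[i] < end[n]) {
      end[n] <- max(end[n], df$end[i])   # extend the current region
    } else {
      chrom <- c(chrom, df$chrom[i])     # open a new region
      start <- c(start, df$start[i])
      end   <- c(end, df$end[i])
    }
  }
  data.frame(chrom = chrom, start = start, end = end, stringsAsFactors = FALSE)
}

# For one replicate: the highest score among its peaks overlapping each region,
# or NA when the replicate has no peak in that region.
region_score <- function(regions, peaks) {
  vapply(seq_len(nrow(regions)), function(j) {
    hit <- peaks$chrom == regions$chrom[j] &
      peaks$start < regions$end[j] & peaks$end > regions$start[j]
    if (any(hit)) max(peaks$score[hit]) else NA_real_
  }, numeric(1))
}

regions <- merge_regions(do.call(rbind, reps))
scores  <- sapply(reps, function(p) region_score(regions, p))

# A region is "reproducible" here if every replicate has a peak in it.
reproducible <- rowSums(!is.na(scores)) == length(reps)
consensus <- cbind(regions[reproducible, ], scores[reproducible, , drop = FALSE])

print(consensus)
```

Because every replicate is summarised over the same regions, the resulting table has one row per reproducible region and one score column per replicate, which is the shape a replicate correlation or density plot needs. In the toy data, rep2 has two peaks inside the second region, which is exactly the situation that would give it an extra row after a plain intersect.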

Best, and greetings from Cologne,

Deniz


Dr. rer. nat. Deniz Bartsch - Postdoctoral fellow Laboratory for Developmental and Regenerative RNA biology University of Cologne ZMMK-Forschungsgebäude (Geb. 66) Robert-Koch-Str. 21 D-50931 Köln Germany www.kurianlab.com

