ArnaudDroitLab / metagene2


How to add TSS and TES on metagene plots for RIP-seq data? #35

Open MolyWang opened 8 months ago

MolyWang commented 8 months ago

Hello Eric, thanks in advance for any help that would be offered!

The data I have:

  1. A RIP-seq dataset for an RNA-binding protein. (Full transcripts that co-immunoprecipitate with the protein are subjected to RNA-seq.)
  2. Properly processed .bam and .bam.bai files. (I can visualize them properly with Integrative Genomics Viewer.) I will need to plot 6 BAM files in one plot; each is about 2-3 GB.

    rip_bam_files = basename(Sys.glob(gsub(".bam", ".ba*", rip_bam_filesnames)))
    rip_bam_files[1:4]
    [1] "RBP_2h_sorted.bam"     "RBP_2h_sorted.bam.bai"
    [3] "CON_2h_sorted.bam"     "CON_2h_sorted.bam.bai"

  3. I am using the UCSC dm6 annotation:

    dm6_tx = GenomicFeatures::transcripts(TxDb.Dmelanogaster.UCSC.dm6.ensGene)

The problem I want to solve: with the above data, I want to produce a metagene plot that summarizes where the sequencing reads come from: 5'UTR, CDS, or 3'UTR. In this case, I would like to collapse the whole genome into one generic transcript schematic, which would serve as the x-axis of the output plot. My expected plot would look something like the one below.

[Image: expected plot]

One final question: Now isoforms would matter in my case, because different isoforms of the same gene may have different TSS and TES. If necessary, I have data to select the most abundant isoform for each gene, but is there any way to produce the plot without this selection?

Thank you again, Zhuyi

ericfournier2 commented 8 months ago

Hello Zhuyi,

Metagene2 cannot directly produce the plot you need. However, you can use it in rna-seq mode to summarize reads for the 5'UTR, CDS and 3'UTR regions separately, then use the generated tables to produce a combined plot like the one you showed.

Hope this helps, -Eric

MolyWang commented 8 months ago

Then which intermediate results should I save/output to make the pipeline most straightforward for me? Thanks.

ericfournier2 commented 8 months ago

Sorry for the delay!

You need to use the results from the add_metadata or calculate_ci functions. They give you the data frames that will allow you to combine the three types of regions into a single plot.

Cheers, -Eric
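
For anyone following the same route, here is a minimal sketch of the combining step described above. It is not taken from the metagene2 documentation. It assumes you already have one table per region type (5'UTR, CDS, 3'UTR), for example from add_metadata or calculate_ci, and that each table has `design`, `bin` and `value` columns. Those column names, the toy data, and the helper names `make_toy`, `combine_regions` and `region_boundaries` are all hypothetical, so check them against your actual output. The idea is simply to shift the bin indices so the three regions sit end to end on one x-axis, then mark the boundaries (TSS, start codon, stop codon, TES).

```r
# Toy stand-ins for the three per-region tables. The column names
# (design, bin, value) are ASSUMPTIONS; replace these with your real tables.
make_toy <- function(n_bins, level) {
  data.frame(
    design = rep(c("RBP_2h", "CON_2h"), each = n_bins),
    bin    = rep(seq_len(n_bins), times = 2),
    value  = c(rep(level, n_bins), rep(level / 2, n_bins))
  )
}

tables <- list(
  utr5 = make_toy(20, 1),  # 5'UTR summarized in 20 bins
  cds  = make_toy(60, 3),  # CDS summarized in 60 bins
  utr3 = make_toy(20, 2)   # 3'UTR summarized in 20 bins
)

# Shift each region's bins so the regions sit end to end on one x-axis.
combine_regions <- function(tables) {
  n_bins  <- vapply(tables, function(df) max(df$bin), numeric(1))
  offsets <- c(0, cumsum(n_bins))[seq_along(tables)]
  parts <- Map(function(df, name, offset) {
    df$region <- name
    df$x <- df$bin + offset
    df
  }, tables, names(tables), offsets)
  combined <- do.call(rbind, parts)
  rownames(combined) <- NULL
  combined
}

# x positions of the region boundaries (expects exactly three tables,
# ordered 5'UTR, CDS, 3'UTR).
region_boundaries <- function(tables) {
  n_bins <- vapply(tables, function(df) max(df$bin), numeric(1))
  b <- c(0, cumsum(n_bins)) + 0.5
  names(b) <- c("TSS", "start codon", "stop codon", "TES")
  b
}

combined   <- combine_regions(tables)
boundaries <- region_boundaries(tables)

# Draw one line per design, with dashed lines at the boundaries.
if (interactive()) {
  plot(NA, xlim = range(combined$x), ylim = range(combined$value),
       xlab = "Position along generic transcript (bin)", ylab = "Coverage")
  for (d in unique(combined$design)) {
    sub <- combined[combined$design == d, ]
    sub <- sub[order(sub$x), ]
    lines(sub$x, sub$value)
  }
  abline(v = boundaries, lty = 2)
  axis(3, at = boundaries, labels = names(boundaries))
}
```

The number of bins per region controls how wide each region appears on the generic transcript, so choose them to match the proportions you want in the schematic.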