Roleren / ORFik

Coverage plots using ORFiK and RiboCrypt #132

Closed yeroslaviz closed 1 year ago

yeroslaviz commented 1 year ago

Hi, if you remember from my first issue (#128), I asked whether I can use this package to re-create a coverage plot for specific genes.

Since then I have tried to familiarize myself with the tool and to understand how it works. I hope I have done so, and I would now like to return to my original goal and ask for your help in creating these plots with your tools.

I would appreciate it if you could help me reach this goal. I have my files both as fastq and as bam files, as I mapped them outside of ORFik. I think I know how to create an experiment object and how to count the reads using the Ribo-seq workflow.

Is that all I need?

thanks in advance

Assa

[Attached image]

m-swirski commented 1 year ago

Hi,

Hakon may be a bit too busy at the moment, so I can take over answering this. RiboCrypt works regardless of how exactly the data is structured. It is the shiny app that depends on the ORFik experiment structure (which I would recommend anyway, to handle the data in a systematic way). If you just have a bunch of bam files, an annotation file (e.g. gtf) and a reference genome, you can readily use the underlying multiOmicsPlot... family of functions, in this case multiOmicsPlot_list(). That being said, it is much more convenient to use the app than to call the API from RStudio and retype arguments every time you want to change the gene, genomic coordinates etc.

Another issue is that RiboCrypt is quite focused on the subcodon resolution of ribo-seq, i.e. we color-code the three reading frames, which allows visualization of out-of-frame translation etc. I would therefore highly encourage p-shifting your bam files and saving them in the collapsed ofst format (an optimized ORFik format for storing reads; the compression is ~100-fold for ribo-seq, and input/output time is reduced even further).
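To make the p-shifting and reading-frame idea concrete, here is a minimal Python sketch. It is a conceptual illustration only, not ORFik or RiboCrypt code: the per-read-length offsets, the toy reads and the function names are all assumptions made up for the example, and dropping reads without a known offset is a choice of this sketch.

```python
from collections import Counter

# Hypothetical offsets (nt from the read's 5' end to the P-site), keyed by
# read length. Real offsets are estimated from the data; these are assumed.
OFFSETS = {28: 12, 29: 12, 30: 13}


def p_shift(reads, offsets):
    """Reduce each (five_prime_pos, read_length) read to one P-site position.

    Coordinates are transcript positions on the plus strand. Reads whose
    length has no offset are dropped (a choice made for this sketch).
    """
    return [pos + offsets[length] for pos, length in reads if length in offsets]


def frame_counts(p_sites, cds_start):
    """Count P-sites in reading frames 0, 1 and 2 relative to the CDS start."""
    counts = Counter((p - cds_start) % 3 for p in p_sites)
    return [counts.get(frame, 0) for frame in range(3)]


# Toy reads: (5' end position, read length). The last read has no offset.
reads = [(88, 28), (91, 28), (90, 30), (94, 29), (50, 35)]
p_sites = p_shift(reads, OFFSETS)
print(p_sites)                               # [100, 103, 103, 106]
print(frame_counts(p_sites, cds_start=100))  # [4, 0, 0]
```

In this toy case all shifted reads fall in frame 0 of the CDS, which is the kind of periodic signal the three-frame color coding is meant to expose; before shifting, the same reads would be spread over positions that say little about the codon being translated.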

For now, RiboCrypt does not yet support displaying multiple libraries in a single pane. However, it is one of our high priorities, so if needed I can implement it quickly.

So the crucial questions are: do you want your reads to be p-shifted (I strongly encourage this), do you want to display the 3 reading frames (depends on the use case), and how many libraries should be visible in one pane? Play around with ribocrypt.org to see what we usually do (select a few libraries etc).

Let me know the answers to these questions and I'll provide you with a code snippet that should help you out. If you're new to ribo-seq, I can also show you around ribocrypt.org and explain its key aspects, like the p-shifting I mentioned here.

Best, Michal

yeroslaviz commented 1 year ago

Thanks for the detailed answer. Last question first: yes, I would classify myself as a novice in the field of ribo-seq, and I am now struggling to understand the underlying analysis, including the p-shifting step. ☹️

Just to make things clear, I also have the raw fastq files and the adapter sequence I need to trim. If you can show me how to create the experiment object from scratch, I would be happy to use ORFik starting from the fastq files. I was not really sure whether it makes a difference if I use the bam or the fastq files.

Now to your questions:

I think the reads should be p-shifted (as you clearly recommended), but it would be great if I could visualise them before and after the shifting to see what has changed. I also think collapsing the mapped files and saving them in ofst format might increase analysis speed.

About the three reading frames: I'm not sure what it means to differentiate between the three, but if I understand it correctly, I'm not sure it is an important factor.

By multiple libraries, do you mean different conditions (as shown above)? I guess this might be helpful, but for now I can live with separate plots for each condition. But is it possible to work with multiple replicates? Are they summarised together to create the plot, or is it just one plot per sample (fastq file)?

The data set I'm working on at the moment has 9 different conditions, but plotting them all together might not be necessary. It would be great, though, if we could make plots of at least 3 or 4 conditions in one go.

Let me know if you need further information.

thanks a lot

Assa

m-swirski commented 1 year ago

Okay, I see. Raw fastq processing is supported in ORFik, but we also have a new package, massiveNGSpipe, not yet open to the public, which streamlines the process so that it is essentially 'one-click'. It is more focused on publicly deposited data (SRA), but we can add local input fairly easily as well, and that is planned. Right now one has to assemble the pipeline from the building blocks described in the ORFik vignettes:

- https://bioconductor.org/packages/release/bioc/vignettes/ORFik/inst/doc/Annotation_Alignment.html
- https://bioconductor.org/packages/release/bioc/vignettes/ORFik/inst/doc/ORFikExperiment.html#orfik-example-experiment
- https://bioconductor.org/packages/release/bioc/vignettes/ORFik/inst/doc/Ribo-seq_pipeline.html

I think we can open massiveNGSpipe very soon. Maybe you could just use that; the whole process would then come down to writing a csv config file with annotations for your samples.

In principle it's possible to plot both p-shifted and non-p-shifted reads. What I like to do is to use p-shifted reads and then, depending on the desired resolution, blur the coverage with a sliding window (the k-mers option in RiboCrypt).
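As a rough illustration of what blurring coverage with a sliding window means, here is a small Python sketch. It is an assumption-laden toy, not RiboCrypt's implementation: it uses a centered mean with truncated windows at the edges, and whether the k-mers option averages or sums, or how it treats edges, is not stated in this thread.

```python
def sliding_window_mean(coverage, k):
    """Smooth per-nucleotide counts with a centered window of odd width k.

    Windows are truncated at both ends of the region, so edge positions
    are averaged over fewer nucleotides (a choice made for this sketch).
    """
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    half = k // 2
    smoothed = []
    for i in range(len(coverage)):
        window = coverage[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Toy p-shifted coverage with a peak every third nucleotide.
coverage = [0, 0, 9, 0, 0, 6, 0, 0, 3]
print(sliding_window_mean(coverage, 3))
# [0.0, 3.0, 3.0, 3.0, 2.0, 2.0, 2.0, 1.0, 1.5]
```

With k = 3 the codon-level periodicity is flattened into a smooth profile, which is the trade-off mentioned above: larger windows give cleaner shapes across a gene but hide the subcodon signal.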

Frame visualization may be an obfuscating factor with multiple libraries, granted, but it's still very insightful on a summarized view. It's all optional in RiboCrypt.

Regarding the conditions, it sounds like quite a common use case, which we somehow don't yet completely support within the app, so it would require a little bit of coding. It's very high on the priority list, so I might just implement it this week; it's long overdue. Right now we display individual libraries on top of one another (stacked subplots), which is good for e.g. comparing replicates. For your case I'd say it would be better to merge replicates and inspect them that way (later, for differential coverage calculations, you split them back to calculate statistics).
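A minimal Python sketch of what merging replicates per condition could look like at the coverage level. This is only an illustration under stated assumptions: the library names and counts are invented, and the merge is a plain per-position sum without library-size normalization, which may not be what you want for real data.

```python
def merge_replicates(libraries):
    """Sum per-position coverage over the replicates of each condition.

    `libraries` maps (condition, replicate) to a list of counts over the
    same region; all lists for one condition must have the same length.
    """
    merged = {}
    for (condition, _replicate), cov in libraries.items():
        if condition not in merged:
            merged[condition] = list(cov)
            continue
        if len(cov) != len(merged[condition]):
            raise ValueError(f"coverage length mismatch in condition {condition!r}")
        merged[condition] = [a + b for a, b in zip(merged[condition], cov)]
    return merged


# Hypothetical libraries: two wild-type replicates and one knockout library.
libraries = {
    ("WT", "r1"): [1, 0, 2],
    ("WT", "r2"): [0, 3, 1],
    ("KO", "r1"): [2, 2, 0],
}
merged = merge_replicates(libraries)
print(merged)  # {'WT': [1, 3, 3], 'KO': [2, 2, 0]}
```

The per-replicate lists are left untouched, so they remain available for the statistics step, where replicates need to be treated separately again.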

We have an ORFik/RiboCrypt support Discord channel, so if you're interested it might be more efficient to meet for ~30 minutes so that I can explain the basics (p-shifting, subcodon resolution etc.) with RiboCrypt, rather than typing all the questions and answers. I'm relatively free this week; perhaps today is actually best. We need to expand the user base and we don't yet run more organized courses on it (which we hope to do at some point). Also, gathering use cases to cover as much usability as possible is very important for us.

Best, Michał

yeroslaviz commented 1 year ago

Hi Michal,

Talking in person (online) would be great, if possible. Unfortunately I can't today, as I have a workshop the whole day (till 8pm my time, Munich, Germany).

But gladly tomorrow morning till 2pm. As far as I can see, you're in Warsaw, so I hope it will be possible to find a common time to talk. But maybe the end of the week would be better for me, as I could then go through your messages and try to at least prepare something. Or at least my questions 😊

I will also be really happy to test your new packages, both RiboCrypt and massiveNGSpipe, when they are online.

Would Thursday or Friday before lunch be good for you?

Thanks for the help

Assa

m-swirski commented 1 year ago

Hi Assa,

Yes, talking directly makes way more sense than typing lengthy answers :). I think Thursday would be best; tomorrow is also doable. By Thursday I might be able to implement and deploy the condition comparisons and open massiveNGSpipe to the public, so that might be preferable. For the visualizations as such, bam files will be perfectly suitable. For a more robust framework with the interactive app, an ORFik experiment will be needed, but this can be constructed either from bam files or all the way from fastqs (optimal).

Let me know if you use Discord - I'll then send you an invitation to the support channel. We find it most useful for reporting bugs, asking questions and holding online meetings.

Best, Michal

yeroslaviz commented 1 year ago

Let us do it on Thursday.

I have Discord and I can join the support channel if you send me a link.

Assa

m-swirski commented 1 year ago

Okay, see you Thursday then.

Best, Michal

Roleren commented 1 year ago

Is this issue solved?

m-swirski commented 1 year ago

Well, I wanted to post an update here once we have multi-condition comparative coverage plots in RiboCrypt, but otherwise there is nothing directly ORFik-related. Assa could provide the code we came up with during the meeting to solve his issue, but roughly it came down to using the coveragePerTiling function on ORFik experiments.
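The code from the meeting is not posted in this thread, so the following Python sketch only illustrates the general idea of per-position coverage over a tiled (spliced) region. It is not ORFik's coveragePerTiling and makes several assumptions: a plus-strand transcript, 1-based inclusive exon coordinates, reads already reduced to single positions (e.g. P-sites), and invented toy data.

```python
def coverage_per_tile(exons, read_positions):
    """Count reads at every nucleotide of a spliced, plus-strand transcript.

    exons: sorted list of (start, end) genomic intervals, 1-based inclusive.
    read_positions: genomic positions of single-nucleotide reads.
    Returns one count per transcript position, introns removed.
    """
    # Map each exonic genomic position to its index along the transcript.
    index = {}
    for start, end in exons:
        for genomic_pos in range(start, end + 1):
            index[genomic_pos] = len(index)

    counts = [0] * len(index)
    for pos in read_positions:
        if pos in index:  # reads in introns or outside the gene are ignored
            counts[index[pos]] += 1
    return counts


# Hypothetical two-exon transcript and read positions (15 falls in the intron).
exons = [(10, 12), (20, 23)]
read_positions = [10, 12, 12, 15, 21, 23, 23, 23]
print(coverage_per_tile(exons, read_positions))  # [1, 0, 2, 0, 1, 0, 3]
```

A vector like this, computed per library or per merged condition for the same gene, is the kind of input a multi-condition coverage plot would be drawn from.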

Roleren commented 1 year ago

Ok, I will close the issue. Post any updates here if needed.