LabTranslationalArchitectomics / riboWaltz

optimization of ribosome P-site positioning in ribosome profiling data
MIT License

Why are some read lengths not in frame? #64

Closed weishwu closed 2 years ago

weishwu commented 2 years ago

Sorry if I'm asking naive questions. I'm new to ribo-seq analysis.

Q1. Most of the read lengths show a nice 3-nt periodic pattern in CDS, but some do not, like 31nt in my case. The graphs are attached ("frames_stratified_plots.pdf", "metaprofile_plots_readlen31.pdf", and "metaprofile_plots_readlen32.pdf"). What are those reads that are not in-frame? Should I only use the in-frame read lengths for downstream analyses?

Q2. While the metagene plot from all genes shows a nice periodic pattern ("metaprofile_plots.pdf"), the pattern can go wild for a single gene, and there are peaks outside of the CDS. Is this normal? ("metaprofile_plots_transcript_ENST00000412830.8_gene_PWP1.pdf", "metaprofile_plots_transcript_ENST00000541166.1_gene_PWP1.pdf", "metaprofile_plots_transcript_ENST00000547995.5_gene_PWP1.pdf" and "metaprofile_plots_transcript_ENST00000552760.5_gene_PWP1.pdf")

frames_stratified_plots.pdf metaprofile_plots_readlen31.pdf metaprofile_plots_readlen32.pdf metaprofile_plots_transcript_ENST00000412830.8_gene_PWP1.pdf metaprofile_plots_transcript_ENST00000541166.1_gene_PWP1.pdf metaprofile_plots_transcript_ENST00000547995.5_gene_PWP1.pdf metaprofile_plots_transcript_ENST00000552760.5_gene_PWP1.pdf metaprofile_plots.pdf

Thanks.

fabiolauria commented 2 years ago

Hi there.

A1. It is quite common, and to me perfectly reasonable, to observe some read lengths that are in frame and others that are not. You can clearly see the same pattern in this section of riboWaltz's ReadMe. I would say this is due to the action of RNases and the standard size of ribosomes, which make some populations of reads more "reliable" than others. Hopefully, the lengths associated with in-frame reads are the most frequent and only a negligible number of reads are >31 nt long, so they are not going to affect downstream analyses. You can decide to keep them or discard them to get cleaner, more precise results, but it's up to you.
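
To make the keep-or-discard decision concrete, one option is to compute, for each read length, the fraction of P-sites that fall in each of the three frames of the CDS. The following is a minimal Python sketch, not riboWaltz code. The input format (pairs of read length and P-site distance from the first base of the start codon), the function names, and the 0.5 threshold are all assumptions made for illustration.

```python
from collections import defaultdict

def frame_fractions(psites):
    """Fraction of P-sites per frame, stratified by read length.

    psites: iterable of (read_length, psite_from_start) pairs, restricted
            to reads whose P-site lies in the CDS (hypothetical input format).
    Returns {read_length: [frac_frame0, frac_frame1, frac_frame2]}.
    """
    counts = defaultdict(lambda: [0, 0, 0])
    for length, dist in psites:
        counts[length][dist % 3] += 1  # frame 0 = same frame as the start codon
    return {
        length: [c / sum(frames) for c in frames]
        for length, frames in counts.items()
    }

def in_frame_lengths(psites, min_fraction=0.5):
    """Read lengths whose frame-0 fraction reaches min_fraction (assumed cutoff)."""
    fractions = frame_fractions(psites)
    return sorted(
        length for length, frac in fractions.items() if frac[0] >= min_fraction
    )

# Toy data: 28-nt reads are mostly in frame 0, 31-nt reads are spread evenly.
toy = [(28, 0), (28, 3), (28, 6), (28, 7), (31, 0), (31, 1), (31, 2)]
print(frame_fractions(toy)[28])
print(in_frame_lengths(toy))
```

A sensible cutoff depends on the library; a length with roughly one third of its P-sites in each frame carries no periodicity signal, so anything close to 0.33 is a candidate for exclusion.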

A2. Yes, in this case too it is normal to get good metaprofiles for many transcripts together but not for single ones. The reason is sequencing depth, which is usually too low to give good coverage of a single transcript (especially an "underrepresented" one; actin or tubulin are usually highly translated and might show something more). Also consider that metaprofiles are based on P-sites, each of which covers just one nucleotide. I think that even a whole sequencing run dedicated to a single sample wouldn't be enough to observe the 3-nucleotide periodicity for one mRNA. Given this, I would not suggest generating P-site-based metaprofiles for single transcripts or even small subsets. It would be better to project the whole reads, i.e. their ~30-nt-long footprints, onto the transcript of interest and average the replicates, so that you increase the signal at each mRNA position and can look for potential differences in ribosome coverage among multiple samples. This way you certainly cannot detect defects at single-nucleotide resolution for specific mRNAs, but at least you get a hint of the "translational status" of single transcripts. It's a bit of manual work (riboWaltz doesn't include any function for this yet), but I think it can be done. Finally, don't worry about signal on the UTRs; it is a common feature. It can be due to upstream translation start sites and alternative ORFs, but most of the time it is just random reads mapping outside the CDS.
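
The "project the whole reads and average the replicates" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a riboWaltz function: reads are assumed to be already mapped to the transcript as 0-based, end-exclusive `(start, end)` intervals, and scaling each replicate by its library size before averaging is an assumption added here so that replicates of different depth are comparable.

```python
def read_coverage(reads, transcript_length):
    """Per-nucleotide coverage from whole footprints.

    reads: iterable of (start, end) intervals on transcript coordinates,
           0-based and end-exclusive (hypothetical input format).
    """
    diff = [0] * (transcript_length + 1)
    for start, end in reads:
        start = max(start, 0)                # clip reads hanging off the ends
        end = min(end, transcript_length)
        if start < end:
            diff[start] += 1                 # coverage rises where the read begins
            diff[end] -= 1                   # and falls right after it ends
    coverage, running = [], 0
    for delta in diff[:transcript_length]:
        running += delta
        coverage.append(running)
    return coverage

def mean_scaled_coverage(replicates, transcript_length, scale=1_000_000):
    """Average coverage over replicates after library-size scaling.

    replicates: list of (reads, library_size) pairs; library_size is the
                total number of mapped reads in that replicate (assumption).
    """
    total = [0.0] * transcript_length
    for reads, library_size in replicates:
        for i, depth in enumerate(read_coverage(reads, transcript_length)):
            total[i] += depth * scale / library_size
    return [value / len(replicates) for value in total]

# Toy example: two overlapping reads on a 6-nt transcript.
print(read_coverage([(0, 3), (2, 5)], 6))
```

Computing one such profile per condition and plotting them on the same axis gives the kind of comparison described above.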

Hope it helps.

Best, Fabio

weishwu commented 2 years ago

Thanks so much for the detailed answers. These are super helpful! When I project the whole reads on the transcript of interest, should I shift the read positions (positions of all the bases in the read) using the P-site offset?

fabiolauria commented 2 years ago

I wouldn't do that, otherwise you can no longer see the real ribosome position. Plus, it's additional work that in my opinion is not necessary. There are several examples of similar plots online, which should look like this for two conditions (note the signal upstream of the gray area, which is the CDS):

[image: example read-coverage profile for two conditions; the gray area marks the CDS]

weishwu commented 2 years ago

Thanks! I generated a coverage plot using the reads. There is a very high peak in the 5' UTR. (yellow: UTR; blue: CDS)

PWP1_ENST00000412830.8_coverage.pdf

fabiolauria commented 2 years ago

Yes indeed. It might be the start codon of another annotated isoform of the same gene, or an alternative, unannotated translation start site for an unknown ORF. You can explore these hypotheses by checking the sequence of the transcript or with similar analyses.
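
As a starting point for checking the sequence, the sketch below scans the 5' UTR for upstream ATG codons and reports, for each one, its frame relative to the annotated start codon and the first in-frame stop codon. It is a minimal illustration in Python with hypothetical names; it assumes the transcript sequence and the 0-based position of the annotated start codon are already at hand, and it only looks for ATG, so near-cognate start codons are not considered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def upstream_starts(sequence, cds_start, start_codon="ATG"):
    """List candidate upstream start codons in the 5' UTR.

    sequence:  transcript sequence, 5' to 3' (DNA or RNA alphabet).
    cds_start: 0-based index of the first base of the annotated start codon.
    Returns one dict per start codon beginning upstream of cds_start.
    """
    sequence = sequence.upper().replace("U", "T")
    hits = []
    for pos in range(cds_start):
        if sequence[pos:pos + 3] != start_codon:
            continue
        stop = None
        # Walk codon by codon from the candidate start to the first stop.
        for i in range(pos + 3, len(sequence) - 2, 3):
            if sequence[i:i + 3] in STOP_CODONS:
                stop = i
                break
        hits.append({
            "start": pos,                          # first base of the upstream ATG
            "frame_vs_cds": (cds_start - pos) % 3, # 0 means in frame with the CDS
            "stop": stop,                          # first base of the stop, or None
        })
    return hits

# Toy transcript: an upstream ATG at index 2, annotated start codon at index 13.
print(upstream_starts("CCATGAAATAGCCATGGCCTAA", 13))
```

Comparing the reported positions with the location of the coverage peak shows whether the peak coincides with a candidate upstream start; that alone does not prove translation from that site.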

Since you got the profile and the issue is not strictly related to riboWaltz, I'm going to close it.

Best, Fabio