LabTranslationalArchitectomics / riboWaltz

optimization of ribosome P-site positioning in ribosome profiling data
MIT License

error occurred in the codon usage part #73

Closed shaymin84 closed 11 months ago

shaymin84 commented 1 year ago

Hi

I tried to run the codon usage command:

example_cu_barplot <- codon_usage_psite(reads_psite_list, mm81cdna, sample = "samp1",
                                        fastapath = "/PATH/TO/FASTA",
                                        fasta_genome = FALSE,
                                        frequency_normalization = FALSE)

and it generates the following error:

Error in .Call2("C_solve_user_SEW", refwidths, start, end, width, translate.negative.coord,  : 
  solving row 57259: 'allow.nonnarrowing' is FALSE and the supplied end (4320) is > refwidth

Any idea why this happens and how I could solve it? Thank you very much!

shaymin84 commented 1 year ago

Update: I found another (already solved) thread where someone reported a similar issue. I ended up filtering the reads by length (26-33 nt), and the command then ran without errors.
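
For reference, a length filter of this kind can be sketched in plain R. This is only an illustration on toy data: it assumes that `reads_psite_list` is a named list with one table per sample and that each table has a `length` column holding the read length in nucleotides. The toy values below are made up.

```r
# Toy stand-in for reads_psite_list: a named list with one table per sample.
# The `length` column (read length in nt) is an assumption about the layout.
reads_psite_list <- list(
  samp1 = data.frame(
    transcript = c("tx1", "tx1", "tx2", "tx2"),
    psite      = c(120L, 310L, 45L, 980L),
    length     = c(24L, 28L, 33L, 36L),
    stringsAsFactors = FALSE
  )
)

# Keep only reads whose length falls in the 26-33 nt window.
keep_lengths <- 26:33
filtered_list <- lapply(reads_psite_list, function(reads) {
  reads[reads$length %in% keep_lengths, , drop = FALSE]
})

# Number of reads left per sample after filtering.
reads_left <- sapply(filtered_list, nrow)
print(reads_left)
```

Note that a filter like this only removes the reads that happened to trigger the error; it does not tell you why a position fell outside a sequence in the first place.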

However the plot I get is unexpected. the level of mathionine is not very high, and the stop codon seems to be labeled with wrong colors/legends 1693c48a-cb40-4152-b2b8-a20d0b5992b8

fabiolauria commented 1 year ago

Hi there, as you probably have seen in the previous and solved issue, the error you reported is likely to be caused by transcripts in the FASTA shorter than expected (i.e. shorter than what is reported in the annotation file). This means there are discrepancies between the GTF and the FASTA file.
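
A quick way to look for such discrepancies is to compare the annotated transcript lengths with the actual sequence lengths in the FASTA. The sketch below uses toy data and plain R. The column names `transcript` and `l_tr` are an assumption about the annotation table, and `fasta_width` is a hypothetical named vector of sequence lengths (with real data it could be built with Biostrings, e.g. from `width(readDNAStringSet(...))`, provided the sequence names match the transcript names in the annotation).

```r
# Hypothetical annotation table: transcript name and annotated length (l_tr).
annotation <- data.frame(
  transcript = c("tx1", "tx2", "tx3"),
  l_tr       = c(1500L, 4320L, 900L),
  stringsAsFactors = FALSE
)

# Hypothetical FASTA sequence lengths, named by transcript.
fasta_width <- c(tx1 = 1500L, tx2 = 4100L, tx4 = 700L)

# Look up the FASTA length of every annotated transcript (NA if absent).
annotation$fasta_width <- unname(fasta_width[annotation$transcript])

# Transcripts that are missing from the FASTA or whose length differs.
mismatch <- annotation[is.na(annotation$fasta_width) |
                         annotation$fasta_width != annotation$l_tr, ]
print(mismatch)
```

Any transcript listed in `mismatch` is a candidate for the kind of out-of-range access described here.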

Another option is that at least one P-site has been localised at the last nucleotides of one transcript sequence and, even if it is not displayed in the command you reported above, you set site = "asite" (or you set "esite" and one P-site is on the first nucleotide of a sequence).

In both cases, the result is the codon_usage_psite function looking for P-sites (or A-sites or E-sites) outside the sequence boundaries, which causes the error described in your first message.
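
The boundary condition above can be checked before calling the function. The following is a minimal sketch on toy data, not the package's internal logic. It assumes that `psite` is the 1-based position of the first nucleotide of the P-site codon, that the A-site and E-site codons lie 3 nt downstream and upstream respectively, and that `seq_width` is a hypothetical named vector of sequence lengths. The helper `out_of_bounds` is made up for illustration.

```r
# Hypothetical P-site table (toy values).
psites <- data.frame(
  transcript = c("tx1", "tx1", "tx2", "tx2"),
  psite      = c(1L, 700L, 4315L, 2000L),
  stringsAsFactors = FALSE
)

# Hypothetical sequence lengths, named by transcript.
seq_width <- c(tx1 = 1500L, tx2 = 4317L)

# Assumed offset of the codon start relative to the P-site position.
site_offset <- c(esite = -3L, psite = 0L, asite = 3L)

# TRUE for rows whose codon would start before position 1
# or end after the last nucleotide of the sequence.
out_of_bounds <- function(psites, seq_width, site = "psite") {
  codon_start <- psites$psite + site_offset[[site]]
  codon_end   <- codon_start + 2L
  width       <- seq_width[psites$transcript]
  unname(codon_start < 1L | codon_end > width)
}

print(out_of_bounds(psites, seq_width, "psite"))
print(out_of_bounds(psites, seq_width, "asite"))
print(out_of_bounds(psites, seq_width, "esite"))
```

In this toy table every P-site codon fits inside its sequence, but the third row overruns the end of tx2 when shifted to the A-site and the first row starts before position 1 when shifted to the E-site.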

Regarding the plot, I cannot say much about the signal on the methionine codon, other than hypothesizing that read duplicates are still there and massively cover specific triplets. If this is not the case, I have no clue about the biological meaning of what you observed in your sample. Does the metaprofile actually show an accumulation on the start codon?

The color in the legend is wrong, you are right. I guess ggplot changed its behavior regarding scale_colour_manual, breaks and values. I fixed the code, thank you.
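
For anyone hitting a similar legend mix-up in their own plots, one way to make the mapping independent of ordering is to pass a named `values` vector to `scale_colour_manual`, so each color is tied to its level by name. The sketch below is a generic illustration with made-up data and colors; it is not the code used in the package.

```r
library(ggplot2)

# Toy data: three codons, each assigned to a class (values are made up).
df <- data.frame(
  codon = c("AUG", "UAA", "GCU"),
  value = c(0.4, 0.1, 0.7),
  class = c("start", "stop", "other"),
  stringsAsFactors = FALSE
)

# Named vector: each class is tied to its color by name, not by position.
class_colours <- c(start = "darkgreen", stop = "firebrick", other = "grey40")

p <- ggplot(df, aes(x = codon, y = value, colour = class)) +
  geom_point(size = 3) +
  scale_colour_manual(values = class_colours,
                      breaks = names(class_colours))
```

Here `breaks` only controls which levels appear in the legend and in what order, while the names in `values` decide which color each level gets.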

Let me know if I can help in any other way.

Best Fabio

fabiolauria commented 11 months ago

Hi there, I'm closing this issue due to inactivity. Please re-open if required.

Best Fabio