Mayrlab / scUTRquant

Bioinformatics pipeline for single-cell 3' UTR isoform quantification
https://Mayrlab.github.io/scUTRquant
GNU General Public License v3.0

Negative 3'UTR lengths #93

Open Apistogramma-2 opened 1 month ago

Apistogramma-2 commented 1 month ago

Hello,

I was applying your pipeline to some single-cell data from the Allen Brain Atlas and I got a handful of negative values for the 3'UTR length in my SCE object. I was using utrome_mm10_v2. One example is the 3'UTR isoform ENSMUST00000135807.1-UTR-4987, with a length of -2595 nt. All the others look great, but I do not know how to interpret these. I hope you can help me. Thanks in advance!

Best, Janus

mfansler commented 1 month ago

Hi Janus,

Thanks for the interest! Indeed, this is not intuitive. 🤔

These should correspond to novel cleavage sites that were identified upstream of any GENCODE-annotated 3' end. Since we did not have full transcript sequencing, we were reluctant to make any assertion about what the full splice isoform was, only that cleavage is happening there. As such, we didn't have a proper STOP codon identified for these, and so the value isn't really a "3'UTR length". Rather, the negative value reflects the (genomic) distance from the cleavage site to the annotated STOP codon of the reference transcript named in the isoform ID.

So, concretely, "ENSMUST00000135807.1-UTR-4987" is a cleavage site found 4,987 nts upstream of the cleavage site of the Ensembl mouse transcript ENSMUST00000135807.1. This cleavage site is 2,595 nts upstream of the annotated STOP codon of ENSMUST00000135807.1.
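To make the naming convention concrete, here is a minimal Python sketch that splits such an ID into the reference transcript and the signed offset from its annotated cleavage site. The pattern is inferred only from the single ID quoted in this thread; the `+` branch (for sites downstream of the reference cleavage site) is an assumption, and `parse_isoform_id` and `CleavageSite` are hypothetical names, not part of scUTRquant.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the one ID quoted in this thread.
# The "+" sign (downstream sites) is an assumption, not confirmed here.
_ID_PATTERN = re.compile(r"^(?P<ref>.+)-UTR(?P<sign>[+-])(?P<dist>\d+)$")


class CleavageSite(NamedTuple):
    reference_transcript: str
    offset_nt: int  # negative = upstream of the reference cleavage site


def parse_isoform_id(isoform_id: str) -> Optional[CleavageSite]:
    """Split an ID like 'ENSMUST00000135807.1-UTR-4987' into its parts."""
    match = _ID_PATTERN.match(isoform_id)
    if match is None:
        return None  # no offset suffix, e.g. a plain transcript ID
    distance = int(match.group("dist"))
    if match.group("sign") == "-":
        distance = -distance
    return CleavageSite(match.group("ref"), distance)


site = parse_isoform_id("ENSMUST00000135807.1-UTR-4987")
print(site)
```

Note that the offset in the ID (4,987 nts) is measured from the reference cleavage site, whereas the negative length (-2595) is measured from the annotated STOP codon, so the two numbers are not expected to match.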

This is admittedly not very informative, so we provided a column in the published annotations to flag it, namely is_improper_utr_length (e.g., Mouse annotation Supplemental Data 4).
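As a rough sketch of how one might use that flag downstream, the Python snippet below masks the length for flagged isoforms so that it is not averaged in with genuine 3'UTR lengths. Only the column name is_improper_utr_length comes from the annotations; the other field names (`transcript_id`, `utr_length`), the helper `usable_utr_length`, and the second row are made up for illustration.

```python
from typing import Optional


def usable_utr_length(utr_length: int, is_improper_utr_length: bool) -> Optional[int]:
    """Return the length only when it is a genuine 3'UTR length."""
    # Treat a negative value as improper too, in case the flag is missing.
    if is_improper_utr_length or utr_length < 0:
        return None
    return utr_length


# Illustrative rows: the first uses the values from this thread,
# the second is a made-up placeholder.
rows = [
    {"transcript_id": "ENSMUST00000135807.1-UTR-4987",
     "utr_length": -2595, "is_improper_utr_length": True},
    {"transcript_id": "hypothetical_transcript",
     "utr_length": 1200, "is_improper_utr_length": False},
]

cleaned = [
    {**row, "utr_length": usable_utr_length(row["utr_length"],
                                            row["is_improper_utr_length"])}
    for row in rows
]

for row in cleaned:
    print(row["transcript_id"], row["utr_length"])
```

Masking rather than dropping the rows keeps the cleavage-site counts usable for isoform-level analyses while excluding them from anything that depends on 3'UTR length.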

I think it would be a nice enhancement to our annotation if someone identified where the STOP codon lies for such transcripts, so we could associate a proper 3'UTR length with them.

Hope that clarifies things!

Apistogramma-2 commented 1 month ago

Thanks for the fast reply and the clarification. That explains it to me :)