COMBINE-lab / salmon

🐟 🍣 🍱 Highly-accurate & wicked fast transcript-level quantification from RNA-seq reads using selective alignment
https://combine-lab.github.io/salmon
GNU General Public License v3.0

Dealing with Polycistronic transcripts #947

Open AlvaroBernardino opened 3 months ago

AlvaroBernardino commented 3 months ago

Hello! I recently had to sequence and analyse an RNA-Seq dataset from T. cruzi RNA. For that, I used Salmon in the alignment-independent mode (aligning to a reference transcriptome). Typical issues aside, I read afterwards that this organism has polycistronic mRNA: the genomic sequence is transcribed into long pre-mRNAs containing more than one transcript, which are then cleaved and translated by a specific mechanism. Considering I might have some of these in my dataset, how does Salmon deal with them? Say a single read has multiple matches (we used Nanopore sequencing). Is the rest of the read ignored? Is all of it mapped and classified? How would I go about dealing with these kinds of reads?

marija-kra commented 3 months ago

Hi @AlvaroBernardino,

I also work on organisms within that group (Trypanosoma and Leishmania species), and I can confirm that this is not an issue when doing RNA-seq on these organisms. What you get is the same as for any conventional organism, i.e. individual transcripts consisting of UTRs + CDS, so their polycistronic transcription should not affect the quantification.
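To check whether a given Nanopore dataset contains reads that span more than one transcript, one option is to look at the read-to-transcriptome alignments directly. The following is only a minimal sketch and not part of Salmon: it assumes the reads were aligned to the transcriptome with a tool that writes PAF (for example minimap2), and the function name and the thresholds `min_aln_len` and `max_overlap` are hypothetical choices for illustration.

```python
from collections import defaultdict


def find_multi_transcript_reads(paf_lines, min_aln_len=200, max_overlap=50):
    """Flag reads whose alignments hit different transcripts on distinct,
    largely non-overlapping parts of the read.

    paf_lines: iterable of PAF lines (tab-separated; the first 12 columns are
    qname, qlen, qstart, qend, strand, tname, tlen, tstart, tend, ...).
    min_aln_len, max_overlap: illustrative thresholds in read bases
    (assumptions, not values recommended by any tool).
    Returns {read_name: [transcript names in order along the read]}.
    """
    hits = defaultdict(list)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or empty lines
        qname, tname = fields[0], fields[5]
        qstart, qend = int(fields[2]), int(fields[3])
        if qend - qstart >= min_aln_len:
            hits[qname].append((qstart, qend, tname))

    candidates = {}
    for qname, alns in hits.items():
        alns.sort()  # order alignments by their start position on the read
        chain = [alns[0]]
        for qstart, qend, tname in alns[1:]:
            _, prev_end, prev_tname = chain[-1]
            # Keep an alignment only if it is to a different transcript and
            # starts (almost) after the previous one ends on the read. Plain
            # multi-mapping, where the same read region matches several
            # transcripts, overlaps heavily and is therefore not chained.
            if tname != prev_tname and qstart >= prev_end - max_overlap:
                chain.append((qstart, qend, tname))
        if len(chain) > 1:
            candidates[qname] = [t for _, _, t in chain]
    return candidates
```

It could be called as `find_multi_transcript_reads(open("reads.paf"))`, where `reads.paf` is a hypothetical alignment file. If the result is empty or close to it, the dataset contains essentially no unprocessed polycistronic reads.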

I hope this helps!