alexdobin / STAR

RNA-seq aligner
MIT License

STAR reporting a non-zero number of 'Fragments spanning SJ' for RNAseq alignment to bacteria #989

Open jmartin77777 opened 4 years ago

jmartin77777 commented 4 years ago

I recently mapped a number of M. tuberculosis RNAseq samples against an M. tuberculosis reference, and I noticed that STAR (v2.7.5b) reports a small but non-zero number of fragments spanning splice junctions. The reference I am using has one exon per gene, as you would expect for a bacterium. I would like to figure out exactly what STAR is counting as an SJ in this case. Is there a simple explanation? My guess is that some reads run off the end of a single-exon gene (perhaps into the neighboring gene).

I'm only looking at biotype="protein_coding" genes, so STAR shouldn't be seeing any RNA structures. Although there is no explicitly annotated UTR, I do see cases where the CDS is shorter than the exon. Could STAR be counting that as an SJ?

I don't think there is anything wrong, I just want to understand what I'm seeing.
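One way to see what STAR is counting is to inspect the SJ.out.tab file it writes next to the alignments (its name carries the --outFileNamePrefix prefix). The sketch below is a minimal, hypothetical helper, not part of STAR: it assumes the column layout described in the STAR manual (chromosome, 1-based first and last intron base, strand code, intron-motif code, annotated flag, unique-read count, multi-mapping-read count, maximum overhang). The names `read_sj_tab` and `summarize` and the example rows are made up for illustration. It tallies junctions by motif, so that, for instance, non-canonical junctions backed by a single read can be told apart from canonical GT/AG ones.

```python
from collections import Counter
from dataclasses import dataclass

# Intron-motif codes for column 5 of STAR's SJ.out.tab (per the STAR manual).
MOTIFS = {0: "non-canonical", 1: "GT/AG", 2: "CT/AC", 3: "GC/AG",
          4: "CT/GC", 5: "AT/AC", 6: "GT/AT"}


@dataclass
class Junction:
    chrom: str
    intron_start: int  # first intron base, 1-based
    intron_end: int    # last intron base, 1-based
    strand: int        # 0 = undefined, 1 = +, 2 = -
    motif: int         # key into MOTIFS
    annotated: bool
    unique_reads: int
    multi_reads: int
    max_overhang: int

    @property
    def intron_length(self) -> int:
        return self.intron_end - self.intron_start + 1


def read_sj_tab(lines):
    """Parse SJ.out.tab lines (tab-separated, no header) into Junction records."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        f = line.rstrip("\n").split("\t")
        yield Junction(f[0], int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                       f[5] == "1", int(f[6]), int(f[7]), int(f[8]))


def summarize(junctions, min_unique=1):
    """Keep junctions with >= min_unique unique reads and count them by motif."""
    kept = [j for j in junctions if j.unique_reads >= min_unique]
    by_motif = Counter(MOTIFS.get(j.motif, "unknown") for j in kept)
    return kept, by_motif


if __name__ == "__main__":
    # Made-up rows for illustration only; real input would be STAR's SJ.out.tab.
    example = [
        "chr1\t1001\t1050\t0\t0\t0\t1\t0\t12\n",
        "chr1\t5001\t5400\t1\t1\t0\t3\t1\t25\n",
    ]
    kept, by_motif = summarize(read_sj_tab(example))
    for j in kept:
        print(j.chrom, j.intron_start, j.intron_end, j.intron_length,
              MOTIFS.get(j.motif, "unknown"), j.unique_reads, j.max_overhang)
    print(dict(by_motif))
```

On real output, pass an open file handle for the SJ.out.tab file to `read_sj_tab`. Since a bacterial annotation has no introns, every junction listed should be unannotated. The motif, read support and overhang columns are the ones to look at when judging whether a junction is plausible.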

alexdobin commented 4 years ago

Hi @jmartin77777

If you did not prohibit splicing with --alignIntronMax 1, STAR will try to identify "novel" junctions. These are most likely artifacts, either from the wet lab (e.g. reverse-transcriptase (RT) jumping) or from mapping.
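To turn off junction search for a bacterial genome, --alignIntronMax 1 goes on the alignment command line. The following is a minimal Python sketch that assembles such a STAR command. The helper names `star_command` and `run_star`, the index directory and the FASTQ file names are hypothetical. Apart from --alignIntronMax, the flags (--runThreadN, --genomeDir, --readFilesIn, --outFileNamePrefix, --outSAMtype) are ordinary STAR options added only to make the command complete; check them against the manual for the STAR version in use.

```python
import shlex
import subprocess


def star_command(genome_dir, read_files, out_prefix, threads=4, allow_splicing=False):
    """Return a STAR alignment command as an argument list.

    When allow_splicing is False, --alignIntronMax 1 is appended so that STAR
    does not search for novel splice junctions.
    """
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *read_files,
        "--outFileNamePrefix", out_prefix,
        "--outSAMtype", "BAM", "SortedByCoordinate",
    ]
    if not allow_splicing:
        cmd += ["--alignIntronMax", "1"]
    return cmd


def run_star(cmd):
    """Run the command, raising CalledProcessError if STAR exits non-zero."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical index and file names; substitute your own paths.
    cmd = star_command("mtb_star_index", ["sample_R1.fastq", "sample_R2.fastq"], "sample_")
    print(shlex.join(cmd))
    # run_star(cmd)  # uncomment on a machine with STAR installed
```

Building the command as a list, rather than as one shell string, avoids quoting problems with paths and lets `subprocess.run` call STAR directly. `shlex.join` is used only to print a readable version of the command.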

Cheers Alex