alexdobin / STAR

RNA-seq aligner
MIT License

Poor alignment in Ion Torrent Proton Fastq #873

Open BreisOne opened 4 years ago

BreisOne commented 4 years ago

Dear Alex;

I am running an RNA-seq analysis to identify differentially expressed (DE) genes in a human cell line sequenced on an Ion Proton. With the default STAR settings I get a low percentage of uniquely mapped reads (48-50%). From what I have read, others also report reduced alignment rates when using STAR on Ion Torrent data, but not this low (around 75%). The usual recommendation is a two-step alignment (STAR first, then Bowtie2 on the unmapped reads), but perhaps the problem can be solved with STAR settings tuned for Ion Torrent data.

Also, my genome index was built from the GRCh38 primary assembly. I hope this is correct.

Regards, Brais, PhD Student at University of Vigo

https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-017-4011-0

C3_URT_Log.progress.out.txt C3_URT_Log.out.txt C3_URT_Log.final.out.txt

alexdobin commented 4 years ago

Hi BreisOne,

Your unique + multiple alignment rate is not too bad: 48% + 28% = 76%. The multimapping rate of 28% is on the high side, which often indicates insufficient rRNA depletion. I would check whether the multimappers mostly map to the rRNA loci.
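The combined rate quoted above can be read directly from STAR's `Log.final.out`. Below is a minimal Python sketch, assuming the two field labels `Uniquely mapped reads %` and `% of reads mapped to multiple loci` as STAR writes them; the helper names and the sample text are hypothetical and are not taken from the attached log.

```python
def parse_star_final_log(text):
    """Return {field label: value string} for every 'label | value' line."""
    fields = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' have no value
        key, _, value = line.partition("|")
        fields[key.strip()] = value.strip()
    return fields


def percent(fields, key):
    """Convert a value such as '48.00%' to the float 48.0."""
    return float(fields[key].rstrip("%"))


def mapped_percent(fields):
    """Unique plus multimapped percentage (hypothetical helper)."""
    unique = percent(fields, "Uniquely mapped reads %")
    multi = percent(fields, "% of reads mapped to multiple loci")
    return unique + multi


# Made-up excerpt for illustration only, using the rates discussed above.
SAMPLE_LOG = """\
                        Uniquely mapped reads % |\t48.00%
             % of reads mapped to multiple loci |\t28.00%
"""

if __name__ == "__main__":
    fields = parse_star_final_log(SAMPLE_LOG)
    print(f"mapped (unique + multi): {mapped_percent(fields):.1f}%")
```

For a real run, the text would come from reading the sample's own `Log.final.out` instead of `SAMPLE_LOG`.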

I would also recommend supplying the annotations (GTF file) at the genome generation step, as this may reduce the number of unmapped reads.
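A genome generation call with annotations might look like the sketch below. It only assembles the command line in Python; the file paths, thread count, and the helper name `genome_generate_cmd` are assumptions for illustration. `--sjdbOverhang` is left at STAR's default unless a value is passed; the STAR manual suggests max(ReadLength)-1 for reads of varying length and notes that the default of 100 usually works as well.

```python
import shlex


def genome_generate_cmd(genome_dir, fasta, gtf, threads=8, sjdb_overhang=None):
    """Build a STAR genomeGenerate command that includes a GTF annotation."""
    cmd = [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,        # annotations supplied at index build time
        "--runThreadN", str(threads),
    ]
    if sjdb_overhang is not None:
        cmd += ["--sjdbOverhang", str(sjdb_overhang)]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; substitute the GRCh38 primary assembly FASTA and
    # a GTF that uses the same chromosome naming as that FASTA.
    cmd = genome_generate_cmd(
        "star_index_GRCh38",
        "GRCh38.primary_assembly.fa",
        "annotation.gtf",
    )
    print(shlex.join(cmd))
    # To actually build the index: subprocess.run(cmd, check=True)
```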

You can in principle reduce the unmapped rate by relaxing the stringency of mapping filters, but I generally do not recommend it as it increases the number of false alignments.

Cheers Alex

BreisOne commented 4 years ago

Hi Alex,

Wouldn't using the annotations (GTF file) at the read alignment step give the same result? Could you recommend a program to filter out rRNA reads?

Regards, Brais

alexdobin commented 4 years ago

Hi Brais,

Using annotations will improve the mapping of spliced reads and may slightly increase the overall mapping rate. To remove rRNA alignments, I would recommend getting the rRNA loci from the RepeatMasker track and intersecting them with the BAM file using bedtools.
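One way to sketch this in Python is shown below, under several assumptions: the RepeatMasker track is the UCSC `rmsk` table dump (tab-separated, 17 columns, with `repClass` in the twelfth column), the BAM uses the same chromosome names as that table (UCSC uses `chr1`, other sources may use `1`), and the installed bedtools accepts BAM through `-a` (older releases use `-abam`). The helper names and file names are hypothetical.

```python
import shlex

# Column positions in the UCSC rmsk table (0-based); an assumption about the
# input layout, so check it against the file you actually download.
GENO_NAME, GENO_START, GENO_END, STRAND, REP_NAME, REP_CLASS = 5, 6, 7, 9, 10, 11


def rrna_bed_lines(rmsk_rows):
    """Yield BED6 lines for rows whose repeat class is 'rRNA'."""
    for row in rmsk_rows:
        if row[REP_CLASS] == "rRNA":
            yield "\t".join(
                [row[GENO_NAME], row[GENO_START], row[GENO_END],
                 row[REP_NAME], "0", row[STRAND]]
            )


def bedtools_remove_cmd(bam, bed):
    """Command that keeps only alignments with no overlap to the BED file.

    bedtools writes the filtered BAM to stdout, so redirect it to a file.
    """
    return ["bedtools", "intersect", "-v", "-a", bam, "-b", bed]


if __name__ == "__main__":
    # Two made-up rows: one rRNA repeat and one unrelated repeat.
    rows = [
        "585 1000 0 0 0 chr1 100 250 -1000 + 5S rRNA rRNA 1 150 0 1".split(),
        "585 2000 0 0 0 chr1 300 600 -700 - AluY SINE Alu 1 300 0 2".split(),
    ]
    for line in rrna_bed_lines(rows):
        print(line)
    print(shlex.join(bedtools_remove_cmd("Aligned.sortedByCoord.out.bam", "rRNA.bed")))
```

Replacing `-v` with `-u` reports the alignments that do overlap rRNA loci, which is a quick way to check whether the multimappers are mostly rRNA.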

Cheers Alex