High % of reads unmapped: too short in subset of samples

tf1993614 commented 2 years ago

Hi Alex,

I tried to map my single-end RNA-seq data to a reference genome using STAR, and according to the log file STAR produced, ~50% of reads were unmapped because they were "too short". I found that several similar questions have been raised before. In #731, you gave two suggestions. The first was to map the raw, untrimmed FASTQ file to the reference genome. I tried that, and it did not improve the result much. The second was to BLAST the unmapped reads against genomes of different species using the NCBI BLAST page or the UCSC browser. For this second suggestion, how do I retrieve the unmapped reads from the BAM file? Could you show the code? By the way, my BAM file was already sorted by STAR during alignment.

If the BLAST results show no contamination among the unmapped reads, what should I do next?

Thanks,

Feng

alexdobin commented 2 years ago

Hi Feng,

Sorry for the delayed reply. To write unmapped reads to the SAM/BAM output, please use the --outSAMunmapped Within option.

Thanks! Alex
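
(Illustrative note, not part of the original reply.) Once the unmapped reads are in the SAM/BAM output, they can be recognised by the 0x4 bit of the SAM FLAG field. On a BAM file, samtools view -f 4 applies this filter directly. The Python sketch below shows the same logic on SAM text and converts the unmapped reads to FASTA so they can be pasted into BLAST. The function name unmapped_to_fasta and the two example reads are hypothetical, chosen only for illustration; this is a minimal sketch, not a tested pipeline.

```python
import io

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_to_fasta(sam_lines):
    """Yield one FASTA record per unmapped read in an iterable of SAM text lines.

    Hypothetical helper for illustration; it expects plain SAM text,
    e.g. the output of `samtools view` on a BAM file.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & UNMAPPED_FLAG:
            yield f">{name}\n{seq}"


# Made-up two-read example: read1 is mapped (FLAG 0), read2 is unmapped (FLAG 4).
example_sam = io.StringIO(
    "@HD\tVN:1.4\tSO:coordinate\n"
    "read1\t0\tchr1\t100\t255\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n"
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTGGGGCC\tIIIIIIIIII\n"
)
fasta_records = list(unmapped_to_fasta(example_sam))
print("\n".join(fasta_records))
```

For a real run, the same function could be fed the lines of a SAM file opened with open(), assuming the alignment was produced with --outSAMunmapped Within so that the unmapped reads are present in the file.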

deekshamisri commented 5 months ago

Hi, I am facing the same issue. How do I prevent the reads from being trimmed before alignment? Is there a specific flag?

High % of reads unmapped: too short in subset of samples #1523