MikeAxtell / ShortStack

ShortStack: Comprehensive annotation and quantification of small RNA genes
MIT License

Low mapping rate during Bowtie #147

Closed: SaelinB closed this issue 3 months ago

SaelinB commented 6 months ago

I am trying to use ShortStack for de novo miRNA discovery and have little experience with small RNA data. When I run the software, only ~33% of my reads map:

Uniquely mapped (U) reads: 7153136/76751715 (9.3%)
Multi-mapped reads placed (P) with guidance: 14066544/76751715 (18.3%)
Multi-mapped reads randomly (R) placed: 3615495/76751715 (4.7%)
Very highly (H) multi-mapped reads (>=50 hits): 598454/76751715 (0.8%)
Not mapped (N) reads (no hits): 51318086/76751715 (66.9%)
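The ~33% figure is the sum of every category that has at least one alignment (U, P, R and H). A quick shell check using the counts from the summary above, written purely for illustration (it is not part of ShortStack):

```shell
#!/usr/bin/env bash
# Read counts copied from the ShortStack alignment summary above.
total=76751715
unique=7153136      # U: uniquely mapped
guided=14066544     # P: multi-mapped, placed with guidance
random=3615495      # R: multi-mapped, placed randomly
high=598454         # H: very highly multi-mapped (>=50 hits)

mapped=$((unique + guided + random + high))
unmapped=$((total - mapped))

# awk does the floating-point division that bash integer arithmetic cannot.
awk -v m="$mapped" -v t="$total" 'BEGIN { printf "mapped: %d/%d (%.1f%%)\n", m, t, 100 * m / t }'
awk -v n="$unmapped" -v t="$total" 'BEGIN { printf "not mapped: %d/%d (%.1f%%)\n", n, t, 100 * n / t }'
```

This prints `mapped: 25433629/76751715 (33.1%)` and `not mapped: 51318086/76751715 (66.9%)`, which agrees with the N line in the summary.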

I also get only one miRNA predicted in the final output.

I pre-processed my reads with Trimmomatic, removing adaptors and keeping reads >= 15 nt:

java -jar ~/opt/Trimmomatic-0.39/trimmomatic-0.39.jar PE -threads 12 -phred33 \
  ${i}_combined_R1.fastq.gz ${i}_combined_R2.fastq.gz \
  ${i}_1_P.q25.fq ${i}_1_UP.q25.fq ${i}_2_P.q25.fq ${i}_2_UP.q25.fq \
  ILLUMINACLIP:small_rna_adaptors.fa:2:15:8:2:True LEADING:3 TRAILING:3 SLIDINGWINDOW:4:25 MINLEN:15

This is paired-end data, but after trimming I used only the R1 reads (paired and unpaired) plus the unpaired R2 reads. I also removed reads longer than 60 nt. The FastQC length distribution shows a large peak at 17 nt and a smaller one at 24 nt.
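To double-check the FastQC plot, read lengths can also be tabulated directly from the FASTQ file. A minimal sketch; the function name `read_lengths` is hypothetical, and it assumes standard four-line FASTQ records (the sequence is on line 2 of each record):

```shell
#!/usr/bin/env bash
# read_lengths: print "length count" pairs for a FASTQ file, shortest first.
# Accepts gzipped or plain input: gzip -dcf passes non-gzip data through unchanged.
read_lengths() {
  local fq="$1"
  gzip -dcf -- "$fq" |
    awk 'NR % 4 == 2 { counts[length($0)]++ }
         END { for (len in counts) print len, counts[len] }' |
    sort -n
}
```

For example, `read_lengths ${i}_1_P.q25.fq` prints one `length count` line per observed read length, which can be compared against the FastQC length-distribution plot.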

Do you know of anything that would lead to a low mapping rate, or is my data just bad?

MikeAxtell commented 3 months ago

The low alignment rate, coupled with your stated trimmed RNA size distribution and the very low number of microRNA annotations, leads me to wonder whether your read trimming was done correctly. Have you tried just letting ShortStack autotrim your reads? It automatically infers the adapter sequences.

Only use R1 for sRNA-seq, never R2.
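Combining both suggestions, a ShortStack run on the raw (untrimmed) R1 file with adapter auto-trimming might look like the sketch below. It assumes ShortStack 4.x option names (`--genomefile`, `--readfile`, `--autotrim`, `--dn_mirna`, `--threads`, `--outdir`); check `ShortStack --help` for your installed version. The genome path, sample prefix and output directory are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch: de novo miRNA annotation from raw R1 reads, letting ShortStack
# infer and remove the adapter itself (assumes ShortStack 4.x option names).
sample="${1:-sample}"   # hypothetical sample prefix
genome="genome.fa"      # hypothetical reference genome FASTA

args=(
  --genomefile "$genome"
  --readfile "${sample}_combined_R1.fastq.gz"   # R1 only, NOT pre-trimmed
  --autotrim                                     # let ShortStack infer the adapter
  --dn_mirna                                     # enable de novo miRNA search
  --threads 12
  --outdir "${sample}_shortstack"                # hypothetical output directory
)

if command -v ShortStack >/dev/null 2>&1; then
  ShortStack "${args[@]}"
else
  # Not installed here: show the command that would have been run.
  echo "ShortStack not found on PATH; would run:" >&2
  printf '%q ' ShortStack "${args[@]}" >&2
  echo >&2
fi
```

Saved as, say, `run_shortstack.sh` (a hypothetical name), it would be invoked as `bash run_shortstack.sh mysample`.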