alexdobin / STAR

RNA-seq aligner
MIT License

Mapping to miRBase #1384

Open charlesdavid opened 2 years ago

charlesdavid commented 2 years ago

Hi Alex, we are developing a workflow that maps sets of short ncRNA reads to the miRBase database of mature microRNAs. We would like to use STAR for this alignment. We have adjusted the genomeGenerate parameters to accommodate the short 'scaffolds' and the large number of sequences (~50k), and have successfully generated the genome suffix array.

Could you suggest a set of parameters that would optimize the STAR algorithm for aligning these very short RNAs to the database, maximizing the alignment of true positives while helping to avoid the false positives that many aligners suffer from?

Thanks as always :-)

alexdobin commented 2 years ago

Hi Charles,

For the genome generation, you have probably already scaled down --genomeChrBinNbits and --genomeSAindexNbases.
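As a rough sketch of how those two values might be scaled, the snippet below applies the formulas given in the STAR manual: --genomeSAindexNbases = min(14, log2(GenomeLength)/2 - 1) and --genomeChrBinNbits = min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))). The total length, sequence count, and read length are placeholder numbers, not measurements of miRBase, and rounding to the nearest integer is an assumption made here.

```shell
#!/bin/sh
# Placeholder inputs (assumptions): replace with values measured from your FASTA and reads.
genome_length=1100000   # total bases across all reference sequences (assumed)
n_refs=50000            # number of reference sequences (~50k, per the question)
read_length=30          # typical read length after adapter trimming (assumed)

# --genomeSAindexNbases = min(14, log2(GenomeLength)/2 - 1), rounded to nearest integer
sa_nbases=$(awk -v g="$genome_length" 'BEGIN {
  v = int(log(g) / log(2) / 2 - 1 + 0.5)
  if (v > 14) v = 14
  print v
}')

# --genomeChrBinNbits = min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength)))
chr_bin_nbits=$(awk -v g="$genome_length" -v n="$n_refs" -v r="$read_length" 'BEGIN {
  x = g / n
  if (r > x) x = r
  v = int(log(x) / log(2) + 0.5)
  if (v > 18) v = 18
  print v
}')

echo "--genomeSAindexNbases $sa_nbases --genomeChrBinNbits $chr_bin_nbits"
```

The printed flags can then be appended to a `STAR --runMode genomeGenerate` command; treat them as starting points and check them against your own reference.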

For mapping, here are my initial suggestions. You may need to vary these parameters to optimize the mapping.

  1. I strongly recommend trimming the 3'-end adapter. You can use an external trimmer, or STAR's --clip3pAdapterSeq.
  2. Increase stringency of mapping filter: --outFilterMatchNminOverLread 0.9 --outFilterMismatchNoverReadLmax 0.05
  3. Increase search sensitivity with --seedSearchStartLmax 10 --winAnchorMultimapNmax 100
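Put together, the three suggestions above might look like the following command line. This is only a sketch: the index directory, FASTQ file name, and output prefix are hypothetical placeholders, and the adapter shown is a commonly used small-RNA 3' adapter included as an example, so substitute the one from your library prep. The script prints the assembled command for review instead of running it.

```shell
#!/bin/sh
# Hypothetical placeholders: adjust to your own files.
genome_dir="mirbase_star_index"      # directory produced by --runMode genomeGenerate
reads="reads.fastq"                  # single-end small-RNA reads
adapter="TGGAATTCTCGGGTGCCAAGG"      # example 3' adapter (assumption); use your kit's sequence

# Collect the arguments as positional parameters so they stay properly separated.
set -- \
  --runMode alignReads \
  --genomeDir "$genome_dir" \
  --readFilesIn "$reads" \
  --clip3pAdapterSeq "$adapter" \
  --outFilterMatchNminOverLread 0.9 \
  --outFilterMismatchNoverReadLmax 0.05 \
  --seedSearchStartLmax 10 \
  --winAnchorMultimapNmax 100 \
  --outFileNamePrefix "mirbase_"

# Print the command for review; to actually run it, replace the next two lines with: STAR "$@"
star_cmd="STAR $*"
echo "$star_cmd"
```

Since these are described as initial suggestions, it may be worth rerunning with a few different values of the filter and seed parameters and comparing the mapped fractions reported in `Log.final.out`.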

Cheers, Alex