alexdobin / STAR

RNA-seq aligner
MIT License

SmartSeq data mapping: discrepancy between --outSAMattrRGline and --readFilesManifest options #1149

Open pavanvidem opened 3 years ago

pavanvidem commented 3 years ago

Hi @alexdobin, we're currently using --readFilesManifest to provide SmartSeq cell IDs in the STARsolo Galaxy wrapper (https://github.com/galaxyproject/tools-iuc/blob/master/tools/rgrnastar/rg_rnaStarSolo.xml).
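For context, a --readFilesManifest file is a tab-separated table with one line per input: the read 1 file, the read 2 file (or "-" for single-end data), and a read-group field. As we understand the STAR manual, a read-group field that does not start with ID: may hold only the ID, and STAR prepends ID: itself. The Python sketch below generates such a manifest from a dict mapping cell IDs to FASTQ paths. The helper name manifest_lines and the file names are hypothetical, chosen for illustration only.

```python
from typing import Dict, Optional, Tuple


def manifest_lines(cells: Dict[str, Tuple[str, Optional[str]]]) -> str:
    """Build --readFilesManifest content: R1 <TAB> R2 or '-' <TAB> read-group ID."""
    rows = []
    for cell_id, (read1, read2) in cells.items():
        # Tabs separate the manifest columns, so no field may contain one.
        for field in (cell_id, read1, read2 or ""):
            if "\t" in field:
                raise ValueError(f"tab character not allowed: {field!r}")
        # Single-end data uses "-" in place of the read 2 file name.
        rows.append(f"{read1}\t{read2 or '-'}\t{cell_id}")
    return "".join(row + "\n" for row in rows)


# Hypothetical example: one paired-end cell and one single-end cell.
cells = {
    "cell_A1": ("A1_R1.fastq.gz", "A1_R2.fastq.gz"),
    "cell_A2": ("A2_R1.fastq.gz", None),
}
print(manifest_lines(cells), end="")
```

The third column here is the bare cell ID. A full read-group line starting with ID: (with further tab-separated fields) should also be accepted, but then the row-building step would need to join those fields with tabs instead of rejecting them.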

At least for the small data sets we've tested, the count matrices are identical whether we use --outSAMattrRGline or --readFilesManifest.

The BAM files, however, do differ. Both options produce the @RG header line, but with --outSAMattrRGline each alignment in the BAM file also carries an RG tag, whereas with --readFilesManifest it does not. Shouldn't the two options produce the same result? Are the RG tags used anywhere during quantification?
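One way to see the difference described above is to inspect the SAM text of each output (for example, from samtools view -h). The sketch below uses plain string parsing rather than a BAM library. It counts @RG header lines, alignments, and alignments carrying an RG:Z: optional field. The function name rg_summary and the toy records are hypothetical.

```python
from typing import Iterable, Tuple


def rg_summary(sam_lines: Iterable[str]) -> Tuple[int, int, int]:
    """Return (@RG header lines, alignments, alignments with an RG:Z: tag)."""
    headers = alignments = tagged = 0
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header lines are tab-separated; only @RG lines are counted.
            if line.startswith("@RG\t"):
                headers += 1
            continue
        alignments += 1
        # The first 11 columns are mandatory; optional TAG:TYPE:VALUE fields follow.
        optional = line.split("\t")[11:]
        if any(field.startswith("RG:Z:") for field in optional):
            tagged += 1
    return headers, alignments, tagged


# Toy records mimicking the two outputs described in the issue.
header = ["@HD\tVN:1.4", "@RG\tID:cell_A1"]
record = "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1"
with_rg_tag = header + [record + "\tRG:Z:cell_A1"]
without_rg_tag = header + [record]
print(rg_summary(with_rg_tag), rg_summary(without_rg_tag))
```

Both toy outputs report one @RG header line, but only the first has a tagged alignment, which mirrors the behaviour reported above.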

alexdobin commented 3 years ago

Hi Pavankumar,

To get the RG tag into the BAM output, you need to add RG to --outSAMattributes, e.g. --outSAMattributes NH HI AS nM RG. With --outSAMattrRGline, it is added automatically.
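Following this advice, a wrapper that relies on --readFilesManifest can request the RG tag explicitly. The Python sketch below only assembles an argument list and does not run STAR. The function name star_smartseq_args, the genome directory, and the manifest path are placeholders. The remaining flags are an illustrative STARsolo SmartSeq setup, not a complete or recommended configuration.

```python
from typing import List


def star_smartseq_args(genome_dir: str, manifest: str, threads: int = 4) -> List[str]:
    """Assemble (but do not run) a STAR command for SmartSeq reads listed in a manifest."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesManifest", manifest,
        "--soloType", "SmartSeq",
        "--outSAMtype", "BAM", "Unsorted",
        # With --readFilesManifest, RG must be listed here explicitly;
        # --outSAMattrRGline would add the tag automatically.
        "--outSAMattributes", "NH", "HI", "AS", "nM", "RG",
    ]


args = star_smartseq_args("genome_index/", "manifest.tsv")
print(" ".join(args))
# Where STAR is installed, this list could be passed to subprocess.run(args, check=True).
```

Keeping the attribute list in one place makes it easy to check that RG is always present when the manifest route is used.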

Cheers Alex