alexdobin / STAR

RNA-seq aligner
MIT License

Why some alignments do not include GX tag or include "GX:Z:-" #2102

Closed L1angyan closed 2 months ago

L1angyan commented 3 months ago

Hi Alex, I used STARsolo (version 2.7.10a) to align scRNA-seq reads to the hg38 genome:

$STAR --runThreadN $thread \
      --genomeDir STAR \
      --readFilesIn raw/${sample}_2.fastq.gz raw/${sample}_1.fastq.gz \
      --readFilesCommand zcat \
      --outFileNamePrefix mapping/${sample}. \
      --outSAMtype BAM SortedByCoordinate \
      --outSAMattributes NH HI AS NM nM MD GX GN CR CY UR UY CB UB sS sQ sM \
      --soloType CB_UMI_Simple --soloCBwhitelist None \
      --soloCBstart 1 --soloCBlen 20 --soloUMIstart 21 --soloUMIlen 8 \
      --soloStrand Reverse --soloFeatures Gene --quantMode GeneCounts \
      --outFilterScoreMinOverLread 0.3 --outFilterMatchNminOverLread 0.3 \
      --soloCellFilter EmptyDrops_CR 3000 0.99 10 45000 90000 300 0.01 20000 0.01 10000

In the output BAM, some alignments have no GX tag, for example:

A01886:446:HVJL7DSX7:3:2505:26467:23249 272 chr1 11868 1 72M * 0 0 CGTTAACTTGCCGTCAGCCTTTTCTTTGACCTCTTCTTTCTGTTCGTGTGTATTTGCTGTCTCTTGGCCCAG FFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFF NH:i:4 HI:i:3 AS:i:66 NM:i:2 nM:i:2 MD:Z:45A19A6 CR:Z:TGCGCAGGTCCAACTGCTCT CY:Z:IIIIIIIIIIIIIIIIIIII UR:Z:ATGTTGGA UY:Z:FFF,FFFF sS:Z:TGCGCAGGTCCAACTGCTCTATGTTGGA sQ:Z:IIIIIIIIIIIIIIIIIIIIFFF,FFFF sM:i:0 CB:Z:TGCGCAGGTCCAACTGCTCT UB:Z:ATGTTGGA

In addition, some alignments have the GX tag set to GX:Z:-.

What are the differences between them?

Best,
Yan

alexdobin commented 2 months ago

Hi Yan,

In the latest version of STAR, all reads should have the GX:Z: tag; GX:Z:- means the read was not assigned to a gene.
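To see how many alignments fall into each case in an existing BAM, a minimal sketch along these lines may help. It is not part of STAR; the function names `gx_status` and `summarize` are made up for illustration, and it assumes the records arrive as tab-separated SAM text (for example from `samtools view`), with optional tags starting at the 12th column.

```python
from collections import Counter


def gx_status(sam_line: str) -> str:
    """Classify one SAM text record by its GX tag (hypothetical helper).

    Returns "missing" if there is no GX tag, "unassigned" for GX:Z:-,
    and "assigned" for any other GX value.
    """
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory SAM fields; optional tags follow.
    for tag in fields[11:]:
        if tag.startswith("GX:Z:"):
            return "unassigned" if tag[5:] == "-" else "assigned"
    return "missing"


def summarize(lines) -> Counter:
    """Count records per GX status, skipping SAM header and blank lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue
        counts[gx_status(line)] += 1
    return counts
```

One possible use is to save the functions in a script that ends with `import sys; print(summarize(sys.stdin))` and pipe the output of `samtools view` on the sorted BAM into it. With a STAR version that writes GX for every read, the "missing" count would be expected to be zero, per the reply above.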