alexdobin / STAR

RNA-seq aligner
MIT License

Incorrect mate start with supplementary alignments #2066

Open adthrasher opened 4 months ago

adthrasher commented 4 months ago

Hello,

I have encountered an issue where STAR pairs read records incorrectly. In the following set of reads, the 4th record is read 1 and is unaligned. It records its mate's position as chr2:32916431. However, that is the start position of one of the non-primary alignments for read 2 (flag 393, which sets the secondary bit 0x100). It should point to the primary alignment (chr2:32916428). I didn't find an existing issue for this, and I didn't find anything in the documentation that explains it. I am using STAR 2.7.11b and aligning to GRCh38. I've attached a zip with the two reads and the command I'm invoking.

A00466:235:HCNJGDRX2:1:2104:19678:5822  137     chr2    32916428        1       126M    *       0       0       GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGCGGGGGCGGGCGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG  FFFFFFFFFFFFFFFFFFFFFF:FF,:FF::,:,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,:,::,,,,,:,::,,,,F,,,:,,,,,:,,,,,,,,:,:F,,:,:,:,:::FF,:F    NH:i:3  HI:i:1  AS:i:110        nM:i:7  NM:i:7  MD:Z:13A26G4A5G3G1G0A67 RG:Z:c947640 SM:SJST033767_D1 LB:SJST033767_D1 PL:illumina PU:HCNJGDRX2.1 CN:STJUDE
A00466:235:HCNJGDRX2:1:2104:19678:5822  393     chr2    32916431        1       126M    *       0       0       GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGCGGGGGCGGGCGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG  FFFFFFFFFFFFFFFFFFFFFF:FF,:FF::,:,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,:,::,,,,,:,::,,,,F,,,:,,,,,:,,,,,,,,:,:F,,:,:,:,:::FF,:F    NH:i:3  HI:i:2  AS:i:110        nM:i:7  NM:i:7  MD:Z:10A29G1A2G5G3A1G68 RG:Z:c947640 SM:SJST033767_D1 LB:SJST033767_D1 PL:illumina PU:HCNJGDRX2.1 CN:STJUDE
A00466:235:HCNJGDRX2:1:2104:19678:5822  393     chr2    32916429        1       126M    *       0       0       GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGCGGGGGCGGGCGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG  FFFFFFFFFFFFFFFFFFFFFF:FF,:FF::,:,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,:,::,,,,,:,::,,,,F,,,:,,,,,:,,,,,,,,:,:F,,:,:,:,:::FF,:F    NH:i:3  HI:i:3  AS:i:110        nM:i:7  NM:i:7  MD:Z:12A27G3A0G5G3G1A68 RG:Z:c947640 SM:SJST033767_D1 LB:SJST033767_D1 PL:illumina PU:HCNJGDRX2.1 CN:STJUDE
A00466:235:HCNJGDRX2:1:2104:19678:5822  69      *       0       0       *       chr2    32916431        0       GTCGGCGGGAGAGGCCGGGAGGGAGGAAGACGAACGGAA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:0  HI:i:0  AS:i:110        nM:i:7  uT:A:4  RG:Z:c947640 SM:SJST033767_D1 LB:SJST033767_D1 PL:illumina PU:HCNJGDRX2.1 CN:STJUDE
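For reference, the FLAG values in the records above (137, 393, 69) can be decoded with a short sketch like the one below. The bit meanings come from the SAM specification; the helper name `decode_flag` and the short labels are only illustrative, not part of STAR or any library.

```python
# SAM FLAG bits as defined in the SAM specification.
# The short labels are illustrative names chosen for this sketch.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# The three distinct FLAG values that appear in the records above.
for flag in (137, 393, 69):
    print(flag, decode_flag(flag))
```

Under this decoding, the record with flag 137 is the primary alignment of read 2, the two records with flag 393 carry the secondary bit (0x100) rather than the supplementary bit (0x800), and the record with flag 69 is the unmapped read 1.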

Archive.zip

alexdobin commented 4 months ago

Hi @adthrasher

In this case, there are three alignments for one of the mates, while the other mate is not mapped, so the unmapped mate is output only once and attached to one of the alignments. If you want the unmapped mate to be output for each alignment of the mapped mate, use --outSAMunmapped Within KeepPairs.

adthrasher commented 4 months ago

Yes, that is what I see. However, this is not compliant with the SAM spec. The RNEXT and PNEXT fields of the unmapped read MUST be those of the mate's primary alignment. Instead, STAR is pointing to one of the secondary alignments.

From https://samtools.github.io/hts-specs/SAMv1.pdf:

PNEXT: 1-based Position of the primary alignment of the NEXT read in the template. Set as 0 when
the information is unavailable. This field equals POS at the primary line of the next read. If PNEXT
is 0, no assumptions can be made on RNEXT and bit 0x20.
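The rule quoted above can be checked mechanically. The following is a minimal sketch, not a STAR or samtools feature: the function name `find_pnext_mismatches` is hypothetical, and the example records are abbreviated copies of the four lines in this issue, with SEQ and QUAL replaced by `*` and optional tags dropped.

```python
def find_pnext_mismatches(sam_lines):
    """Report unmapped paired reads whose RNEXT/PNEXT differ from the
    RNAME/POS of the mate's primary alignment line."""
    records = []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        f = line.rstrip("\n").split("\t")
        records.append({
            "qname": f[0], "flag": int(f[1]), "rname": f[2],
            "pos": int(f[3]), "rnext": f[6], "pnext": int(f[7]),
        })

    # Primary lines are mapped and neither secondary (0x100)
    # nor supplementary (0x800).
    primary = {}
    for r in records:
        if r["flag"] & 0x4 or r["flag"] & (0x100 | 0x800):
            continue
        is_read1 = bool(r["flag"] & 0x40)
        primary[(r["qname"], is_read1)] = (r["rname"], r["pos"])

    mismatches = []
    for r in records:
        # Only paired, unmapped reads are checked here.
        if not (r["flag"] & 0x1 and r["flag"] & 0x4):
            continue
        is_read1 = bool(r["flag"] & 0x40)
        expected = primary.get((r["qname"], not is_read1))
        if expected is None or r["pnext"] == 0:
            continue  # mate has no primary line, or PNEXT is unavailable
        rnext = r["rname"] if r["rnext"] == "=" else r["rnext"]
        observed = (rnext, r["pnext"])
        if observed != expected:
            mismatches.append((r["qname"], expected, observed))
    return mismatches


# Abbreviated versions of the four records from this issue.
QNAME = "A00466:235:HCNJGDRX2:1:2104:19678:5822"
example = [
    "\t".join([QNAME, "137", "chr2", "32916428", "1", "126M", "*", "0", "0", "*", "*"]),
    "\t".join([QNAME, "393", "chr2", "32916431", "1", "126M", "*", "0", "0", "*", "*"]),
    "\t".join([QNAME, "393", "chr2", "32916429", "1", "126M", "*", "0", "0", "*", "*"]),
    "\t".join([QNAME, "69", "*", "0", "0", "*", "chr2", "32916431", "0", "*", "*"]),
]

for qname, expected, observed in find_pnext_mismatches(example):
    print(qname, "expected", expected, "observed", observed)
```

On the abbreviated records, the sketch reports a single mismatch: the unmapped read 1 points to chr2:32916431, while the primary line of read 2 starts at chr2:32916428.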