CellRanger count include-introns

Hey,

First, thanks a lot for supporting Cell Ranger; it's extremely useful.

I have a question about how Cell Ranger assigns ambiguous reads when counting both introns and exons (include-introns, Cell Ranger 5.0).

I downloaded the BAM file from the Single Cell Multiome 3k PBMC dataset (https://www.10xgenomics.com/resources/datasets/pbmc-from-a-healthy-donor-granulocytes-removed-through-cell-sorting-3-k-1-standard-2-0-0) and found a read that lies entirely within both an exon of ENSG00000211592 and an intron of ENSG00000240040. Since the exon and the intron overlap, I would expect this read to be considered "ambiguous". However, in the downloaded "possorted_bam.bam" the read is assigned exclusively to gene ENSG00000211592.

This is the read:

A00984:207:HGWCKDSXY:3:1101:9245:1814   16      chr2    88857240        255     90M     *       0       0       AGGTGAAAGATGAGCTGGAGGACCGCAATAGGGGTAGGTCCCCTGTGGAAAAAGGGTCAGAGGCCAAAGGATGGGAGGGGGTCAGGCTGG      FFFFFFFFFF:FFFFF:FF:FFF,FFF,F:FFF::FFFF:FFFFFFFFFF:FFF:FFF:FFF::::FF:FFFFFFFFF,FFF:FFFFFF,    NH:i:1  HI:i:1  AS:i:88 nM:i:0  RG:Z:pbmc_granulocyte_sorted_3k:0:1:HGWCKDSXY:3 TX:Z:ENST00000390237,+354,90M   GX:Z:ENSG00000211592  GN:Z:IGKC       fx:Z:ENSG00000211592    RE:A:E  xf:i:25 CR:Z:TTCACTGTCTTTGAGA   CY:Z::FFF:FFF,::FFFFF CB:Z:TTCACTGTCTTTGAGA-1 UR:Z:CTTTTCATTTGT    UY:Z:F,FFFFFFFFFF       UB:Z:CTTTTCATTTGT
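To make the annotation tags in this record easier to read, here is a minimal Python sketch that pulls them apart. The helper name `parse_sam_tags` is my own and not part of Cell Ranger. The tag values are copied from the record above, with the barcode and UMI tags left out. Splitting GX on `;` assumes that multiple compatible genes would be listed that way, as I understand the Cell Ranger BAM documentation. The meaning of each `xf` bit should be looked up there as well; the sketch only lists which bits are set.

```python
def parse_sam_tags(fields):
    """Turn optional SAM fields such as 'GX:Z:ENSG...' into a dict."""
    tags = {}
    for field in fields:
        # Split on the first two colons only: values may contain ':' themselves.
        name, typ, value = field.split(":", 2)
        tags[name] = int(value) if typ == "i" else value
    return tags


# Optional fields copied from the record above (barcode/UMI tags omitted).
optional_fields = [
    "NH:i:1", "HI:i:1", "AS:i:88", "nM:i:0",
    "TX:Z:ENST00000390237,+354,90M",
    "GX:Z:ENSG00000211592", "GN:Z:IGKC",
    "fx:Z:ENSG00000211592", "RE:A:E", "xf:i:25",
]
tags = parse_sam_tags(optional_fields)

# Assumption: several compatible genes would appear as a ';'-separated list.
genes = tags["GX"].split(";")

# Decompose the xf flag into the individual bit values that are set.
xf_bits = [1 << i for i in range(8) if (tags["xf"] >> i) & 1]

print("genes:", genes)          # a single gene ID for this read
print("region (RE):", tags["RE"])
print("xf bits set:", xf_bits)  # 25 = 1 + 8 + 16
```

For this read, GX holds a single gene ID and RE is `E`, which is what I mean by "exclusively assigned".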

The read appears only once in the BAM file and, as you can see, it is uniquely assigned to transcript ENST00000390237 (TX tag) of gene ENSG00000211592. However, according to the GTF and the UCSC browser, this read (starting at chr2:88857240) lies within both the intron of ENSG00000240040 and the exon of ENSG00000211592 (IGKC).

Screenshot 2022-09-14 at 16 02 11
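The overlap check I did by eye can be sketched in Python as follows. The read span is computed from the POS and CIGAR fields of the record above. The exon and intron coordinates are HYPOTHETICAL placeholders, not values taken from the GTF, and would need to be replaced with the real annotation. The function names are my own.

```python
import re


def reference_span(pos, cigar):
    """Return the 1-based inclusive reference span covered by an alignment."""
    # Only M, D, N, = and X operations consume reference bases.
    length = sum(
        int(n)
        for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in "MDN=X"
    )
    return pos, pos + length - 1


def contained(span, interval):
    """True if span lies entirely inside interval (both 1-based inclusive)."""
    return interval[0] <= span[0] and span[1] <= interval[1]


# POS and CIGAR taken from the read above.
read = reference_span(88857240, "90M")

# HYPOTHETICAL coordinates for illustration only; replace with GTF values.
features = {
    "ENSG00000211592 (exon)": (88857000, 88857500),
    "ENSG00000240040 (intron)": (88850000, 88860000),
}

compatible = [name for name, iv in features.items() if contained(read, iv)]
print(read)        # (88857240, 88857329)
print(compatible)  # both features, given the placeholder intervals
```

With real coordinates, a read contained in both features is the situation I would have expected to be reported as ambiguous.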

Why does Cell Ranger, when introns are included, assign this read to ENSG00000211592 instead of flagging it as ambiguous? I would really appreciate any help with this.

Best,

Kike
