Hey,
First, thanks a lot for supporting Cell Ranger; it's extremely useful.
I have a question about how Cell Ranger assigns ambiguous reads when counting both introns and exons (include-introns, Cell Ranger 5.0).
I downloaded the BAM file from the Single Cell Multiome 3k PBMCs dataset (https://www.10xgenomics.com/resources/datasets/pbmc-from-a-healthy-donor-granulocytes-removed-through-cell-sorting-3-k-1-standard-2-0-0) and found a read that lies entirely within both an exon of ENSG00000211592 and an intron of ENSG00000240040. Because the exon and the intron overlap, I would expect this read to be considered "ambiguous". However, in the downloaded "possorted_bam.bam" the read is assigned exclusively to gene ENSG00000211592.
This is the read:
The read appears only once in the BAM file and is uniquely assigned to transcript ENST00000390237 (tag TX:Z:ENST00000390237) from gene ENSG00000211592. According to the GTF (or UCSC), this read (at chr2:88857240) matches both the intron of ENSG00000240040 and the exon of ENSG00000211592 (IGKC).
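To make clear what I mean by "ambiguous", here is a minimal Python sketch of the containment check I have in mind. The interval coordinates are hypothetical placeholders (only the gene IDs and the chr2:88857240 start come from my data), and `Interval`, `contains`, and `genes_containing` are names I made up for illustration. This is only my own definition of ambiguity, not a description of how Cell Ranger annotates reads.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def contains(feature: Interval, read: Interval) -> bool:
    """True if the read lies entirely inside the feature."""
    return feature.start <= read.start and read.end <= feature.end


def genes_containing(read, features):
    """Map gene ID -> feature type for every feature that fully contains the read.

    Simplification: if one gene has several containing features,
    only the last one listed is kept.
    """
    return {gene: kind for gene, kind, iv in features if contains(iv, read)}


# HYPOTHETICAL coordinates, chosen only so that the exon and intron overlap.
read = Interval(88857240, 88857330)
features = [
    ("ENSG00000211592", "exon", Interval(88857161, 88857683)),
    ("ENSG00000240040", "intron", Interval(88850000, 88860000)),
]

hits = genes_containing(read, features)
is_ambiguous = len(hits) > 1  # more than one gene fully contains the read
print(hits, is_ambiguous)
```

Under this definition the read is contained in features of two different genes, which is why I expected it to be flagged as ambiguous.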
Why does Cell Ranger, when including introns, assign this read to ENSG00000211592 instead of flagging it as ambiguous? I would really appreciate any help with this.
Best,
Kike