10XGenomics / vartrix

Single-Cell Genotyping Tool
MIT License

Low coverage at exon boundaries and question about 10X 5' scRNA-seq data #123

Open HenriettaHolze opened 3 weeks ago

HenriettaHolze commented 3 weeks ago

Hi, I want to use vartrix on 5' 10X scRNA-seq data to genotype cells.

I have previously used cellsnp-lite but would like to compare it with a more UMI-aware tool that can build a consensus across reads from the same transcript.

  1. When I compare coverage of cellsnp-lite results and vartrix results, I see differences at exon boundaries.
    I assume it is due to the re-alignment of reads.

    Shown below is the UMI coverage across the CDS of a gene of interest. The y-axis is the sum of the --ref-matrix and --out-matrix counts, which should represent transcript coverage at each position (ignoring multi-allelic sites).

    I have marked the positions where the vartrix and cellsnp-lite depths differ (blue), as well as the exon boundaries. There are "dips" around the exon boundaries.
    I'm concerned that I would miss many variants that affect splicing if re-alignment around exon boundaries is a problem.

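    [image: UMI coverage across the CDS for vartrix and cellsnp-lite, with depth differences (blue) and exon boundaries marked]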
  2. Could you describe how you handle the paired-end data that CellRanger produces for 5' 10X data?
    The read containing the cell barcode and UMI also contains part of the transcript, so for each CB/UMI pair there are at least two reads with the same read name covering distinct regions of the gene.
    Below is an example of a read pair from the CellRanger possorted_genome_bam.bam.

A00121:1010:HTN7YDSX7:2:1538:24966:25285        99      chr17   31677654        255     12M289839N62M   =       31967744  290190   CTTTCTTATATGGGGTGTCGTGGTGCACACCTGTAGTGTCACCAGTGGCACTCCAGCCTGGTGACAGAGCGAGA      FFFF:FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF:FFFFFF NH:i:1  HI:i:1  AS:i:161        nM:i:0  RG:Z:MF01_P2:0:1:HTN7YDSX7:2    GX:Z:ENSG00000178691       GN:Z:SUZ12      fx:Z:ENSG00000178691    RE:A:I  xf:i:25 CR:Z:TTCGAAGAGCCATCGC   CY:Z:FFFFFFF:FFFFFFFF   CB:Z:TTCGAAGAGCCATCGC-1    UR:Z:CCCTGTCTAT UY:Z:F:FFFFFFFF UB:Z:CCCTGTCTAT
A00121:1010:HTN7YDSX7:2:1538:24966:25285        147     chr17   31967744        255     100M    =       31677654        -290190    TAGTTCTAGGTACTTGGAAGTCCAAGATGGCAGGATGGCGTAAGCTCAGGAATTCAAGGTTACAGTGAGCTATGATTGCACAACTCTACTCCAGGCTGGG    FFFFFFFFF:FFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFF       NH:i:1  HI:i:1  AS:i:161  nM:i:0   RG:Z:MF01_P2:0:1:HTN7YDSX7:2    TX:Z:ENSG00000178691,+  GX:Z:ENSG00000178691    GN:Z:SUZ12      fx:Z:ENSG00000178691       RE:A:N  xf:i:25 CR:Z:TTCGAAGAGCCATCGC   CY:Z:FFFFFFF:FFFFFFFF   CB:Z:TTCGAAGAGCCATCGC-1 UR:Z:CCCTGTCTAT UY:Z:F:FFFFFFFF    UB:Z:CCCTGTCTAT
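To make the geometry of this pair concrete, here is a minimal Python sketch that walks the two CIGAR strings and lists the reference blocks each record covers. It is not vartrix or CellRanger code; the helper name `aligned_blocks` is hypothetical. It assumes the standard SAM CIGAR operations, with POS being 1-based.

```python
import re

# CIGAR operations that advance along the reference (per the SAM spec).
REF_CONSUMING = set("MDN=X")
# Of those, the operations where read bases are actually aligned to the reference.
ALIGNED = set("M=X")


def aligned_blocks(pos, cigar):
    """Return 1-based inclusive (start, end) reference blocks with aligned bases.

    `pos` is the SAM POS field. Hypothetical helper for illustration only.
    """
    blocks = []
    ref = pos
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op in ALIGNED:
            blocks.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            # N (spliced intron) and D (deletion) move along the reference
            # without contributing an aligned block.
            ref += length
    return blocks


# POS and CIGAR copied from the two records above (same read name, CB and UB).
r1 = aligned_blocks(31677654, "12M289839N62M")
r2 = aligned_blocks(31967744, "100M")
print(r1)  # [(31677654, 31677665), (31967505, 31967566)]
print(r2)  # [(31967744, 31967843)]
```

For this pair, the first record covers 12 bp and 62 bp on either side of a 289,839 bp splice gap, and the second record covers a separate 100 bp block downstream, so the two reads of this CB/UMI do not overlap.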

Thanks a lot!

Best, Henrietta