10XGenomics / vartrix

Single-Cell Genotyping Tool
MIT License

Unequal number of reference reads #73

Open grasshoffm opened 2 years ago

grasshoffm commented 2 years ago

Hi. I am using VarTrix on the mitochondrial genome, and I supply all possible variants (every alternative base at each position).

When I check the ref_matrix_coverage.mtx results, the reference read counts differ between variants at the same position.

For example, the three variants MT:16043:A:C, MT:16043:A:G and MT:16043:A:T all have the same reference base. But for one cell, MT:16043:A:C and MT:16043:A:G each have 1 reference read, while MT:16043:A:T has 0.

Since the reference is the same and I am analysing the exact same reads (those covering this position), these values should be the same.
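One way to check this kind of discrepancy directly is to pull the counts for a single cell out of the matrix. This is a minimal sketch, assuming the standard Matrix Market coordinate layout and, as I understand VarTrix's output, one row per variant (in VCF order) and one column per barcode (in barcode-file order); verify that orientation against the VarTrix README before relying on it. The barcodes and counts below are toy values, not real output, and `read_mtx` / `counts_for_cell` are hypothetical helper names.

```python
"""Look up per-variant counts for one cell in a Matrix Market coordinate file."""


def read_mtx(text):
    """Parse Matrix Market coordinate text into (n_rows, n_cols, {(row, col): value}).

    Indices stay 1-based, as in the file; absent entries are implicit zeros.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines[0].startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("expected a Matrix Market coordinate matrix")
    # Skip the banner line and any comment lines starting with '%'.
    body = [ln for ln in lines[1:] if not ln.startswith("%")]
    n_rows, n_cols, nnz = (int(x) for x in body[0].split())
    entries = {}
    for ln in body[1:1 + nnz]:
        i, j, v = ln.split()
        entries[(int(i), int(j))] = float(v)
    return n_rows, n_cols, entries


def counts_for_cell(entries, variants, barcodes, barcode):
    """Return {variant: count} for one barcode (ASSUMES rows=variants, cols=barcodes)."""
    col = barcodes.index(barcode) + 1  # Matrix Market indices are 1-based
    return {v: entries.get((row, col), 0.0) for row, v in enumerate(variants, start=1)}


if __name__ == "__main__":
    # Toy data only, shaped like the example in this issue.
    toy_mtx = """%%MatrixMarket matrix coordinate integer general
% toy reference-count matrix
3 2 3
1 1 1
2 1 1
1 2 4
"""
    variants = ["MT:16043:A:C", "MT:16043:A:G", "MT:16043:A:T"]
    barcodes = ["CELL_A-1", "CELL_B-1"]  # hypothetical barcodes
    _, _, entries = read_mtx(toy_mtx)
    print(counts_for_cell(entries, variants, barcodes, "CELL_A-1"))
```

Missing coordinates are treated as zero, which is how sparse Matrix Market files encode zero counts.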

pmarks commented 2 years ago

@grasshoffm I'm not sure you should necessarily expect the ref read count to be identical in all cases here.

Consider the case of one read that contains a C at that position. For A:C you'd expect 0 ref reads. For A:G, it's a toss-up whether that read gets counted as ref or alt, and it looks like the code will default to counting it as a ref base. So that might explain what you're seeing. If you want to dig into the details, it might be useful to post an IGV screenshot of the locus along with the results you're seeing.

@ifiddes might also have some thoughts.

ifiddes commented 2 years ago

I agree; I would need to see some screenshots. In the case of an alignment tie, VarTrix calls the read as reference. It only calls a read as alt if the alignment score to the alt allele outscores the reference alignment.
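The tie-breaking rule described above can be illustrated with a toy scoring scheme. This is a minimal sketch, not VarTrix's code: it scores a read against each haplotype by simply counting matching bases over an ungapped, equal-length window (VarTrix itself realigns reads), and the three-base context `TAG` and the helper names are made up. It reproduces the single-read case from the previous comment: a read carrying C is alt for A>C and, because it ties for A>G and A>T, falls back to ref for both.

```python
def match_score(read, hap):
    """Toy score: number of matching bases (ASSUMES equal-length, pre-aligned sequences)."""
    return sum(1 for r, h in zip(read, hap) if r == h)


def call_read(read, ref_hap, alt_hap):
    """Call 'alt' only if the alt haplotype strictly outscores the reference; ties go to 'ref'."""
    ref_score = match_score(read, ref_hap)
    alt_score = match_score(read, alt_hap)
    return "alt" if alt_score > ref_score else "ref"


def calls_per_alt(read, ref_hap, site, alts):
    """Classify one read against each candidate alt base at a single site."""
    calls = {}
    for alt in alts:
        alt_hap = ref_hap[:site] + alt + ref_hap[site + 1:]
        calls[alt] = call_read(read, ref_hap, alt_hap)
    return calls


if __name__ == "__main__":
    ref_hap = "TAG"  # hypothetical context; index 1 holds the reference A
    read = "TCG"     # one read carrying C at the site
    print(calls_per_alt(read, ref_hap, 1, "CGT"))
```

So with a single C-carrying read, the ref count would be 0 for A>C but 1 for A>G and A>T, even though the reference base is the same for all three.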

grasshoffm commented 2 years ago

I looked at position 16043 and cell AAACCCAAGGAACTAT-1.

At this position, I have a read with a T insertion just after the position of interest. This read gets counted as reference for the A>C and A>G variants, but not for A>T, and it is not counted as an alternative read either.

Attachment: Position_16043_Cell_AAACCCAAGGAACTAT-1

I then checked a different position (8965). Here I find the scenario you mentioned: a mutated read is counted as reference, even though it does not support the reference allele.

Attachment: Position_8965_Cell_AAACCCATCAATCTTC-1

But this isn't true for cell AAACCCATCCTGCTAC-1. Here the mutated read is counted towards the alternative allele, but the number of reference reads is the same for all variants. Please see the attached Excel file.

Attachment: Position_8965_Cell_AAACCCATCCTGCTAC-1

Here is an Excel file with the reads I get from VarTrix and from IGV: example_reads.xlsx

ifiddes commented 2 years ago

I haven't had time to look closely, but one thing to remember about VarTrix is that it performs local realignment of each read using Smith-Waterman. As a result, what VarTrix counts may not be exactly what you see in IGV.
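To see why realignment can diverge from what a pileup shows, here is a minimal Smith-Waterman sketch that scores a read against a reference haplotype and an alternative haplotype, then applies the tie-goes-to-ref rule described earlier in this thread. The scoring parameters (match +2, mismatch -1, linear gap -2), the five-base context `TTAGG`, the read with a T inserted after the site, and the helper names are all hypothetical; VarTrix's actual scoring and haplotype construction may differ.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a vs b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap          # gap in b (extra base in a)
            left = curr[j - 1] + gap    # gap in a (extra base in b)
            curr[j] = max(0, diag, up, left)  # local alignment: scores floor at 0
            best = max(best, curr[j])
        prev = curr
    return best


def realign_and_call(read, ref_hap, alt_hap):
    """Return (call, ref_score, alt_score); 'alt' only if alt strictly outscores ref."""
    ref_score = smith_waterman(read, ref_hap)
    alt_score = smith_waterman(read, alt_hap)
    return ("alt" if alt_score > ref_score else "ref"), ref_score, alt_score


if __name__ == "__main__":
    ref_hap = "TTAGG"  # hypothetical context; index 2 holds the reference A
    read = "TTATGG"    # same context with a T inserted after the site
    for alt in "CGT":
        alt_hap = ref_hap[:2] + alt + ref_hap[3:]
        print(alt, realign_and_call(read, ref_hap, alt_hap))
```

In this toy setup the inserted read scores 8 against the reference, 5 and 6 against the A>C and A>G haplotypes, and exactly 8 against the A>T haplotype: against the reference the extra T costs one gap, and against the A>T haplotype the read's A costs one gap while its T lines up with the alt T. The same read is therefore a clear reference read for two of the alleles but an exact tie for A>T, a distinction a pileup view in IGV does not show. How VarTrix itself reports such a tie in this particular case is not settled in this thread.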