Skipping of Indels in CountSNPASE

So I get why we want to skip over indels—they're much more likely to have trouble mapping correctly. But the way it's working now, it actually seems more likely to have a bias, since it's looking in the cigar string for indels, but that would only show up in the cigar string for one of the alleles, leaving the other alone. The SNP itself doesn't get used because presumably we were smart enough to not include indels in the bed file, but if there's an indel next to a SNP, that might introduce bias.

In this toy example, the red is reference, and the blue has a 1bp deletion. In the "truth" the expression is unbiased, but ASEr would call it 2:1, since the blue read that falls across both the SNP and the indel gets thrown out for having a deletion, but the red one doesn't.

screen shot 2016-06-20 at 3 40 16 pm

My intuition would be that the way to fix this is to come up with a list of indels, and then check to see if each read falls across one, and throw it out if so. However, some quick thought while procrastinating on writing a fellowship proposal did not suggest to me any obvious ways to do this efficiently.

TheFraserLab / ASEr

Skipping of Indels in CountSNPASE #15