There are at least some reads in my data set that cover multiple SNPs, and those SNPs disagree as to the parental origin of the read. It's rare, which suggests it's probably sequencing error, but we ought to deal with it properly.
Probably the cleanest thing to do is just toss any read that is ambiguous, but one could imagine going with the consensus when a read covers 3 or more SNPs.
Also, I'm not sure if this should be a separate issue or not, but if a read has a base at a SNP position that matches neither the annotated reference nor the alternate allele (presumably a sequencing error), that read should probably be tossed as well.
In the attached screenshot, red reads are melanogaster, blue reads are simulans, and grey reads are at least somewhat ambiguous—there's one read with both mel and sim SNPs, and another with an unannotated allele.
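
To make the proposal concrete, here is a rough sketch of the per-read rule in Python. It is not taken from the existing code: `classify_read`, its arguments, the `"ref"`/`"alt"`/`"ambiguous"`/`"uninformative"` labels, and the `min_snps_for_consensus` option are all hypothetical names, and it assumes the bases a read shows at each SNP position have already been extracted into a dict.

```python
from collections import Counter


def classify_read(observed, snps, min_snps_for_consensus=None):
    """Assign a read to a parent based on the SNPs it covers (hypothetical helper).

    observed: dict mapping position -> base seen in the read
    snps:     dict mapping position -> (ref_allele, alt_allele)
    min_snps_for_consensus: if set, reads with at least this many informative
        SNPs are assigned by strict majority instead of being tossed.
    """
    votes = Counter()
    for pos, base in observed.items():
        if pos not in snps:
            continue  # not an annotated SNP position
        ref, alt = snps[pos]
        if base == ref:
            votes["ref"] += 1
        elif base == alt:
            votes["alt"] += 1
        else:
            # Neither annotated allele: treat the whole read as ambiguous.
            return "ambiguous"

    if not votes:
        return "uninformative"  # read covers no annotated SNPs (assumed label)
    if len(votes) == 1:
        return next(iter(votes))  # every SNP agrees

    # SNPs disagree: optionally fall back to a strict majority vote.
    total = sum(votes.values())
    if min_snps_for_consensus is not None and total >= min_snps_for_consensus:
        (top, n_top), (_, n_second) = votes.most_common(2)
        if n_top > n_second:
            return top
    return "ambiguous"


snps = {10: ("A", "G"), 20: ("C", "T"), 30: ("G", "A")}
print(classify_read({10: "A", 20: "T"}, snps))            # conflicting SNPs
print(classify_read({10: "G", 20: "T", 30: "G"}, snps, 3))  # 2 alt vs 1 ref
```

With the default `min_snps_for_consensus=None`, every conflicting read is tossed, which is the "cleanest" option above; passing `3` enables the consensus fallback.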