TheFraserLab / ASEr

Get ASE counts from BAMs or raw fastq data -- repackage of pipeline by Carlo Artieri
MIT License

Make sure we throw out chimeric reads #14

Open petercombs opened 8 years ago

petercombs commented 8 years ago

There are at least some reads in my data set that cover multiple SNPs, where those SNPs disagree as to the parental origin of the read. It's rare (suggesting it's probably a sequencing error), but we ought to deal with it properly.

Probably the cleanest thing to do is just toss any read that is ambiguous, but one could imagine going with the consensus when a read covers 3 or more SNPs.

Also, I'm not sure if this should be a separate issue or not, but if a sequencing error leaves a read with neither the annotated reference nor the alternate allele at a SNP, that read should probably be tossed as well.
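
To make the proposal concrete, here's a rough sketch of the filtering logic. It is not ASEr's actual code: `classify_read`, its arguments, and the category names are all hypothetical, and it assumes the bases a read shows at each SNP position have already been pulled out into a dict.

```python
from collections import Counter


def classify_read(read_bases, snps, consensus_min_snps=None):
    """Assign a read to a parent, or to a discard category.

    Hypothetical helper for illustration only.

    read_bases: dict mapping position -> base observed in the read
    snps: dict mapping position -> (ref_allele, alt_allele)
    consensus_min_snps: if set, reads covering at least this many SNPs
        are assigned by strict majority instead of being discarded.
    """
    calls = Counter()
    for pos, base in read_bases.items():
        if pos not in snps:
            continue  # position is not an annotated SNP
        ref, alt = snps[pos]
        if base == ref:
            calls["ref"] += 1
        elif base == alt:
            calls["alt"] += 1
        else:
            # Neither annotated allele: likely a sequencing error
            return "unannotated"

    if not calls:
        return "no_snp"
    if len(calls) == 1:
        # Every covered SNP agrees on the parent
        return next(iter(calls))

    # SNPs disagree: optionally fall back to a strict majority vote
    total = sum(calls.values())
    if consensus_min_snps is not None and total >= consensus_min_snps:
        (top, n_top), (_, n_second) = calls.most_common(2)
        if n_top > n_second:
            return top
    return "chimeric"


# Toy SNP table: position -> (reference allele, alternate allele)
snps = {100: ("A", "G"), 150: ("C", "T"), 180: ("G", "A")}
print(classify_read({100: "A", 150: "T"}, snps))
print(classify_read({100: "A", 150: "C", 180: "A"}, snps, consensus_min_snps=3))
```

With the default `consensus_min_snps=None`, any read with conflicting SNPs comes back as `"chimeric"` and can be tossed. Passing `consensus_min_snps=3` turns on the majority-vote variant, and ties are still treated as chimeric.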

[Screenshot: 2016-05-09, 12:47:48 PM]

In the attached screenshot, red reads are melanogaster, blue reads are simulans, and grey reads are at least somewhat ambiguous: there's one read with both mel and sim SNPs, and another with an unannotated allele.