State-Fisheries-Genomics-Lab / GT-seq

Standard Operating Procedure, Scripts, and data for GT-seq Genotyping

Quality Filtering #1

Closed: david-dayan closed this issue 3 years ago

david-dayan commented 3 years ago

Previous correspondence:

Sandra @sandrabohn
I've been wondering if we should start quality filtering the reads before we put them through the pipeline. What do y'all think about that?

David @david-dayan Throwing away quality info is one of the things that bothers me about the GTseq pipeline. I think the rationale behind it is that the probes are designed a priori and the targeted SNPs fall mostly in the middle of the read, so we can be reasonably confident that the calls at those SNPs are not due to sequencing error (provided there is enough depth to make a call). In other words, low quality bases shouldn't cause spurious SNPs like we might observe in a RADseq study.

I'd opt to leave it as is to keep things as simple as possible, but add it to the list of things that might be ameliorated by moving to a probabilistic SNP caller down the road. If we wanted to, we could add a quality trimming step with something like BBDuk or Trimmomatic at the front end of the pipeline. I don't feel strongly either way and would be happy to add it in there. What do you think?
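A front-end trimming step could look roughly like the following. This is a simplified Python sketch of sliding-window quality trimming in the spirit of Trimmomatic's SLIDINGWINDOW step, not a reimplementation of BBDuk or Trimmomatic. The function names, the Phred+33 quality encoding, and the defaults (a 4-base window and a mean-quality threshold of Q20) are assumptions for illustration. In practice you would run one of those tools directly rather than this code.

```python
def phred33_to_scores(qual: str) -> list[int]:
    """Decode a Phred+33 quality string (assumed encoding) into integer scores."""
    return [ord(c) - 33 for c in qual]


def sliding_window_trim(seq: str, qual: str, window: int = 4,
                        min_mean_q: float = 20.0) -> tuple[str, str]:
    """Scan the read 5' -> 3' and cut it at the start of the first window whose
    mean quality falls below min_mean_q. Reads shorter than one window, or with
    no failing window, are returned unchanged. Hypothetical helper."""
    scores = phred33_to_scores(qual)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean_q:
            return seq[:start], qual[:start]
    return seq, qual


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))
```

This sketch cuts at the start of the failing window. The real tools differ in exactly where they cut and usually also drop reads that end up shorter than a minimum length after trimming.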

Sandra @sandrabohn My main reservation is that a lot of people would freak out to hear we aren't quality filtering. However, given the requirement for 10 on-target reads before genotyping, I don't think quality issues would really affect the results. A single miscall that happens to fall at our target SNP would not change the genotype call, and miscalls elsewhere would just keep the primer or probe sequences from matching, so those reads would be discarded anyway.
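To make that argument concrete, here is a minimal, hypothetical Python sketch of probe matching plus a depth-gated, allele-fraction genotype call. The probe sequences, function names, and thresholds (minimum depth 10; minor-allele fraction ≤ 0.1 for a homozygous call, ≥ 0.2 for a heterozygous call, no call in between) are illustrative assumptions, not the values or code used by the lab's pipeline or by GTseq_Genotyper.

```python
from typing import Optional


def count_alleles(reads: list[str], probe_a1: str, probe_a2: str) -> tuple[int, int]:
    """Count reads containing an exact match to each allele-specific probe.
    A read with a sequencing error inside the probe matches neither probe
    and is silently dropped, as described above."""
    a1 = sum(1 for r in reads if probe_a1 in r)
    a2 = sum(1 for r in reads if probe_a2 in r)
    return a1, a2


def call_genotype(a1: int, a2: int, min_depth: int = 10,
                  hom_max: float = 0.1, het_min: float = 0.2) -> Optional[str]:
    """Return 'A1A1', 'A2A2', 'A1A2', or None (no call). Thresholds are hypothetical."""
    total = a1 + a2
    if total < min_depth:
        return None  # too few on-target reads to genotype
    minor = min(a1, a2) / total
    if minor <= hom_max:
        return "A1A1" if a1 > a2 else "A2A2"
    if minor >= het_min:
        return "A1A2"
    return None  # ambiguous allele ratio


if __name__ == "__main__":
    reads = (["GGTTGCAGTCC"] * 9     # true allele 1
             + ["GGTTGTAGTCC"]       # miscall at the target SNP reads as allele 2
             + ["GGTAGCAGTCC"])      # miscall inside the probe: matches neither
    a1, a2 = count_alleles(reads, "TTGCAGT", "TTGTAGT")
    print(a1, a2, call_genotype(a1, a2))
```

With these assumed thresholds, the one miscall at the SNP leaves the homozygous call unchanged and the read with an error in the probe region is never counted. Note that at exactly 10 reads the minor fraction sits right on the homozygous boundary, so the conclusion depends on how the real thresholds are set.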

david-dayan commented 3 years ago

Closing this one. It seems like we have consensus.