fhalab / evSeq

Computational tools for extremely low-cost, massively parallel amplicon-based sequencing of every variant in protein mutant libraries.
https://fhalab.github.io/evSeq/

Known Limitations #31

Open brucejwittmann opened 3 years ago

brucejwittmann commented 3 years ago

evSeq expects no insertions or deletions relative to the provided reference sequence; any read with a detected insertion or deletion is automatically discarded during QC. This speeds up analysis of returned reads, but it can cause problems if (1) your library is expected to contain insertions or deletions or (2) the best-scoring alignment of a read to the reference is one that opens a gap. There is currently no workaround for problem 1. Problem 2 can be addressed by tuning the alignment parameters.
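To make the QC rule and problem 2 concrete, here is a minimal, hypothetical Python sketch. It is not evSeq's aligner or scoring scheme. It scores a *given* pairwise alignment under a simple affine gap model (a gap of length L costs `gap_open_penalty + (L - 1) * gap_extend_penalty`) with made-up match/mismatch scores and default penalties that are not evSeq's defaults. Only the name `gap_open_penalty` comes from evSeq; `score_alignment`, `has_indel`, and `gap_extend_penalty` are illustrative. A real aligner searches over all alignments, whereas this sketch simply compares two candidates for a read carrying four adjacent mutations at its end.

```python
"""Hypothetical sketch: affine-gap scoring of a read alignment plus an indel QC rule.

Not evSeq's implementation; scores, defaults, and helper names are illustrative.
"""


def score_alignment(
    aln_ref: str,
    aln_read: str,
    match: float = 1.0,
    mismatch: float = -1.0,
    gap_open_penalty: float = 5.0,
    gap_extend_penalty: float = 1.0,
) -> float:
    """Score two equal-length aligned strings ('-' marks a gap) with affine gaps.

    A gap of length L costs gap_open_penalty + (L - 1) * gap_extend_penalty.
    """
    if len(aln_ref) != len(aln_read):
        raise ValueError("aligned sequences must have equal length")
    score = 0.0
    in_ref_gap = in_read_gap = False  # are we currently extending a gap?
    for r, q in zip(aln_ref, aln_read):
        if r == "-" and q == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if r == "-":  # base present in the read but not the reference (insertion)
            score -= gap_extend_penalty if in_ref_gap else gap_open_penalty
            in_ref_gap, in_read_gap = True, False
        elif q == "-":  # base missing from the read (deletion)
            score -= gap_extend_penalty if in_read_gap else gap_open_penalty
            in_ref_gap, in_read_gap = False, True
        else:
            score += match if r == q else mismatch
            in_ref_gap = in_read_gap = False
    return score


def has_indel(aln_ref: str, aln_read: str) -> bool:
    """QC rule: any gap in either aligned sequence means the read is discarded."""
    return "-" in aln_ref or "-" in aln_read


if __name__ == "__main__":
    # One read with four adjacent mutations at its end, aligned two ways.
    ungapped = ("AAAACGCG", "AAAAGCGC")   # 4 matches, 4 mismatches
    gapped = ("AAAACGCG-", "AAAA-GCGC")   # 7 matches, 2 single-base gaps
    for penalty in (1.0, 5.0):
        s_u = score_alignment(*ungapped, gap_open_penalty=penalty)
        s_g = score_alignment(*gapped, gap_open_penalty=penalty)
        best = gapped if s_g > s_u else ungapped  # ties keep the ungapped alignment
        verdict = "read discarded" if has_indel(*best) else "read kept"
        print(f"gap_open_penalty={penalty}: ungapped={s_u}, gapped={s_g} -> {verdict}")
    # gap_open_penalty=1.0: ungapped=0.0, gapped=5.0 -> read discarded
    # gap_open_penalty=5.0: ungapped=0.0, gapped=-3.0 -> read kept
```

With a low gap-open penalty, the shifted (gapped) alignment outscores the correct substitution-only alignment, so the read is thrown out by the indel QC rule; raising the penalty makes the ungapped alignment win and the read is kept.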

Alignment parameters are given as optional arguments. Raising `gap_open_penalty` further lowers the probability of problem 2, because alignments that contain gaps are then scored more poorly. We have stress-tested the code with the default alignment parameters against ~40,000 random DNA sequences with randomly placed mutations and found <10 instances of problem 2 (<0.025%). All of these occurred when multiple mutations were placed next to (or near) one another at the end of the evSeq reads. The default evSeq alignment parameters are thus highly robust, but there are situations where they may need to be tuned.

We strongly recommend that users evaluate the alignments returned by evSeq to make sure there are no unexpected insertions or deletions. Poor alignments result in false sequencing negatives. This can be particularly problematic if a well contains multiple variants and the alignment does not recognize all of them (i.e., evSeq fails to recognize that the well is mixed). As noted, such a situation would be exceedingly rare, but it is worth being aware of.

In many cases, alignment issues can be easily detected by reviewing both the decoupled and coupled output files. To tell whether the aligner has introduced insertions or deletions:

1. Look for mutations present in the decoupled file that are not found in the coupled file.
2. Look for `#DEAD#` wells that have more reads than the `variable_count` argument.
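The two checks described above can be scripted. The sketch below is hypothetical: it assumes you have already parsed the decoupled and coupled outputs into per-well sets of mutation labels, and the well calls and read counts into dictionaries keyed by `(plate, well)`. The data structures, function names, and the mutation-label format (e.g. `"C10G"`) are assumptions, not evSeq's file schema or API; the toy inputs are invented for illustration.

```python
"""Hypothetical sketch of the two alignment sanity checks.

Assumes outputs were already parsed into plain dictionaries keyed by
(plate, well); this is not evSeq's file schema or API.
"""
from typing import Dict, List, Set, Tuple

Well = Tuple[str, str]  # (plate, well), e.g. ("P1", "A01")


def mutations_missing_from_coupled(
    decoupled: Dict[Well, Set[str]], coupled: Dict[Well, Set[str]]
) -> Dict[Well, Set[str]]:
    """Check 1: mutations seen in the decoupled output but absent from the coupled output."""
    flagged: Dict[Well, Set[str]] = {}
    for well, muts in decoupled.items():
        missing = muts - coupled.get(well, set())
        if missing:
            flagged[well] = missing
    return flagged


def suspicious_dead_wells(
    calls: Dict[Well, str], read_counts: Dict[Well, int], variable_count: int
) -> List[Well]:
    """Check 2: wells called "#DEAD#" despite having more reads than variable_count."""
    return sorted(
        well
        for well, call in calls.items()
        if call == "#DEAD#" and read_counts.get(well, 0) > variable_count
    )


if __name__ == "__main__":
    # Toy inputs for illustration only.
    decoupled = {("P1", "A01"): {"C10G", "G11C"}, ("P1", "A02"): {"T5A"}}
    coupled = {("P1", "A01"): {"C10G"}, ("P1", "A02"): {"T5A"}}
    calls = {("P1", "A01"): "C10G", ("P1", "A03"): "#DEAD#", ("P1", "A04"): "#DEAD#"}
    read_counts = {("P1", "A01"): 400, ("P1", "A03"): 250, ("P1", "A04"): 3}
    print(mutations_missing_from_coupled(decoupled, coupled))  # {('P1', 'A01'): {'G11C'}}
    print(suspicious_dead_wells(calls, read_counts, variable_count=10))  # [('P1', 'A03')]
```

Any well returned by either function is a candidate for manual inspection of its alignments; a well with few reads that is called `#DEAD#` (like `A04` above) is not flagged, since low read depth alone explains that call.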