broadinstitute / adapt

A package for designing activity-informed nucleic acid diagnostics for viruses.
MIT License

Consider consensus of all sequences when selecting guides #65

Closed: haydenm closed this 2 years ago

haydenm commented 2 years ago

There are two functions that propose candidate guides, from which guides in the output design are selected:

In both cases, determining candidate guides is effectively a heuristic: the candidates are meant to cover the representative subsequences within a genomic window. Proposing more candidates could lead to a better solution, but finding that solution would be less efficient. Prior to this PR, the candidate guide sequences came from subsequence clusters: the k-mers (where k is the guide length) at each length-k site were clustered, and each cluster's consensus became a candidate guide.
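The pre-PR approach (cluster the k-mers at a site, then take each cluster's consensus) can be sketched roughly as follows. This is a minimal, hypothetical illustration, not ADAPT's code: the greedy Hamming-distance clustering, the `max_dist` threshold, and the alphabetical tie-break in the consensus are all assumptions standing in for whatever clusterer and consensus rule ADAPT actually uses.

```python
from collections import Counter


def consensus(kmers):
    """Per-position majority base across equal-length k-mers.

    Ties go to the alphabetically first base (an assumption for this sketch).
    """
    out = []
    for column in zip(*kmers):
        counts = Counter(column)
        # Highest count first; alphabetical order breaks ties.
        out.append(min(counts, key=lambda b: (-counts[b], b)))
    return "".join(out)


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_kmers(kmers, max_dist):
    """Hypothetical greedy clustering: each k-mer joins the first cluster whose
    seed is within max_dist mismatches; otherwise it seeds a new cluster."""
    clusters = []  # list of (seed, members)
    for s in kmers:
        for seed, members in clusters:
            if hamming(seed, s) <= max_dist:
                members.append(s)
                break
        else:
            clusters.append((s, [s]))
    return [members for _, members in clusters]


def cluster_consensus_candidates(seqs, start, k, max_dist=1):
    """Candidate guides at one length-k site: one consensus per k-mer cluster."""
    kmers = [s[start:start + k] for s in seqs]
    return [consensus(c) for c in cluster_kmers(kmers, max_dist)]
```

For example, with `seqs = ["ACGTAC", "ACGTTC", "TTGTAC", "TTGAAC", "TTGAGG"]`, the 4-mers at position 0 fall into two clusters, `{ACGT, ACGT}` and `{TTGT, TTGA, TTGA}`, so `cluster_consensus_candidates(seqs, 0, 4)` yields the candidates `["ACGT", "TTGA"]`.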

The consensus across all of the genome sequences at the length-k site (more precisely, all of the genome sequences under consideration) may also be a reasonable candidate guide sequence. In some cases, it may result in better detection activity than the consensus of any individual cluster. This PR therefore adds that overall consensus as a candidate guide.
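A small sketch of the added step: alongside the per-cluster consensuses, compute one consensus over all k-mers at the site and add it if it is not already a candidate. The function names, the precomputed `clusters` input, and the alphabetical tie-break are hypothetical choices for illustration only; this does not reproduce ADAPT's implementation.

```python
from collections import Counter


def consensus(kmers):
    """Per-position majority base; ties go to the alphabetically first base."""
    out = []
    for column in zip(*kmers):
        counts = Counter(column)
        out.append(min(counts, key=lambda b: (-counts[b], b)))
    return "".join(out)


def candidate_guides(clusters):
    """clusters: list of lists of k-mers at one site (assumed precomputed).

    Returns each cluster's consensus, plus the consensus over all k-mers
    when it is not already among the cluster consensuses.
    """
    candidates = [consensus(c) for c in clusters]
    all_kmers = [s for c in clusters for s in c]
    overall = consensus(all_kmers)
    if overall not in candidates:
        candidates.append(overall)
    return candidates
```

With clusters `[["AATT"] * 3, ["GGCC"] * 2, ["TTCC"] * 2]`, the cluster consensuses are `AATT`, `GGCC`, and `TTCC`, while the overall consensus is `AACC`, a sequence that none of the clusters proposes on its own. Whether such a candidate actually improves predicted activity depends on the downstream objective; the sketch only shows that it enlarges the candidate set.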

The change affects some unit tests: the design may now yield a different solution that is as good as or better than the one the unit test expects. This PR therefore also modifies several unit tests to accept those alternative solutions.
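One way to relax a test like this is to accept any member of a set of equally optimal solutions (and check the objective value) instead of asserting a single exact answer. The sketch below is hypothetical: `coverage` and `toy_design` are toy stand-ins, not ADAPT functions, and the actual test changes in this PR may take a different form.

```python
import unittest


def coverage(guide_set, seqs):
    """Toy objective: fraction of sequences containing at least one guide."""
    hit = sum(any(g in s for g in guide_set) for s in seqs)
    return hit / len(seqs)


def toy_design(seqs, candidates):
    """Toy selector: pick the single best-covering candidate.

    max() keeps the first maximum, so ties resolve by candidate order;
    adding or reordering candidates can change which optimum is returned.
    """
    return {max(candidates, key=lambda g: coverage({g}, seqs))}


class TestDesign(unittest.TestCase):
    def test_accepts_any_equally_optimal_solution(self):
        seqs = ["ACGTAA", "ACGTCC", "GGCGTA"]
        # Both guides cover 2 of the 3 sequences, so either is optimal.
        acceptable = [{"ACGT"}, {"CGTA"}]
        for candidates in (["ACGT", "CGTA"], ["CGTA", "ACGT"]):
            result = toy_design(seqs, candidates)
            # Instead of assertEqual(result, {"ACGT"}), accept any optimum.
            self.assertIn(result, acceptable)
            self.assertAlmostEqual(coverage(result, seqs), 2 / 3)
```

Run it with `python -m unittest` on the file containing it. Checking the objective value alongside membership keeps the test strict about solution quality while tolerating ties.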

codecov[bot] commented 2 years ago

Codecov Report

Merging #65 (0d9c56d) into main (f371c93) will increase coverage by 0.04%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #65      +/-   ##
==========================================
+ Coverage   86.30%   86.34%   +0.04%     
==========================================
  Files          50       50              
  Lines        8219     8245      +26     
==========================================
+ Hits         7093     7119      +26     
  Misses       1126     1126              
Impacted Files                   Coverage Δ
adapt/alignment.py               95.84% <100.00%> (+0.07%) ⬆
adapt/tests/test_alignment.py    99.66% <100.00%> (+<0.01%) ⬆
bin/tests/test_design.py         99.54% <100.00%> (+0.02%) ⬆


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update f371c93...0d9c56d.