broadinstitute / adapt

A package for designing activity-informed nucleic acid diagnostics for viruses.

MIT License


There are two functions that propose candidate guides, from which guides in the output design are selected:

alignment.determine_representative_guides(). This is used with the maximize activity objective, to build the ground set for the combinatorial optimization algorithm.
alignment.construct_guide(). This is used with the minimize guides objective to select, at each iteration of the set cover algorithm, the best next guide for detecting the yet-to-be-detected sequences.
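As a rough illustration of the set cover iteration described for the minimize guides objective, here is a minimal greedy sketch. It is not ADAPT's implementation: `greedy_set_cover`, the fixed `candidates` mapping, and the toy sequences are all hypothetical, and it assumes each candidate guide's set of detected sequences is known up front.

```python
def greedy_set_cover(universe, candidates):
    """Greedy set cover sketch (hypothetical helper, not part of ADAPT).

    universe: iterable of sequence identifiers to detect.
    candidates: dict mapping a guide sequence to the set of sequence
        identifiers it detects (assumed to be precomputed).
    Returns (chosen guides in order, sequences left undetected).
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the guide that detects the most yet-to-be-detected sequences.
        best = max(candidates, key=lambda g: len(candidates[g] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            # No candidate detects any of the remaining sequences.
            break
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


# Toy example: five sequences and three candidate guides.
guides, missed = greedy_set_cover(
    range(5),
    {"ATCG": {0, 1, 2}, "ATCC": {2, 3}, "GTCC": {3, 4}},
)
print(guides, missed)  # ['ATCG', 'GTCC'] set()
```

In this toy run, the second iteration prefers "GTCC" over "ATCC" because it detects two of the remaining sequences rather than one.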

In both cases, determining candidate guides is effectively a heuristic: the candidates are meant to be representative subsequences within a genomic window. Considering more candidates could lead to a better solution, but finding that solution would be less efficient. Prior to this PR, the candidate guide sequences came from subsequence clusters: the k-mers (where k is the guide length) at each length-k site were clustered, and the candidate guide sequences were the consensus of each cluster.

It is also possible that the consensus across all of the genome sequences at the length-k site (or, specifically, all of the genome sequences under consideration) could be a reasonable candidate guide sequence. In some cases, it may result in better detection activity than the consensus of any individual cluster. Therefore, this PR adds that overall consensus as a candidate guide.
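To make the idea concrete, the sketch below builds candidates from per-cluster consensus sequences and then adds the overall consensus. It is a simplified illustration rather than the code in this PR: the helper names `consensus` and `candidate_guides` are hypothetical, the clusters are assumed to be given, and the consensus is taken as a plain per-position majority vote with no handling of gaps or ambiguity codes.

```python
from collections import Counter


def consensus(kmers):
    """Per-position majority base across equal-length k-mers.

    Ties go to the base seen first (Counter.most_common ordering).
    """
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*kmers)
    )


def candidate_guides(clusters):
    """Hypothetical sketch: cluster consensuses plus the overall consensus.

    clusters: list of clusters, each a list of k-mers from one site.
    """
    candidates = [consensus(cluster) for cluster in clusters]
    overall = consensus([kmer for cluster in clusters for kmer in cluster])
    if overall not in candidates:
        candidates.append(overall)
    return candidates


# Toy site with k = 3 and three clusters of identical k-mers.
clusters = [["AAT", "AAT"], ["ACA", "ACA"], ["GAA", "GAA"]]
print(candidate_guides(clusters))  # ['AAT', 'ACA', 'GAA', 'AAA']
```

In this toy case, each cluster consensus has two mismatches against the k-mers of the other clusters, while the overall consensus "AAA" has one mismatch against every k-mer, which is the kind of situation where the extra candidate could help.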

The change affects some unit tests: it may yield a different solution that is as good as or better than the solution required by the unit test. Thus, this PR also modifies several unit tests to allow those different solutions.
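One common way to let a test accept several equally good solutions is to assert membership in a set of acceptable outputs instead of equality with a single one. The sketch below shows that pattern only; the test class, the `design_guides` stand-in, and the guide sequences are hypothetical and are not taken from ADAPT's test suite.

```python
import unittest


class TestGuideDesign(unittest.TestCase):
    def design_guides(self):
        # Hypothetical stand-in for the design call under test.
        return ["AAA"]

    def test_accepts_equally_good_solutions(self):
        # Either solution is considered acceptable (illustrative values).
        acceptable = [["AAT", "GAA"], ["AAA"]]
        self.assertIn(self.design_guides(), acceptable)


if __name__ == "__main__":
    unittest.main()
```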

Codecov Report

Merging #65 (0d9c56d) into main (f371c93) will increase coverage by 0.04%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #65      +/-   ##
==========================================
+ Coverage   86.30%   86.34%   +0.04%     
==========================================
  Files          50       50              
  Lines        8219     8245      +26     
==========================================
+ Hits         7093     7119      +26     
  Misses       1126     1126

Impacted Files	Coverage Δ
adapt/alignment.py	`95.84% <100.00%> (+0.07%)`	↑
adapt/tests/test_alignment.py	`99.66% <100.00%> (+<0.01%)`	↑
bin/tests/test_design.py	`99.54% <100.00%> (+0.02%)`	↑

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update f371c93...0d9c56d.

Consider consensus of all sequences when selecting guides #65
