haydenm closed this pull request 2 years ago.
Merging #65 (0d9c56d) into main (f371c93) will increase coverage by 0.04%. The diff coverage is 100.00%.
```diff
@@            Coverage Diff             @@
##             main      #65      +/-   ##
==========================================
+ Coverage   86.30%   86.34%   +0.04%
==========================================
  Files          50       50
  Lines        8219     8245      +26
==========================================
+ Hits         7093     7119      +26
  Misses       1126     1126
```
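As a quick consistency check on these numbers, the sketch below recomputes the headline percentages. It assumes that line coverage here is hits divided by tracked lines; no partial lines are reported, and hits plus misses equal lines in both columns. `coverage_pct` is a hypothetical helper, not part of Codecov.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage in percent, rounded to two decimals (assumes no partials)."""
    return round(100 * hits / lines, 2)


base = coverage_pct(7093, 8219)  # main (f371c93)
head = coverage_pct(7119, 8245)  # #65 (0d9c56d)

print(base, head)              # 86.3 86.34
print(round(head - base, 2))   # 0.04
```

Because misses stay at 1126 while 26 lines are added, all 26 new lines are hits, which is consistent with the reported 100.00% diff coverage.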
| Impacted Files | Coverage Δ | |
|---|---|---|
| adapt/alignment.py | 95.84% <100.00%> (+0.07%) | :arrow_up: |
| adapt/tests/test_alignment.py | 99.66% <100.00%> (+<0.01%) | :arrow_up: |
| bin/tests/test_design.py | 99.54% <100.00%> (+0.02%) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.
Powered by Codecov. Last update f371c93...0d9c56d.
There are two functions that propose candidate guides, from which the guides in the output design are selected:

- `alignment.determine_representative_guides()`. This is used with the maximize-activity objective, to build the ground set for the combinatorial optimization algorithm.
- `alignment.construct_guide()`. This is used with the minimize-guides objective, to select, at each iteration of the set cover algorithm, the next best guide for detecting the yet-to-be-detected sequences.

In both cases, determining candidate guides is effectively a heuristic: the candidates are meant to represent the subsequences within a genomic window. Proposing more candidates could lead to a better solution, but finding that solution would be less efficient. Prior to this PR, the candidate guide sequences came from subsequence clusters: the k-mers (where k is the guide length) at each length-k site were clustered, and the candidate guide sequences were the consensus of each cluster.
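To make the pre-PR procedure concrete, here is a minimal Python sketch of cluster-based candidate proposal at a single site, followed by a greedy set-cover selection in the spirit of the minimize-guides objective. It is not ADAPT's implementation. The leader clustering, the "detected if within `mismatches` Hamming distance" model, and the function names (`cluster_kmers`, `cluster_candidates`, `greedy_cover`) are all hypothetical stand-ins. The sketch also ignores activity models and considers only one site rather than a whole window.

```python
from collections import Counter
from typing import List, Set


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def consensus(kmers: List[str]) -> str:
    """Most common base in each column; ties go to the base seen first."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*kmers))


def cluster_kmers(kmers: List[str], max_dist: int) -> List[List[str]]:
    """Hypothetical leader clustering: each k-mer joins the first cluster whose
    first member is within max_dist mismatches, otherwise it starts a new one."""
    clusters: List[List[str]] = []
    for kmer in kmers:
        for cluster in clusters:
            if hamming(cluster[0], kmer) <= max_dist:
                cluster.append(kmer)
                break
        else:
            clusters.append([kmer])
    return clusters


def cluster_candidates(seqs: List[str], start: int, k: int, max_dist: int) -> List[str]:
    """Candidate guides at the length-k site beginning at `start`:
    one consensus per k-mer cluster."""
    kmers = [s[start:start + k] for s in seqs]
    return [consensus(c) for c in cluster_kmers(kmers, max_dist)]


def greedy_cover(kmers: List[str], candidates: List[str], mismatches: int) -> List[str]:
    """Greedy set cover: repeatedly pick the candidate that detects the most
    not-yet-detected k-mers, until none remain or no candidate helps."""
    undetected: Set[int] = set(range(len(kmers)))
    chosen: List[str] = []

    def detected_by(guide: str) -> Set[int]:
        return {i for i in undetected if hamming(guide, kmers[i]) <= mismatches}

    while undetected and candidates:
        best = max(candidates, key=lambda g: len(detected_by(g)))
        newly = detected_by(best)
        if not newly:
            break  # the remaining k-mers cannot be detected by any candidate
        chosen.append(best)
        undetected -= newly
    return chosen


seqs = ["ACGTACGT", "ACGTACGA", "TTGTACGT", "TTGAACGT"]
cands = cluster_candidates(seqs, start=0, k=4, max_dist=1)
kmers = [s[0:4] for s in seqs]
print(cands)                                     # ['ACGT', 'TTGT']
print(greedy_cover(kmers, cands, mismatches=1))  # ['ACGT', 'TTGT']
```

Here the two clusters at the site are {ACGT, ACGT} and {TTGT, TTGA}. Their consensuses become the only candidates the greedy step can choose from.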
The consensus across all of the genome sequences at the length-k site (more precisely, across all of the genome sequences under consideration) may also be a reasonable candidate guide sequence. In some cases, it may yield better detection activity than the consensus of any individual cluster. Therefore, this PR adds that overall consensus as a candidate guide.
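A minimal sketch of why the overall consensus can help, using a hypothetical leader clustering and a hypothetical Hamming-distance detection model (assumptions, not ADAPT's code). The sketch appends the consensus of all k-mers under consideration to the per-cluster consensuses. In the toy site below, every pair of k-mers differs at two positions. With `max_dist=1`, each k-mer therefore forms its own cluster, and each cluster consensus detects only one sequence at one mismatch. The overall consensus `AAAA` detects all four.

```python
from collections import Counter
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def consensus(kmers: List[str]) -> str:
    """Most common base in each column; ties go to the base seen first."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*kmers))


def cluster_kmers(kmers: List[str], max_dist: int) -> List[List[str]]:
    """Hypothetical leader clustering by Hamming distance to each cluster's first member."""
    clusters: List[List[str]] = []
    for kmer in kmers:
        for cluster in clusters:
            if hamming(cluster[0], kmer) <= max_dist:
                cluster.append(kmer)
                break
        else:
            clusters.append([kmer])
    return clusters


def candidates_with_overall(kmers: List[str], max_dist: int) -> List[str]:
    """Cluster consensuses plus the consensus across all k-mers under consideration."""
    cands = [consensus(c) for c in cluster_kmers(kmers, max_dist)]
    overall = consensus(kmers)
    if overall not in cands:
        cands.append(overall)
    return cands


def num_detected(guide: str, kmers: List[str], mismatches: int) -> int:
    """How many k-mers lie within `mismatches` of the guide (toy detection model)."""
    return sum(hamming(guide, km) <= mismatches for km in kmers)


kmers = ["CAAA", "ACAA", "AACA", "AAAC"]
cands = candidates_with_overall(kmers, max_dist=1)
best = max(cands, key=lambda g: num_detected(g, kmers, mismatches=1))
print(cands)  # ['CAAA', 'ACAA', 'AACA', 'AAAC', 'AAAA']
print(best)   # AAAA
```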
The change affects some unit tests: the design may now yield a solution that differs from the one a test requires but is equally or more optimal. Thus, this PR also modifies several unit tests to accept those different solutions.
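One way a unit test can accept any equally or more optimal solution is sketched here for the minimize-guides case, under the same hypothetical Hamming-distance detection model. Instead of asserting equality with a fixed guide set, the test checks two things: that the returned guides still detect every sequence, and that they use no more guides than a reference solution. `is_acceptable` and the test class are hypothetical and are not taken from ADAPT's test suite.

```python
import unittest
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def is_acceptable(solution: List[str], reference: List[str],
                  kmers: List[str], mismatches: int) -> bool:
    """True if `solution` detects every k-mer and uses no more guides than
    `reference`, i.e., it is equally or more optimal for minimize-guides."""
    detects_all = all(
        any(hamming(g, km) <= mismatches for g in solution) for km in kmers
    )
    return detects_all and len(solution) <= len(reference)


class TestAcceptsAlternativeSolutions(unittest.TestCase):
    kmers = ["CAAA", "ACAA", "AACA", "AAAC"]
    reference = ["CCAA", "AACC"]  # a solution a stricter test might hard-code

    def test_more_optimal_solution_passes(self):
        # One guide within 1 mismatch of every k-mer beats the 2-guide reference.
        self.assertTrue(is_acceptable(["AAAA"], self.reference, self.kmers, 1))

    def test_incomplete_solution_fails(self):
        # "CAAA" leaves the other three k-mers undetected.
        self.assertFalse(is_acceptable(["CAAA"], self.reference, self.kmers, 1))
```

Running `python -m unittest` on a file containing this class executes both tests. Comparing objective values rather than exact guide sets keeps a test stable when a new candidate, such as the overall consensus, opens up an equally good or better solution.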