broadinstitute / adapt

A package for designing activity-informed nucleic acid diagnostics for viruses.
MIT License
29 stars 2 forks source link

Implement objective function that maximizes expected activity #23

Closed haydenm closed 4 years ago

haydenm commented 4 years ago

This adds a new objective function for computing a guide set.

The objective that ADAPT had solved is to minimize the number of guides subject to constraints on coverage across the target genomes. This objective remains via the --obj minimize-guides option. It goes along with -gm (mismatches between guide and target), -gp (threshold on coverage), and --cover-by-year-decay. It is implemented now in the guide_search.GuideSearcherMinimizeGuides class.

The new objective function, accessible via --obj maximize-activity, maximizes the expected activity across the target genomes. We assume a uniform distribution, so this is the mean activity. This option requires that a predictive model be set; thus it does not use -gm, and it also does not use -gp because there is no set threshold on coverage. The objective adds penalties for guide set cardinality. There is a soft penalty, controlled by --soft-guide-constraint and --penalty-strength, and a hard penalty set by --hard-guide-constraint. The objective function is a non-negative non-monotone submodular function. This PR applies two algorithms to solve it: (1) the canonical greedy algorithm, which may work well in practice but does not provide theoretical guarantees because the function is non-monotone (unless --penalty-strength is 0); (2) a recent randomized greedy algorithm with worst-case guarantees for non-monotone submodular functions. It is implemented in the guide_search.GuideSearcherMaximizeActivity class.

This PR makes various changes to target_search.TargetSearcher that allow it to maximize an objective during a search for target options, with penalties on the number of primers and target length. As before, it can also optionally minimize an objective during a search.