buntonj / SFO_CPP

A fast, templated C++ header library for submodular optimization (subset selection) problems.
0 stars 0 forks source link
combinatorial-optimization cpp submodular-optimization summarization

sfo_cpp: a lightweight, templated submodular function maximization in C++

Overview

A headers-only C++ library for submodular optimization (subset selection) problems on arbitrary data types.

This library implements a handful of basic algorithms for submodular function maximization. In particular, they solve the problem:

\begin{array}{cc} \underset{S\subseteq V}{\text{maximize}} & F(S) \\ \text{subject to} & S \in \mathcal{C}\end{array}

where $F:2^V\to\mathbb{R}$ is a submodular function and $V$ is a ground set of $n$ elements, and $\mathcal{C}\subseteq 2^V$ is a constraint set. Submodular functions satisfy the inequality:

$$ F(S) + F(T) \geq F(S\cup T) + F(S\cap T), $$

for any $S, T \subseteq V$. More intuitively, these functions exhibit the property of diminishing returns.

Monotone functions are functions that preserve the subset partial order on the power set $2^V$:

$$ A\subseteq B \quad \implies\quad F(A) \leq F(B) $$

For such functions, greedy algorithms are both efficient and provably near-optimal when the set $\mathcal{C}$ is some simple form of constraint, such as cardinality, knapsack, matroid, independence system, etc.

Usage

Building and testing

This library uses Bazel as its build system. To compile, make sure you have Bazel installed on your system and run:

bazel build ...

which will build and install both the headers library and the tests into the /build directory. If you would like to run the tests, you can test the library with:

bazel test ...

Alternatively, you can run a specific test via:

bazel test sfo_cpp/tests/test_monotone_greedy

or similar, for a different test.

Usage in other contexts

Basic usage follows four simple steps:

1) Define a std::unordered_set of "ground set" elements to summarize. 2) Import and create the appropriate GreedyAlgorithm object from this library. 3) Give the GreedyAlgorithm object a reference to the ground set and Constraints it must satisfy (if desired). 4) Call GreedyAlgorithm.run_greedy(CostFunction) on a sfo_cpp::CostFunction abstract base class.

The entire library is templated via some typename E, where the instatiation of E defines the C++ objects that the algorithms will be summarizing. The library accomplishes this by populating a std::unordered_set<E*> of pointers to elements in $V$ rather than directly copying elements around.

As a result, this library can summarize almost any C++ data type you want!

All you need to do is implement a CostFunction that evaluates how "good" a given summary is, and a Constraint that evaluates if a summary satisfies some constraint set or not, and hand everything over to the GreedyAlgorithms implemented here.

Once they are run, the algorithms will hold a variable curr_set, which is a STL unordered_set<E*> of pointers to the elements of the ground set (which is also a STL unordered_set<E*>).

Under the hood

To do this, the library defines a templated override of the comparison operator < for basic pairs std::pair<E, double>. The library uses this operator to interface with the STL std::priority_queue container and sort elements $j \in V$ by their marginal benefit as measured by $F$.

For convenience, the library includes a simple Element class, which the library will fall back to and instantiate if the template instantiation for typename E is not given.

Cost function class

In cost_function.hpp, the library defines the templated (typename E) abstract base class CostFunction to represent the mathematical function $F:2^V\to\mathbb{R}$.

The CostFunction abstract base class only has one virtual method:

A couple important example derived classes (specific cost functions) are implemented, such as the Modular and SquareRootModular cost functions.

In principle, however, one needs only to define an appropriate CostFunction<E> object with evaluation overloads to run the greedy algorithms on it.

Constraint class

In constraint.hpp, the library defines the templated (typename E) abstract base class Constraint to represent the mathematical constraint $S\in \mathcal{C}$.

The set $\mathcal{C}$ could be all subsets with less than $B$ elements, all subsets whose knapsack cost is less than $B$, or any other general constraint. Many of the implemented algorithms still retain guarantees even when the constraint set $\mathcal{C}$ is a matroid, knapsack, independence system, p-system, or some intersection of them.

The Constraint<E> abstract base class has two pure virtual functions:

To implement a constraint, you just have to override the test_membership functions. A couple simple derived examples such as Knapsack and Cardinality constraints are implemented in constraint.hpp.

The CostFunction and Constraint objects are handed to one of the Algorithm objects, which implement the optimization routines to select a (provably near-optimal) subset of elements.

Library of algorithms

Many algorithms, when asked to optimize the cost function with the run_{ALG_NAME}() call, may be handed a flag to instead run the cost-benefit algorithm instead. In the cost-benefit algorithm, the benefit of each Element is divided by its additional cost according to the knapsack constraint. As a result, this specific variant will only run when the Constraint that GreedyAlgorithm or LazyGreedyAlgorithm is provided with is of the specific derived class Knapsack.

Quick testing scripts are given in test_monotone_greedy.cpp and test_non_monotone_greedy.cpp.

Coming soon: non-monotone algorithms, pre-emption (streaming), semi-streaming, and more!