Closed: mhyee closed this issue 10 years ago.
I have an attempt in the amalgam-pgia-threadpool-attempt branch. (This is a scratch branch that may be deleted in the future.) It seems to work for a few test cases; I haven't tried any others.
There's a lot of overhead from the sheer number of partitions. I tried to limit this number, and it helped a bit, but there's still overhead.
I'm also beginning to wonder if we even need to find the problem boundaries.
An earlier attempt with `ForkJoinPool` is in the amalgam-pgia-forkjoin-attempt branch. (Again, a scratch branch that may be deleted.) That attempt didn't work because of some bugs. I can probably fix it now, and `ForkJoinPool` might provide better performance.
I'm also thinking that it might be better to special-case two and three objectives, since those are the common cases. (We can divide the space into two and three partitions, respectively, and limit the number of times we split.) For higher dimensions, we can just choose some number of non-overlapping partitions (two, four, or the number of cores?) and allow exclusion zones to fall into these partitions.
So my next steps, in no particular order:

- Fix the `ForkJoinPool` attempt and see if it's better.
- Limit to `x` partitions, instead of `2^n - 2` partitions (`x` is some predetermined number, `n` is the number of objectives).

This algorithm is flawed. It seems that two dimensions is a special case. One of our three-objective problems generated an incorrect result: a local Pareto point in one partition was actually dominated by a local Pareto point in a different partition.
The reason this works for 2 dimensions is that a point in the top left (or bottom right) partition can only be dominated by a point in the top right partition, which has been excluded. We do not have this guarantee for higher dimensions. The partitions must be processed in a particular order to guarantee that locally optimal Pareto points are globally optimal.
Here's a new idea for higher dimensions:

Encode each partition as a binary string, where a `0` is a `WorseThan` constraint and a `1` is a `BetterThanOrEqual` constraint. For `n` dimensions, there will be `2^n` partitions. Build a dependency graph where an edge from `u` to `v` means `u` is a dependency of `v`. In this case, `u` is a dependency of `v` if `u` has exactly one more `1` than `v`. The binary string with all `1`s has no dependencies and is the starting point.

The sketch proof that this works is that a partition can only be "dominated" by some other partition that has a `BetterThanOrEqual` constraint in at least one objective. Since "dominates" is transitive, we can follow the DAG. After processing a partition, we create exclusion constraints based on the discovered Pareto points and share them with the next partitions.
This proof is obviously not rigorous. After all, we already made a mistake with the original algorithm. However, it seems to work for three dimensions.
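The binary-string dependency scheme above can be sketched with plain bitmasks. This is a hypothetical illustration, not the kodkod implementation: bit `i` set means a `BetterThanOrEqual` constraint on objective `i`, and, following the rule above, `u` is a dependency of `v` when `u` has exactly one more `1` bit than `v`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: partitions for n objectives encoded as n-bit strings.
// Bit i = 1 -> BetterThanOrEqual constraint on objective i.
// Bit i = 0 -> WorseThan constraint on objective i.
// u is a dependency of v when popcount(u) == popcount(v) + 1,
// so the all-ones string has no dependencies and runs first.
public class PartitionDag {
    // Map each partition u to the partitions that depend on it.
    public static Map<Integer, List<Integer>> build(int n) {
        Map<Integer, List<Integer>> children = new HashMap<>();
        for (int u = 0; u < (1 << n); u++) children.put(u, new ArrayList<>());
        for (int v = 0; v < (1 << n); v++) {
            for (int u = 0; u < (1 << n); u++) {
                if (Integer.bitCount(u) == Integer.bitCount(v) + 1) {
                    children.get(u).add(v); // v must wait for u
                }
            }
        }
        return children;
    }

    public static void main(String[] args) {
        int n = 3; // three objectives -> 2^3 = 8 partitions
        Map<Integer, List<Integer>> dag = build(n);
        System.out.println("partitions: " + dag.size());
        // Partitions with two 1-bits (011, 101, 110) depend on 111.
        System.out.println("children of 111: " + dag.get(0b111));
    }
}
```

Processing the DAG layer by layer (all strings with `k` ones before any string with `k - 1` ones) is exactly the ordering the sketch proof requires.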
In terms of implementation: I'm not sure a `ForkJoinPool` would be better than a regular `ThreadPool`. `ForkJoin` handles dependencies nicely, but we wouldn't be taking advantage of the work-stealing or making arbitrarily many splits. (We'd only have `n` tasks per step.)

Commit https://github.com/TeamAmalgam/kodkod/commit/a545b21d695fc62cb8b41e4aafffdb441ada06b3 is a messy implementation for two objectives. Any other number of objectives will fall back to using IGIA.
I tried this on the longer two-metric queens and rooks problems. Performance ranged from slightly worse to twice as fast.
I'm tempted to split more than once, because sometimes the partitions are still fairly "large," but we don't really know how large the partitions are, or how many Pareto points are in them. And there's still the old problem of exponential partitions if you split too many times. So we don't have a good way of knowing when to split, and I don't think splitting multiple times is feasible. Thus, I'm sticking with `ThreadPool` and abandoning `ForkJoinPool`.
I'm also wondering how useful the boundary finding is, since it technically isn't required. It adds overhead, and sometimes it helps and sometimes it hurts. I suppose that if #29 helps a lot, then we should keep it.
Next, I plan to hard-code three dimensions, to see how it performs. I might try the simple implementation, and then the fancier, more parallel implementation. After that, I'll try to generalize to higher dimensions, and clean up the code.
Commit https://github.com/TeamAmalgam/kodkod/commit/44927c410555c3ad16dc89a72d9b65d499a2d277 adds support for three objectives. The implementation is very messy and should not be merged into master.
I have some basic test results for two and three objectives. IGIA refers to the IncrementalGuidedImprovementAlgorithm. PBGIA is PartitionedGuidedImprovementAlgorithm with boundary finding. PGIA does not have boundary finding. PGIA+ and PBGIA+ have the more complicated implementation that maximizes parallelism.
Two metrics

| Problem | IGIA | PBGIA | PGIA |
| --- | --- | --- | --- |
| knapsack25 | 0m11.134s | 0m17.038s | 0m11.472s |
| queens6 | 0m15.813s | 0m19.274s | 0m14.290s |
| queens7 | 0m42.339s | 0m43.445s | 0m29.394s |
| queens8 | 1m38.695s | 1m17.029s | 1m20.758s |
| queens9 | 3m11.908s | 3m42.387s | 2m03.822s |
| rooks6 | 0m13.208s | 0m17.177s | 0m08.332s |
| rooks7 | 1m22.753s | 0m48.746s | 0m49.897s |
| rooks8 | 3m10.777s | 1m09.121s | 1m0.0656s |
Three metrics

| Problem | IGIA | PBGIA | PGIA | PBGIA+ | PGIA+ |
| --- | --- | --- | --- | --- | --- |
| knapsack25 | 4m53.561s | 3m01.346s | 3m03.314s | 3m05.048s | 2m38.413s |
| queens5 | 0m17.925s | 0m18.601s | 0m17.619s | 0m16.450s | 0m16.758s |
| queens6 | 0m18.755s | 0m32.606s | 0m25.366s | 0m29.153s | 0m22.890s |
| queens7 | 1m39.113s | 1m44.295s | 1m33.500s | 1m29.493s | 1m20.863s |
| queens8 | 2m24.695s | 2m43.044s | 2m13.176s | 2m00.339s | 1m53.553s |
| rooks5 | 0m13.893s | 0m18.655s | 0m12.176s | 0m16.088s | 0m09.505s |
| rooks6 | 2m07.147s | 1m28.809s | 1m53.799s | 1m16.565s | 1m43.437s |
| rooks7 | 5m55.532s | 3m48.480s | 3m49.735s | 3m32.778s | 3m05.536s |
Conclusions:
Caveats: These are on our "toy" problems. We will need to test these against real problems, like habitat, apollo, SearchAndRescue, and berkeleydbqualityjournal.
My next step is to rewrite the code so it's cleaner, and so it works for any number of objectives.
Commit https://github.com/TeamAmalgam/kodkod/commit/2989991ee96c0f6aea20eac6703ee670da76d920 implements the algorithm for any number of objectives.
It also uses an improved method of scheduling the tasks. Now, each task keeps track of its children (tasks that depend on it) and what its parents (dependencies) are. So when a task completes, it notifies its children, and when a child knows it's ready, it will schedule itself.
(The problem is that if tasks 6, 5, 3, 1, 2, and 4 are all scheduled at the same time and there are three threads, then 6, 5, and 3 will run. If 6 and 5 complete, then 4 can run, but only if 4 is scheduled before 1 and 2. If 1 and 2 are scheduled first, they will block and not get evicted. So 4 will wait in the queue even though it's ready.)
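The "notify children, schedule self when ready" scheme can be sketched like this. The names here are hypothetical (the real code is in the linked commits); the key point is that a task is only submitted to the pool once all its parents have finished, so a ready task never sits behind tasks that are blocked on unfinished dependencies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of dependency-aware scheduling on a plain thread pool.
// Each task counts its unfinished parents; the last parent to finish
// decrements the count to zero and submits the child itself.
public class SchedulerDemo {
    static class PartitionTask implements Runnable {
        final String name;
        final ExecutorService pool;
        final CountDownLatch allDone;
        final List<PartitionTask> children = new ArrayList<>();
        final AtomicInteger unfinishedParents = new AtomicInteger(0);

        PartitionTask(String name, ExecutorService pool, CountDownLatch allDone) {
            this.name = name;
            this.pool = pool;
            this.allDone = allDone;
        }

        void addChild(PartitionTask child) {
            children.add(child);
            child.unfinishedParents.incrementAndGet();
        }

        @Override public void run() {
            // ... solve this partition, collect local Pareto points,
            // and build exclusion constraints to pass to children ...
            System.out.println("processed partition " + name);
            for (PartitionTask child : children) {
                // Only the last parent to finish submits the child.
                if (child.unfinishedParents.decrementAndGet() == 0) {
                    pool.submit(child);
                }
            }
            allDone.countDown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        CountDownLatch done = new CountDownLatch(4);
        PartitionTask p111 = new PartitionTask("111", pool, done);
        PartitionTask p110 = new PartitionTask("110", pool, done);
        PartitionTask p101 = new PartitionTask("101", pool, done);
        PartitionTask p100 = new PartitionTask("100", pool, done);
        p111.addChild(p110);
        p111.addChild(p101);
        p110.addChild(p100);
        p101.addChild(p100);
        pool.submit(p111); // only the root has no dependencies
        done.await();
        pool.shutdown();
    }
}
```

Because nothing enters the queue until it is runnable, the scenario described above (a ready task stuck behind blocked tasks) cannot occur.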
I'm going to move code over to the amalgam-pgia branch (after fast-forwarding it). Pretty much everything in the amalgam-pgia-threadpool-attempt and amalgam-pgia-forkjoin-attempt will be considered as scratch/incomplete work and will be deleted.
When a Pareto point `P` is found, we can partition the search space. One partition has no solutions (otherwise our Pareto point would be dominated by some other solution), and another partition is irrelevant (all solutions are dominated by `P`). The remaining partitions do not overlap. Since we have boundaries (from #25), we can recurse on the partitions.

For example, with two objectives, the search space can be visualized as a plane. A Pareto point `P` with metric values `(X, Y)` (and the problem boundaries) partitions the plane into four quadrants. We ignore the top-right and bottom-left quadrants, and can recursively partition the other two quadrants. In the image below, we want to search in the quadrants where `(x < X) ∧ (y >= Y)` or `(x >= X) ∧ (y < Y)`.

In general, with `n` objectives, we are left with `2^n - 2` partitions to search in. Each partition can be packaged as a task and then given to a thread pool. (Java's `ForkJoinPool` seems appropriate here.)
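Enumerating those `2^n - 2` partitions is just iterating over bitmasks, excluding all-ones (no solutions, since a point there would dominate `P`) and all-zeros (irrelevant, since everything there is dominated by `P`). A small sketch, with hypothetical names (`m0`, `m1`, ... stand for the metric values, `P0`, `P1`, ... for `P`'s values):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: given a Pareto point with values P[0..n-1], enumerate the
// 2^n - 2 partitions left to search as constraint strings.
// Bit i set   -> metric i >= P[i]  (BetterThanOrEqual)
// Bit i clear -> metric i <  P[i]  (WorseThan)
public class Partitions {
    public static List<String> remaining(int n) {
        List<String> parts = new ArrayList<>();
        // Skip mask 0 (all worse) and mask 2^n - 1 (all better).
        for (int mask = 1; mask < (1 << n) - 1; mask++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < n; i++) {
                if (i > 0) sb.append(" ∧ ");
                boolean better = (mask & (1 << i)) != 0;
                sb.append("m").append(i).append(better ? " >= P" : " < P").append(i);
            }
            parts.add(sb.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        // Two objectives: prints the two quadrants from the text.
        for (String p : remaining(2)) System.out.println(p);
    }
}
```

For `n = 2` this yields exactly the two quadrants `(x >= X) ∧ (y < Y)` and `(x < X) ∧ (y >= Y)` described above.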
Issues:

- Splitting produces `2^n - 2` partitions each time.
- We could limit this to `n` partitions, but the problem is that there will be overlap. We would then have to use some deduplicator.

For the first attempt, use the simple solutions (split indefinitely, and don't worry about an explosion in the number of partitions).
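If we did go with overlapping partitions, the deduplicator could be a plain pairwise dominance filter over the merged local results. A sketch, assuming all objectives are maximized; none of these names come from the kodkod code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of a deduplicator for overlapping partitions: merge the local
// Pareto points from every partition, then drop any point that is
// dominated by another point, or that duplicates one already kept.
public class Dedup {
    // a dominates b iff a is >= b in every objective and > in at least one.
    static boolean dominates(int[] a, int[] b) {
        boolean strictlyBetter = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] < b[i]) return false;
            if (a[i] > b[i]) strictlyBetter = true;
        }
        return strictlyBetter;
    }

    public static List<int[]> globalFront(List<int[]> merged) {
        List<int[]> front = new ArrayList<>();
        for (int i = 0; i < merged.size(); i++) {
            boolean keep = true;
            for (int j = 0; j < merged.size(); j++) {
                if (i != j && dominates(merged.get(j), merged.get(i))) {
                    keep = false;
                    break;
                }
            }
            // Drop exact duplicates found by overlapping partitions.
            for (int[] kept : front) {
                if (keep && Arrays.equals(kept, merged.get(i))) keep = false;
            }
            if (keep) front.add(merged.get(i));
        }
        return front;
    }

    public static void main(String[] args) {
        List<int[]> merged = new ArrayList<>();
        merged.add(new int[]{3, 1});
        merged.add(new int[]{1, 3});
        merged.add(new int[]{2, 2});
        merged.add(new int[]{1, 1}); // dominated by (2, 2)
        merged.add(new int[]{3, 1}); // duplicate from an overlapping partition
        System.out.println(globalFront(merged).size()); // 3
    }
}
```

The quadratic pass is the naive version; it's only meant to show why overlap forces a post-merge filtering step that the non-overlapping scheme avoids.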