Rebalance segment prioritization/round-robining

Zannick commented 1 year ago

We're now reaching in excess of 3 billion iterations before running out of disk space (~400GiB) for the AV2 test case, with 2B+ states left and 6.6B history entries. However, the max progress level for this test case is 23, but we only made it to progress level 8 around 1B iters, and never to 9 or above. (Previously we were starting from the given routes' thresholds.) The given route will win around progress level 20-21. Future cases will have on the order of hundreds.

Additionally, we pretty quickly reach a point where the lowest progress level (0) still has tons of states but the lowest is much worse score-wise than the minimums in the other levels.

Here's some things we should do:

Refactor run_thread and process_one: because we're not using parallel iterators anymore for consuming from the queue, we can just have each thread handle its own queue consumption method, and make them more lock-efficient by grabbing several at once. (I note that the current behavior is to pop_round_robin and then after each of those, e.g. pop_max_segment.)
Change evictions around the bottom segment. We could, at some point, evict the whole segment. Also, we could just about always evict everything in that segment with a worse score than the max of the top segment.
```
    |
p 8-|                               ■■■■■■
r   |                               ■■■
o 6-|                              ■■■
g   |                           ■■■■
r 4-|                        ■■■
e   |                      ■■■■
s 2-|                      ■■■
s   |                   ■■■■■■       ■ ■■■■■■
  0-|                              ■■■■■■■■■■■■■
   +------------------------------------------------------------------------------------------
```
Another idea based on this final heap visualization (which is based on elapsed time rather than score): not only is segment 0 mostly worse than the max segment here, but the second group in segment 1 is likely based on those states and also should be safely skippable. Finding a large gap in a segment will probably be too difficult, but we can find that segment 2's max is better than segment 5's min (might also be true of 4 and 5 here). This should mean that it's safe to evict everything lower than segment 2 with a score higher than segment 5's max.
Change balance of thread strategies: it can take awhile to reach a later level of progress; we know for example that we processed over 2.8 billion states at 0 progress, and it would be much less at progress level 8 here. Tracking the iters of each progress level might help illustrate the imbalance, but also the combinatorial explosion kind of makes it so that we would naturally process more at higher progress levels if we were prioritizing based only on elapsed time. So... triangular round-robin? Simply more focus on max progress? Some other scaling?

Zannick commented 1 year ago

This change is somehow remarkably efficient, or is otherwise doing some very strange things. The heap size reported is varying wildly, which could be from threads holding onto more work than they did before, or from states not being iterated on. Further, unlike the previous runs, segment 0 disappears super quickly. It never has a chance to reach the times it was reaching before.

What it could be:

each thread no longer grabs a full round-robin list, so that probably reduces the number of duplicates that might get added.
Greedy mode got removed entirely.
Dependent mode is more balanced.

In terms of relative counts, if we assume round-robin gets 8 items, and each item takes the same amount of time, we'd originally have gotten in 2 round-robin cycles:

2 round robins per standard thread
usually 2 round robins per dependent thread
1 round robin per max/some/half progress thread plus 8 of their specials
some amount on the greedy thread (which did 1 round robin plus 1 greedy iteration which could take much longer).

coming out to around 24 round robins and 56 individual specials, not counting the greedy thread(s). (Total = 248 in 15 threads) Now, though, in those two standard cycles, we'll instead get:

2 round robins per standard thread
about 1 round robin and 8 specials per dependent thread on average
16 of each special

Or around 14 round robins and 144 specials. (Total 256 in 16 threads)

The good news is that it's now getting up to progress 11 very quickly, when 8 was hard to reach before.

Zannick commented 1 year ago

I think this is working pretty well. But sometimes I see early on in the program the queue runs out of low-progress states (according to the status updates) and then regains them somehow. This backtracking shouldn't happen unless a thread got really delayed for some reason while holding onto the earlier ones. But also it doesn't always seem to occur, sometimes we just have no 0-progress states from early on, and sometimes we run through tons of them.

Zannick commented 1 year ago

That bug I just fixed probably explains so much! The weird backtracking, the surprise voids, and even the gradual growth of memory usage!

Zannick commented 1 year ago

We probably still need to add more focus on the highest segment somehow.

Zannick / logic-graph

Rebalance segment prioritization/round-robining #80