Change scores to match A*

Zannick commented 1 year ago

The algorithm written here is essentially A with some variants, but the heuristic used is presently based on a notion of progress and a penalty measure, whereas A uses a heuristic based on the estimated time to completion.

Estimating a time to completion may be expensive, but fortunately, we can easily store these in the db much like we currently store our seen states, as a permanent map of Ctx bytes to an i32. We could even guarantee that it's constant per-state and thus we'd never need to change it, even if we find a faster way to reach the state.

Figuring out how to calculate a good estimate is a long topic, but likely to be approachable from a dynamic programming perspective: looking toward future states' estimated times. e.g. the minimum of "distance to a todo location plus the time to visit that plus estimated time thereon". We can prefill some a bit by using greedy searches and caching those in the db, but I have some concerns about speed, given that a) we need to compute this once per state, b) amortized estimates of early rounds put time-per-state around 1.25-1.5ms (for a single thread), vs on the order of 180ms for a full greedy search (and 6s for a greedy minimize!), and c) trips to the db are not going to be all that fast...

Zannick commented 1 year ago

The state-recursive version used too much memory, so we need to find another method, either to calculate the states in reverse or to generate an alternate approximation.

Or, we abandon the algorithm we have and attempt to solve in layers: greedy search can give us an ordered list of locations to check, A* can optimize the route between each, and 2-opt or 3-opt can try modifications to that. (But this assumes the absence of canonical locations and duplicate items...)

...I wrote the above before realizing I forgot to collect the item, so that's why it never managed to finish. Now we have a much more tractable problem to address, which is that we still need to add actions that change position to the distance mapping (otherwise the AV2 graph is disjoint).

Zannick commented 1 year ago

The current AV2 graph is still disconnected, of course, since all movements are one-way (and at present only one action connects the two halves). We'd probably be better off handling the situation by marking that possibility as impossible.

Zannick commented 1 year ago

It's quite likely after all this that it's too slow still, and we decide to use a nonrecursive heuristic... given that we have a list of locations to try, we can just do one at a time. That ought to be O(n^2) and not O(n!).

Zannick commented 1 year ago

Ok, it's slow and apparently still brittle: if I drop the states marked as having no estimate, then we have no states in the queue at all. Likely because we can only estimate from the closest state regardless of whether it works. That can be fixed to pick the first that isn't a bust. But likely we'll still want to try a fastest calculation. The cache hit rate on these is also pretty terrible, but it vastly improves the time spent calculating these.

Zannick commented 1 year ago

Aand it's so bad I get basically no parallelism using more cores.

Zannick commented 1 year ago

It takes a much longer time to get new solutions, but then they all come at once! Unfortunately these are not very good solutions, and the program is performing lots of small evictions. Maybe I'll tweak it to only count the first n of each item needed.

In any case, this is more or less just another score algorithm applied to the context, although cached now. There should be a better approach reverting this to a simple calculation, or otherwise grabbing the order of locations from greedy search and using A* to find the shortest path between the specified locations (and then permuting it as allowed).

Zannick commented 1 year ago

Yeah, this doesn't work. :( The scoring function is so bad that the queue fills up too fast and is constantly evicting elements.

Zannick / logic-graph

Change scores to match A* #59