ariasanovsky closed this issue 1 year ago
Creating this issue resolves #29
After looking at search data, I realized that in some search spaces the agent is incentivized too strongly toward exploration and rarely reaches terminal nodes. The formula $$u(s, a) = \overline{g}^\ast(s, a) + c\cdot p(s, a)\cdot\dfrac{\sqrt{n(s)}}{1+n(s, a)}$$ places incentives on the estimated gain $\overline{g}^\ast(s, a)$ and on under-visited actions, but doesn't factor in the depth of the node corresponding to $s$ or whether the paths visiting $(s, a)$ reached terminal nodes. So I am testing different upper estimate functions and making $u(s, a)$ user-specified.
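A user-specified estimate could be exposed through a trait along these lines (a minimal sketch; the trait name, the `depth` parameter, and the depth-discount idea are all hypothetical, not the crate's actual API):

```rust
/// Hypothetical trait for pluggable upper estimates.
trait UpperEstimate {
    /// u(s, a) from the mean gain g_bar(s, a), the prior p(s, a),
    /// the visit counts n(s) and n(s, a), and the depth of s.
    fn upper_estimate(&self, g_bar: f64, p: f64, n_s: u32, n_sa: u32, depth: u32) -> f64;
}

/// The current formula, plus an illustrative depth discount on the
/// exploration term so deeper nodes lean toward exploitation.
struct Puct {
    c: f64,
    depth_discount: f64,
}

impl UpperEstimate for Puct {
    fn upper_estimate(&self, g_bar: f64, p: f64, n_s: u32, n_sa: u32, depth: u32) -> f64 {
        let explore = self.c * p * (n_s as f64).sqrt() / (1.0 + n_sa as f64);
        // With depth_discount = 1.0 this reduces to the original formula.
        g_bar + explore * self.depth_discount.powi(depth as i32)
    }
}
```

Search code would then take an `impl UpperEstimate` (or a trait object) instead of hard-coding the formula.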
Moving this to #14.
This is an elaboration on a task mentioned in #14
To discourage the tree from revisiting previously visited terminal nodes:
Exhaustion

- when a search path reaches a terminal node, mark the `ActionData` of the penultimate node corresponding to the action leading to the terminal node as `Exhausted`
- make the node state an `enum` with `Active`/`Exhausted` variants; to conform, `StateData` holds a `Vec` of `ActionData`
- when a node has 0 actions which are `Active`, switch the node from `Active` to `Exhausted`
- record these updates through `StateData`'s `CostLog` or a separate helper struct

Better upper estimate
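The bookkeeping above could be sketched as follows (field layout and method names beyond `ActionData`/`StateData` are guesses for illustration, not the repo's actual definitions):

```rust
/// Shared Active/Exhausted state for both actions and nodes.
#[derive(Clone, Copy, PartialEq, Debug)]
enum NodeState {
    Active,
    Exhausted,
}

struct ActionData {
    state: NodeState,
    // ... per-action statistics (n(s, a), g-bar, p) would live here
}

struct StateData {
    state: NodeState,
    actions: Vec<ActionData>,
}

impl StateData {
    /// Mark the action that led to a terminal node as Exhausted;
    /// if 0 actions remain Active, switch the whole node from
    /// Active to Exhausted.
    fn exhaust_action(&mut self, i: usize) {
        self.actions[i].state = NodeState::Exhausted;
        if self.actions.iter().all(|a| a.state == NodeState::Exhausted) {
            self.state = NodeState::Exhausted;
        }
    }
}
```

During selection, the search would skip `Exhausted` actions entirely, so previously visited terminal nodes are never revisited.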