ariasanovsky / azdopt

An implementation of Alpha Zero for discrete optimization problems.
7 stars, 0 forks

Manage state space exhaustion #31

Closed: ariasanovsky closed this issue 1 year ago

ariasanovsky commented 1 year ago

This elaborates on a task mentioned in #14.

To discourage the tree from revisiting previously visited terminal nodes:

- Exhaustion
- Better upper estimate
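One way to read the "Exhaustion" task is as a flag that marks a subtree as fully explored, so selection never walks back into it. The sketch below assumes an arena-backed tree in which every action slot is either unexpanded (`None`) or points to a child node. When a terminal node is reached, it is marked exhausted, and the flag propagates upward as long as every action of the parent leads to an exhausted child. `Tree`, `Node`, `mark_exhausted`, and `select` are hypothetical names for illustration, not azdopt's actual types or API.

```rust
/// Hypothetical arena-backed search tree (not azdopt's real data structure).
#[derive(Debug)]
struct Node {
    parent: Option<usize>,
    /// One slot per action: `None` means the action has not been expanded yet.
    children: Vec<Option<usize>>,
    exhausted: bool,
}

#[derive(Debug, Default)]
struct Tree {
    nodes: Vec<Node>,
}

impl Tree {
    /// Pushes a new node into the arena and returns its index.
    fn add_node(&mut self, parent: Option<usize>, num_actions: usize) -> usize {
        let id = self.nodes.len();
        self.nodes.push(Node { parent, children: vec![None; num_actions], exhausted: false });
        id
    }

    /// Expands `action` at `parent` into a fresh child node.
    fn expand(&mut self, parent: usize, action: usize, num_actions: usize) -> usize {
        let child = self.add_node(Some(parent), num_actions);
        self.nodes[parent].children[action] = Some(child);
        child
    }

    /// Marks a terminal node as exhausted, then walks toward the root. An
    /// ancestor becomes exhausted only when every one of its actions has been
    /// expanded into an exhausted child; otherwise propagation stops.
    fn mark_exhausted(&mut self, terminal: usize) {
        self.nodes[terminal].exhausted = true;
        let mut current = self.nodes[terminal].parent;
        while let Some(id) = current {
            let all_done = self.nodes[id]
                .children
                .iter()
                .all(|c| matches!(c, Some(child) if self.nodes[*child].exhausted));
            if !all_done {
                break;
            }
            self.nodes[id].exhausted = true;
            current = self.nodes[id].parent;
        }
    }

    /// Returns the highest-scoring action whose subtree is not exhausted,
    /// or `None` if every action is exhausted.
    fn select(&self, node: usize, scores: &[f32]) -> Option<usize> {
        let mut best: Option<(usize, f32)> = None;
        for (action, (child, &score)) in self.nodes[node].children.iter().zip(scores).enumerate() {
            // Skip actions whose child subtree has been fully explored.
            if matches!(child, Some(c) if self.nodes[*c].exhausted) {
                continue;
            }
            if best.map_or(true, |(_, s)| score > s) {
                best = Some((action, score));
            }
        }
        best.map(|(action, _)| action)
    }
}
```

With a root that has two actions, each leading to a terminal node, exhausting the first child steers selection to the second regardless of its lower score. Once both are exhausted, the root itself is exhausted and `select` returns `None`, which a search loop could treat as the whole space having been enumerated.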

ariasanovsky commented 1 year ago

Creating this issue resolves #29

ariasanovsky commented 1 year ago

After looking at search data, I realized that in some spaces the agent is incentivized too strongly toward exploration and rarely reaches terminal nodes. The formula $$u(s, a) = \overline{g}^\ast(s, a) + c\cdot p(s, a)\cdot\dfrac{\sqrt{n(s)}}{1+n(s, a)}$$ places incentives on:

- a high gain estimate $\overline{g}^\ast(s, a)$,
- a high prior probability $p(s, a)$, and
- a low visit count $n(s, a)$ relative to $n(s)$,

but it doesn't factor in the depth of the node corresponding to $s$ or whether the paths visiting $(s, a)$ reached terminal nodes. So I am testing different upper estimate functions and making $u(s, a)$ user-specified.
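A user-specified $u(s, a)$ could be expressed as a trait with the formula above as its default implementation. The sketch below is an assumption about how that might look, not azdopt's actual interface: `EdgeStats`, `UpperEstimate`, `Puct`, `DepthDecayedPuct`, and `select_action` are hypothetical names. The trait passes the node depth so that alternative rules can use it, and `DepthDecayedPuct` is one hypothetical alternative, dividing $c$ by $1 + \text{depth}$, included only to show how a rule can trade exploration for reaching terminal nodes.

```rust
/// Hypothetical per-edge statistics for a pair (s, a).
#[derive(Clone, Copy, Debug)]
struct EdgeStats {
    /// g*(s, a): the gain estimate for taking `a` from `s`.
    g_star: f32,
    /// p(s, a): the prior probability from the model.
    prior: f32,
    /// n(s, a): how many times `a` has been taken from `s`.
    visits: u32,
}

/// A user-specified upper estimate u(s, a). `depth` is passed so that rules
/// other than PUCT can use it; PUCT itself ignores it.
trait UpperEstimate {
    fn score(&self, depth: usize, parent_visits: u32, edge: &EdgeStats) -> f32;
}

/// The formula from the issue:
/// u(s, a) = g*(s, a) + c * p(s, a) * sqrt(n(s)) / (1 + n(s, a)).
struct Puct {
    c: f32,
}

impl UpperEstimate for Puct {
    fn score(&self, _depth: usize, parent_visits: u32, edge: &EdgeStats) -> f32 {
        let exploration = (parent_visits as f32).sqrt() / (1.0 + edge.visits as f32);
        edge.g_star + self.c * edge.prior * exploration
    }
}

/// Hypothetical alternative: shrink the exploration constant with depth so
/// that deep nodes lean toward exploitation.
struct DepthDecayedPuct {
    c: f32,
}

impl UpperEstimate for DepthDecayedPuct {
    fn score(&self, depth: usize, parent_visits: u32, edge: &EdgeStats) -> f32 {
        let c = self.c / (1.0 + depth as f32);
        Puct { c }.score(depth, parent_visits, edge)
    }
}

/// Returns the index of the action that maximizes `rule`'s upper estimate.
fn select_action<U: UpperEstimate>(
    rule: &U,
    depth: usize,
    parent_visits: u32,
    edges: &[EdgeStats],
) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (action, edge) in edges.iter().enumerate() {
        let u = rule.score(depth, parent_visits, edge);
        if best.map_or(true, |(_, b)| u > b) {
            best = Some((action, u));
        }
    }
    best.map(|(action, _)| action)
}
```

For example, with $n(s) = 4$, $c = 1$, an unvisited action with $\overline{g}^\ast = 0.5$, $p = 0.5$ scores $0.5 + 0.5 \cdot 2 / 1 = 1.5$, while an action visited 3 times with $\overline{g}^\ast = 1.0$, $p = 0.5$ scores $1.0 + 0.5 \cdot 2 / 4 = 1.25$, so PUCT explores the first. At depth 3 the decayed rule uses $c = 0.25$, giving $0.75$ versus $1.0625$, so it exploits the second instead.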

ariasanovsky commented 1 year ago

Moving to #14.