ariasanovsky / azdopt

An implementation of Alpha Zero for discrete optimization problems.
6 stars 0 forks source link

Replace single-path searches with multi-path rollouts #61

Closed ariasanovsky closed 8 months ago

ariasanovsky commented 8 months ago

This addresses several problems:

Objectives:

Learning Loop

Rollout

Node Initialization

Search Tree

Since we are moving to a Vec<StateNode>, we will have an easier time keeping track of $\text{argmin}$ data with indices. For each node $n$ corresponding to some state $s$, we log in StateNode

$$c_T^\ast(s) = (\text{arg})\text{min}\left(c(s'): s'\in V(T[s..])\right)$$

Dominating Sets

We have used the notation $T[s..]$ to indicate the branch of $T$ starting from $s$. We may extend $T$ to its transitively closed reachability digraph $D = \text{Reach}(T)$. We can say that $s'$ dominates $s$ if $(s, s')\in E(D)$ and $c(s')\leq c(s)$, and use the remaining arcs to define the domination sub-digraph $\text{Dom} \subseteq D$. By tracking $c_T(s)$, we dynamically retain a minimal cover of $\text{Dom}$ with minimal overhead.

Interpreting and updating $h_\theta$

Goal: use $h_\theta$ to initialize $g_T(s,\cdot)$ values, but only use $cT^\ast(s)$ when providing updates to $h\theta$

For example, we may let $h_\theta(s, a)$ be:

ariasanovsky commented 8 months ago

We'll start with

First draft

struct SearchTree<P> {
    positions: BTreeMap<P, NonZeroUsize>,
    nodes: Vec<StateNode>,
}

struct StateNode {
    c: f32,
    c_star: f32,
    in_neighborhood: Vec<usize>,
    active_actions: VecDeque<ActionData>,
    exhausted_actions: Vec<ActionData>,
}

struct ActionData {
    a: usize,
    s_prime: Option<NonZeroUsize>,
    g_sa: f32,
}

and then probably migrate to an SoA.

ariasanovsky commented 8 months ago

Multi-path rollouts

ariasanovsky commented 8 months ago

Multi-path rollouts

ariasanovsky commented 8 months ago

20240104_080604.jpg

Sketch for the search reset/cascading updates.