Abdu-Hekal / FReaK

MATLAB repo for Falsification via AutoKoopman
Other
3 stars 0 forks source link

Regret Metric and Reward Maximization #2

Closed EthanJamesLew closed 10 months ago

EthanJamesLew commented 1 year ago

This is a place to park my thoughts on connecting this to multi-armed bandit literature.

Regret

Making regret plots would characterize how good the sampling policy is and would be interesting to see in the paper. Suppose $f: \mathcal X \rightarrow \mathbb R$ is some metric where the predicate $p: \mathcal X \rightarrow \mathbb [ \text{true}, \text{false} ]$ holds when $p(x) = f(x) \ge 0$. For our sample choice $\mathbf x_n$ at iteration $n$, the instantaneous regret if $r_n = f(\mathbf x_n) - f(\mathbf x_n)$. The cumulative regret after $T$ rounds is

$$RT = \sum^{n=1}{T} r_n$$.

A desirable property is no-regret convergence $\lim_{T \rightarrow \infty} R_T / T = 0$.

Sampling Policies

This is speculation

$\epsilon$-greedy: $\epsilon$ of the time, choose a random point, and choose the highest reward for the remaining time.

UCB: Upper Confidence Bound

EthanJamesLew commented 10 months ago

closing for now...