abrahamnunes / fitr

Tools for computational psychiatry research.
https://computationalpsychiatry.github.io/fitr
GNU General Public License v3.0

MDP Agents on Bandit Tasks #101

Open abrahamnunes opened 6 years ago

abrahamnunes commented 6 years ago

It is difficult to use an MDP agent on a Bandit task, mainly because of the eligibility trace update.

On a contextual 2-armed bandit task, the final action is $\mathbf u' = (0.5, 0.5)^\top$. The 0.5 entries are needed to compute the target $y_t = r_t - \mathbf u'^\top \mathbf Q \mathbf x'$ such that

[equation not rendered]

However, the eligibility trace is updated as

[equation not rendered]

which, in a 4-state (2 context, 2 outcome) task with $\lambda = \gamma = 1$, and where $\mathbf x = (1, 0, 0, 0)^\top$, $\mathbf u = (1, 0)^\top$, and $\mathbf x' = (0, 0, 1, 0)^\top$, should result in a trace that looks like

[equation not rendered]

The current setup allows either the correct trace or the correct target calculation, but not both.
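The conflict can be reproduced in a few lines of NumPy. This is a minimal sketch, not fitr's code: it assumes the standard accumulating-trace form $\mathbf z \leftarrow \gamma\lambda\mathbf z + \mathbf u \mathbf x^\top$ and a $2 \times 4$ action-by-state $\mathbf Q$, which is the shape implied by $\mathbf u'^\top \mathbf Q \mathbf x'$. The helper name `accumulate_trace` is hypothetical. It shows what happens when the same $\mathbf u' = (0.5, 0.5)^\top$ is used for both the target and the trace.

```python
import numpy as np


def accumulate_trace(z, x, u, gamma=1.0, lam=1.0):
    """Assumed accumulating trace: decay the old trace, then add u x^T."""
    return gamma * lam * z + np.outer(u, x)


# Values from the example: 4 states, 2 actions, lambda = gamma = 1.
x = np.array([1.0, 0.0, 0.0, 0.0])       # first state (context)
u = np.array([1.0, 0.0])                 # action taken in x
x_next = np.array([0.0, 0.0, 1.0, 0.0])  # outcome state
u_next = np.array([0.5, 0.5])            # "final action" needed for the target

Q = np.zeros((2, 4))                     # action-by-state values (assumed shape)
z = np.zeros_like(Q)

# Step 1: trace after taking u in x.
z = accumulate_trace(z, x, u)

# The target needs u'^T Q x', so u' must be a weighting over actions.
bootstrap = u_next @ Q @ x_next

# Step 2: if the same u' also feeds the trace, 0.5 lands in both action rows
# of the outcome-state column.
z_shared = accumulate_trace(z, x_next, u_next)
# z_shared == [[1, 0, 0.5, 0],
#              [0, 0, 0.5, 0]]
```

If the trace is meant to credit only actions that were actually taken, the 0.5 entries are a side effect of sharing $\mathbf u'$ between the two computations, which appears to be the trade-off described above.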

I think the solution may be to separate the trace update from the value function update.
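One way such a separation could look is sketched below. Everything here is hypothetical and is not fitr's API: the class name `TabularLearner`, the methods `update_trace` and `update_values`, and the parameter `u_target` are illustrative only. The sketch also assumes an accumulating trace and the conventional TD target $r + \gamma\,\mathbf u'^\top \mathbf Q \mathbf x'$. The point is only that the action vector used for the bootstrap term never touches the trace.

```python
import numpy as np


class TabularLearner:
    """Hypothetical learner with decoupled trace and value updates."""

    def __init__(self, n_actions, n_states, learning_rate=0.1, gamma=1.0, lam=1.0):
        self.Q = np.zeros((n_actions, n_states))  # action-by-state values
        self.z = np.zeros((n_actions, n_states))  # eligibility trace
        self.learning_rate = learning_rate
        self.gamma = gamma
        self.lam = lam

    def update_trace(self, x, u):
        """Decay the trace and add the outer product u x^T (assumed form)."""
        self.z = self.gamma * self.lam * self.z + np.outer(u, x)

    def update_values(self, x, u, r, x_next, u_target):
        """Move Q along the current trace by the TD error.

        u_target only weights the bootstrap term; the trace is left untouched.
        """
        target = r + self.gamma * (u_target @ self.Q @ x_next)
        delta = target - u @ self.Q @ x
        self.Q += self.learning_rate * delta * self.z
        return delta


# Usage on the example: the trace sees only the action actually taken,
# while the target is free to use u' = (0.5, 0.5).
agent = TabularLearner(n_actions=2, n_states=4)
x = np.array([1.0, 0.0, 0.0, 0.0])
u = np.array([1.0, 0.0])
x_next = np.array([0.0, 0.0, 1.0, 0.0])

agent.update_trace(x, u)
delta = agent.update_values(x, u, r=1.0, x_next=x_next, u_target=np.array([0.5, 0.5]))
```

With this split, the caller decides when (and with which action vector) the trace is updated, so a bandit task can skip or customize the trace step at the final state without changing how the target is computed.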

ARudiuk commented 6 years ago

Some of the math seems to not be rendering @abrahamnunes

hardik44fg commented 3 years ago

@abrahamnunes Try highlighting the important words so that it is easier for others to understand.