Closed: threewisemonkeys-as closed this 4 years ago
Ok, these are two separate issues. I'll create an issue for the issues with Bernoulli Bandits. Are you going to work on these?
Also, @threewisemonkeys-as, could you also check what changes we need to make to the policy functions we have in `deep` so that they are compatible here?
> Ok, these are two separate issues. I'll create an issue for the issues with Bernoulli Bandits. Are you going to work on these?
Yeah, I will work on this. I don't know how to claim the issue though.
> Also, @threewisemonkeys-as, could you also check what changes we need to make to the policy functions we have in `deep` so that they are compatible here?
Compatible here in what sense? The policy API would be the same since `select_action`, `update_params` and `learn` are all being implemented here.
> > Also, @threewisemonkeys-as, could you also check what changes we need to make to the policy functions we have in `deep` so that they are compatible here?
>
> Compatible here in what sense? The policy API would be the same since `select_action`, `update_params` and `learn` are all being implemented here.
That's great then :)
Resolved in #127
Currently, both bandits and policies are included in a single class. Separating them would allow a standard API, which would be useful for deep contextual bandits.
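A rough sketch of what the separation could look like, for discussion only. The method names `select_action`, `update_params` and `learn` are the ones mentioned in this thread; everything else (the class names, the `step` method, the constructor arguments and the epsilon-greedy example) is an assumption made for illustration and not the actual code in the repo.

```python
import random
from abc import ABC, abstractmethod


class Bandit(ABC):
    """Environment side: only hands out rewards (hypothetical base class)."""

    def __init__(self, arms: int):
        self.arms = arms

    @abstractmethod
    def step(self, action: int) -> float:
        """Return the reward for pulling the given arm."""


class BernoulliBandit(Bandit):
    """Each arm pays 1.0 with its own fixed probability, else 0.0."""

    def __init__(self, reward_probs, seed=None):
        super().__init__(len(reward_probs))
        self.reward_probs = list(reward_probs)
        self._rng = random.Random(seed)

    def step(self, action: int) -> float:
        return 1.0 if self._rng.random() < self.reward_probs[action] else 0.0


class BanditPolicy(ABC):
    """Agent side: owns the estimates and the learning loop."""

    def __init__(self, bandit: Bandit):
        self.bandit = bandit

    @abstractmethod
    def select_action(self, timestep: int) -> int:
        """Choose which arm to pull at this timestep."""

    @abstractmethod
    def update_params(self, action: int, reward: float) -> None:
        """Update internal estimates from the observed reward."""

    def learn(self, n_timesteps: int) -> list:
        # Generic loop shared by every policy; returns the per-step rewards.
        rewards = []
        for t in range(n_timesteps):
            action = self.select_action(t)
            reward = self.bandit.step(action)
            self.update_params(action, reward)
            rewards.append(reward)
        return rewards


class EpsilonGreedyPolicy(BanditPolicy):
    """Example policy (assumed): explore with probability eps, else exploit."""

    def __init__(self, bandit: Bandit, eps: float = 0.1, seed=None):
        super().__init__(bandit)
        self.eps = eps
        self._rng = random.Random(seed)
        self.counts = [0] * bandit.arms
        self.values = [0.0] * bandit.arms

    def select_action(self, timestep: int) -> int:
        if self._rng.random() < self.eps:
            return self._rng.randrange(self.bandit.arms)
        return max(range(self.bandit.arms), key=lambda a: self.values[a])

    def update_params(self, action: int, reward: float) -> None:
        # Incremental mean of the rewards seen for this arm.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]
```

With this split, something like `EpsilonGreedyPolicy(BernoulliBandit([0.2, 0.8])).learn(1000)` would run any policy against any bandit, and a deep contextual policy would only need to implement the same three methods.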
There are also a couple of issues with Bernoulli Bandits:
- `avg_reward` is being updated with a list of rewards from each step instead of the mean
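To illustrate the `avg_reward` point, here is a minimal sketch of the intended behaviour as I understand it: reduce each step's rewards to their mean first, then fold that scalar into a running average. Only the name `avg_reward` comes from the issue; the `RewardTracker` class, its `update` method and the `avg_reward_hist` attribute are hypothetical and just for illustration.

```python
class RewardTracker:
    """Hypothetical helper; only the name avg_reward comes from the issue."""

    def __init__(self):
        self.steps = 0
        self.avg_reward = 0.0      # scalar running mean, never a list
        self.avg_reward_hist = []  # running mean after each step

    def update(self, step_rewards):
        """Fold the rewards collected in one step into the running mean."""
        # Reduce the step's rewards to a single number first ...
        step_mean = sum(step_rewards) / len(step_rewards)
        # ... then update the running mean incrementally.
        self.steps += 1
        self.avg_reward += (step_mean - self.avg_reward) / self.steps
        self.avg_reward_hist.append(self.avg_reward)
        return self.avg_reward
```

Storing the mean rather than the raw list keeps `avg_reward` a scalar, so it can be plotted or compared across runs directly.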