SforAiDl / genrl

A PyTorch reinforcement learning library for generalizable and reproducible algorithm implementations with an aim to improve accessibility in RL
https://genrl.readthedocs.io
MIT License

Refactor bandits.py into Policies and Bandits #123

Closed: threewisemonkeys-as closed this issue 4 years ago

threewisemonkeys-as commented 4 years ago

Currently, bandits and policies are both implemented in a single class. Separating them would allow a standard API, which would be useful for deep contextual bandits.
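As a rough illustration of the proposed split (not the code that was eventually merged), the sketch below keeps the environment and the agent in separate classes. The method names select_action, update_params and learn are the ones mentioned in this thread; the class names, constructor arguments and the epsilon-greedy rule are assumptions made for the example.

```python
import random


class BernoulliBandit:
    """Environment only: holds the true reward probabilities (hypothetical class)."""

    def __init__(self, reward_probs, seed=None):
        self.reward_probs = list(reward_probs)
        self.n_arms = len(self.reward_probs)
        self._rng = random.Random(seed)

    def step(self, action):
        # Reward is 1 with the chosen arm's probability, otherwise 0.
        return 1 if self._rng.random() < self.reward_probs[action] else 0


class EpsilonGreedyPolicy:
    """Agent only: keeps its own estimates and never reads reward_probs."""

    def __init__(self, bandit, eps=0.1, seed=None):
        self.bandit = bandit
        self.eps = eps
        self.q_values = [0.0] * bandit.n_arms
        self.counts = [0] * bandit.n_arms
        self._rng = random.Random(seed)

    def select_action(self):
        # Explore with probability eps, otherwise pick the best estimate.
        if self._rng.random() < self.eps:
            return self._rng.randrange(self.bandit.n_arms)
        return max(range(self.bandit.n_arms), key=lambda a: self.q_values[a])

    def update_params(self, action, reward):
        # Incremental sample-mean update of the chosen arm's estimate.
        self.counts[action] += 1
        self.q_values[action] += (reward - self.q_values[action]) / self.counts[action]

    def learn(self, n_steps):
        rewards = []
        for _ in range(n_steps):
            action = self.select_action()
            reward = self.bandit.step(action)
            self.update_params(action, reward)
            rewards.append(reward)
        return rewards


bandit = BernoulliBandit([0.1, 0.9], seed=0)
policy = EpsilonGreedyPolicy(bandit, eps=0.1, seed=1)
rewards = policy.learn(2000)
```

With this layout, a different policy can be swapped in without touching the bandit, and a deep contextual bandit could in principle expose the same three methods.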

There are also a couple of issues with Bernoulli Bandits:

  1. The policy's Q-values aren't currently independent of the bandit's reward probabilities
  2. avg_reward is being updated with the list of rewards from each step instead of their mean
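The two points above can be pictured with a small standalone sketch. It is only one reading of the report, not the library's actual code: the variable names other than avg_reward are hypothetical, and the assumption is that each step yields several rewards (for example, one per bandit).

```python
import random
from statistics import mean

rng = random.Random(0)

# Point 1 (assumed reading): the bandit's true probabilities and the
# policy's estimates live in separate lists. The estimates start at zero
# and are not copied from, or aliased to, the true probabilities.
reward_probs = [rng.random() for _ in range(3)]  # hidden from the policy
q_values = [0.0] * len(reward_probs)             # learned from rewards only

# Point 2 (assumed reading): record one scalar per step, not the raw list.
step_rewards = [1, 0, 1, 1]  # rewards gathered during a single step

avg_reward_buggy = []
avg_reward_buggy.append(step_rewards)        # stores a nested list

avg_reward_fixed = []
avg_reward_fixed.append(mean(step_rewards))  # stores the step's mean
```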

Sharad24 commented 4 years ago

Ok, these are two separate issues. I'll create an issue for the issues with Bernoulli Bandits. Are you going to work on these?

Sharad24 commented 4 years ago

Also, @threewisemonkeys-as, could you also check what changes we need to make to policy functions we have in deep, so that they are compatible here?

threewisemonkeys-as commented 4 years ago

> Ok, these are two separate issues. I'll create an issue for the issues with Bernoulli Bandits. Are you going to work on these?

Yeah, I will work on this. I don't know how to claim the issue though.

threewisemonkeys-as commented 4 years ago

> Also, @threewisemonkeys-as, could you also check what changes we need to make to policy functions we have in deep, so that they are compatible here?

Compatible here in what sense? The policy API would be the same since select_action, update_params and learn are all being implemented here.
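One way to make "the same policy API" checkable is a structural interface. The sketch below is an assumption for illustration, not something from the repository: PolicyLike, ConstantArmPolicy and Incomplete are hypothetical names, and only the three method names come from the comment above.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class PolicyLike(Protocol):
    """Hypothetical interface: any object exposing these three methods."""

    def select_action(self, *args, **kwargs): ...

    def update_params(self, *args, **kwargs): ...

    def learn(self, *args, **kwargs): ...


class ConstantArmPolicy:
    """Toy bandit policy that always pulls arm 0."""

    def select_action(self):
        return 0

    def update_params(self, action, reward):
        pass  # nothing to learn in this toy example

    def learn(self, n_steps):
        for _ in range(n_steps):
            self.update_params(self.select_action(), 0)


class Incomplete:
    """Lacks update_params and learn, so it does not satisfy PolicyLike."""

    def select_action(self):
        return 0


print(isinstance(ConstantArmPolicy(), PolicyLike))  # True
print(isinstance(Incomplete(), PolicyLike))         # False
```

Note that a runtime isinstance check against a Protocol only confirms the methods exist; it does not compare their signatures.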

Sharad24 commented 4 years ago

> Also, @threewisemonkeys-as, could you also check what changes we need to make to policy functions we have in deep, so that they are compatible here?
>
> Compatible here in what sense? The policy API would be the same since select_action, update_params and learn are all being implemented here.

That's great then :)

Sharad24 commented 4 years ago

Resolved in #127