[Closed] MieszkoFerens closed this issue 4 years ago
Although the SARSA algorithm is implemented in the current version of ChainerRL, it is not mentioned on the GitHub page.
Right, thanks for pointing it out. I think it might be covered by "DQN (including DoubleDQN etc.)", but I admit it is confusing.
Looking at the code, I don't see why this would be considered off-policy.
It is off-policy in the sense that it learns the Q-function of the current behavior policy, defined by the current approximate Q-function and an explorer, from data collected by past behavior policies.
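To make the distinction concrete, here is a minimal numerical sketch (illustrative only, not ChainerRL's actual code; all names are made up). The canonical on-policy SARSA target uses the next action that was actually taken and stored with the transition, while the replay-based variant described above re-samples the next action from the *current* epsilon-greedy behavior policy:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2
Q = rng.normal(size=(n_states, n_actions))  # current approximate Q-function
gamma, epsilon = 0.99, 0.1

def behavior_action(s):
    """Epsilon-greedy action under the current Q (the current behavior policy)."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

# A transition stored in a replay buffer by some *past* behavior policy:
# (state, action, reward, next state, next action taken at the time).
s, a, r, s_next, a_next_stored = 0, 1, 1.0, 2, 0

# Canonical on-policy SARSA: target uses the stored next action, i.e. the
# action chosen by whatever policy generated the data.
target_on_policy = r + gamma * Q[s_next, a_next_stored]

# Replay-based variant: target uses an action drawn from the *current*
# behavior policy, so it estimates the Q-function of the current policy
# even though the transition came from an older one.
target_current_policy = r + gamma * Q[s_next, behavior_action(s_next)]
```

When the data comes fresh from the current policy the two targets coincide in distribution; with a replay buffer they generally differ, which is why the variant counts as off-policy.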
Thanks for your answer.
So, does this mean that the "SARSA" implementation available in ChainerRL differs from the canonical SARSA algorithm given, for instance, in the RL book by Sutton and Barto, where SARSA is defined as an on-policy method?
Correct. It can be considered a sample-based approximation of Expected SARSA.
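The "sample-based approximation of Expected SARSA" claim can be checked numerically. The sketch below (illustrative names and values, not library code) computes the Expected SARSA target as a full expectation over an epsilon-greedy policy's action distribution, then shows that averaging many single-sample SARSA targets, with the next action re-drawn from that same policy, converges to it:

```python
import numpy as np

rng = np.random.default_rng(1)
n_actions = 3
q_next = np.array([1.0, 0.5, -0.2])  # Q(s', .) for some next state s'
gamma, epsilon, r = 0.99, 0.1, 0.0

# Epsilon-greedy action probabilities under the current Q at s'.
probs = np.full(n_actions, epsilon / n_actions)
probs[np.argmax(q_next)] += 1.0 - epsilon

# Expected SARSA target: full expectation over the behavior policy.
expected_target = r + gamma * float(probs @ q_next)

# Sample-based variant: draw a single next action a' ~ policy and use
# Q(s', a') in the target; average many such targets to see the mean.
samples = rng.choice(n_actions, size=100_000, p=probs)
sampled_target = r + gamma * float(q_next[samples].mean())
```

Each single-sample target is an unbiased estimate of the Expected SARSA target, so the average agrees with it up to Monte Carlo error.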
Additionally, in the API, the brief description of this algorithm seems to indicate that it is on-policy SARSA, not off-policy, as stated there: "This agent learns the Q-function of a behavior policy defined via the given explorer, instead of learning the Q-function of the optimal policy."
Is there a reason why this SARSA is off-policy, or is it a mistake?