EderSantana / X

X is a temporary name, but here lies RL
BSD 3-Clause "New" or "Revised" License

SARSA #4

Open mattsqerror opened 8 years ago

mattsqerror commented 8 years ago

To implement SARSA with experience replay:
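For reference only (this is not the issue author's original list), here is a minimal Python sketch of the SARSA update over a replayed minibatch, assuming transitions are stored as (s, a, r, s', a', terminal) tuples and the model exposes a Keras-style `predict`; all names are illustrative, not this repo's API:

```python
import numpy as np

def sarsa_targets(model, batch, gamma=0.99):
    """Build SARSA regression targets from a replayed minibatch.

    `batch` is a list of (s, a, r, s_next, a_next, terminal) tuples and
    `model.predict(states)` is assumed to return Q-values, one row per
    state and one column per action.  Unlike Q-learning, the bootstrap
    term uses the action actually taken in the next state, Q(s', a'),
    rather than max_a Q(s', a).
    """
    states = np.array([t[0] for t in batch])
    next_states = np.array([t[3] for t in batch])
    q = model.predict(states)            # Q(s, .) for every sampled state
    q_next = model.predict(next_states)  # Q(s', .) used only for bootstrapping
    targets = q.copy()
    for i, (_, a, r, _, a_next, terminal) in enumerate(batch):
        targets[i, a] = r if terminal else r + gamma * q_next[i, a_next]
    return states, targets
```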

EderSantana commented 8 years ago

> It might be more efficient and easier in some ways to store memory in separate state, action, reward vectors.

What do you mean by that? There is one memory that is a list of state-action-reward tuples. Why would it be better to split that up, and how would we keep the pieces in sync?
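If it helps the discussion, here is a rough sketch of the columnar layout being suggested: one array per field, kept in sync by a single write cursor. This is not the existing memory class, just an illustration of the trade-off (hypothetical names throughout):

```python
import numpy as np

class ColumnarReplay:
    """Replay memory stored as parallel arrays, one per field.

    Every array shares the same write cursor, so row i always describes
    the same transition; sampling a minibatch is then a single fancy-index
    per array instead of unpacking a list of tuples.
    """
    def __init__(self, capacity, state_dim):
        self.capacity = capacity
        self.states = np.zeros((capacity, state_dim), dtype=np.float32)
        self.actions = np.zeros(capacity, dtype=np.int64)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.next_states = np.zeros((capacity, state_dim), dtype=np.float32)
        self.terminals = np.zeros(capacity, dtype=bool)
        self.cursor = 0
        self.size = 0

    def remember(self, s, a, r, s_next, terminal):
        i = self.cursor
        self.states[i], self.actions[i], self.rewards[i] = s, a, r
        self.next_states[i], self.terminals[i] = s_next, terminal
        self.cursor = (self.cursor + 1) % self.capacity  # ring buffer
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        idx = np.random.randint(0, self.size, size=batch_size)
        return (self.states[idx], self.actions[idx], self.rewards[idx],
                self.next_states[idx], self.terminals[idx])
```

The main argument for splitting the fields is that `sample` already returns batched arrays ready for the model, at the cost of fixing the shapes up front.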

"Targets" should be computed by the models themselves, perhaps calling q-learning / sarsa classes to help compute.

Aren't they already? https://github.com/EderSantana/X/blob/master/x/memory.py#L120-L121 Also, we can always use the memory callback to apply a functional modification on the fly: https://github.com/EderSantana/X/blob/master/x/memory.py#L118
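For concreteness, one way to read "let q-learning / sarsa classes help compute the targets" as code, with the bootstrap term delegated to a small algorithm object. These are hypothetical names, not the functions behind the links above:

```python
import numpy as np

class QLearning:
    """Off-policy bootstrap: max over next-state Q-values."""
    def bootstrap(self, q_next, a_next):
        return q_next.max(axis=1)

class Sarsa:
    """On-policy bootstrap: Q-value of the action actually taken next."""
    def bootstrap(self, q_next, a_next):
        return q_next[np.arange(len(a_next)), a_next]

def build_targets(model, batch, algo, gamma=0.99):
    """Assemble regression targets, delegating the bootstrap to `algo`.

    `batch` is a tuple of arrays (s, a, r, s_next, a_next, terminal); the
    model only needs to expose `predict`, so the same code serves both
    Q-learning and SARSA by swapping the `algo` object.
    """
    s, a, r, s_next, a_next, terminal = batch
    q = model.predict(s)
    q_next = model.predict(s_next)
    targets = q.copy()
    boot = algo.bootstrap(q_next, a_next)
    targets[np.arange(len(a)), a] = r + gamma * boot * (1.0 - terminal)
    return targets
```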