Open mattsqerror opened 8 years ago
It might be more efficient and easier in some ways to store memory in separate state, action, reward vectors.
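For concreteness, here is a minimal sketch of the two layouts being discussed; the names are illustrative and not the actual `x/memory.py` API:

```python
import numpy as np

# Layout 1: one memory, a single list of (state, action, reward, next_state)
# tuples -- roughly what the repo does now.
memory_tuples = []
memory_tuples.append((np.zeros(4), 1, 0.5, np.ones(4)))

# Layout 2: separate parallel vectors; index i refers to the same transition
# in every list, so "syncing" just means appending to all of them together.
states, actions, rewards, next_states = [], [], [], []
states.append(np.zeros(4))
actions.append(1)
rewards.append(0.5)
next_states.append(np.ones(4))

# The separate-vector layout makes vectorized batch sampling cheap:
idx = np.random.randint(0, len(states), size=1)
batch_states = np.asarray(states)[idx]    # shape (1, 4)
batch_rewards = np.asarray(rewards)[idx]  # shape (1,)
```

The trade-off is exactly the sync question raised below: every append and eviction has to touch all the vectors at once, whereas a tuple list keeps each transition atomic.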
What do you mean by that? There is one memory that is a list of state-action-reward information. Why would it be better to split that up, and how would we keep the pieces in sync?
"Targets" should be computed by the models themselves, perhaps calling q-learning / sarsa classes to help compute.
Aren't they already? https://github.com/EderSantana/X/blob/master/x/memory.py#L120-L121 Also, we can always use the memory callback for a functional modification on the fly: https://github.com/EderSantana/X/blob/master/x/memory.py#L118
To implement SARSA with experience replay:
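The comment above is cut off in the thread, but the core idea can be sketched: for SARSA with experience replay, the memory must also store the next action a' actually taken at s', and the target uses Q(s', a') rather than Q-learning's max over actions. A hedged sketch with hypothetical names:

```python
import numpy as np

def sarsa_targets(gamma, rewards, next_q_values, next_actions, terminals):
    # SARSA target: r + gamma * Q(s', a'), where a' is the action that was
    # actually taken at s' (stored in the replay memory), zeroed at terminals.
    q_next = next_q_values[np.arange(len(next_actions)), next_actions]
    return rewards + gamma * q_next * (1.0 - terminals)

# Two replayed transitions (s, a, r, s', a'):
rewards = np.array([1.0, 0.5])
next_q = np.array([[0.2, 0.8], [0.4, 0.1]])  # model predictions at s'
next_actions = np.array([1, 0])              # a' stored alongside s'
terminals = np.array([0.0, 0.0])
targets = sarsa_targets(0.9, rewards, next_q, next_actions, terminals)
# targets = [1.0 + 0.9*0.8, 0.5 + 0.9*0.4] = [1.72, 0.86]
```

Note that replaying SARSA transitions makes the update off-policy with respect to the current policy, since the stored a' came from an older behavior policy; that caveat applies to any SARSA-with-replay scheme.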