Kaixhin / ACER

Actor-critic with experience replay
MIT License

feed the previous action to lstm #3

Closed by jingweiz 7 years ago

jingweiz commented 7 years ago

Hey, nice work! One quick question: you mention that "the agent also receives the previous action and reward", but that part isn't part of the ACER algorithm itself, it only comes from the navigation paper, right?

Kaixhin commented 7 years ago

Yep, I just kind of added it out of habit. I don't think it actually makes much of a difference, so I think I'll remove it and keep this as a canonical implementation of ACER (minus the "efficient" trust region bit; I'm still thinking about how best to do that).
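For anyone reading later, here is a rough sketch of what "feeding the previous action and reward" to the recurrent layer usually amounts to: the state features are concatenated with a one-hot encoding of the previous action and the scalar previous reward before going into the LSTM. This is not the code from this repo; `one_hot`, `build_recurrent_input` and the `use_prev_action_reward` flag are hypothetical names made up for illustration, and it uses plain Python lists rather than tensors.

```python
from typing import List, Sequence


def one_hot(index: int, size: int) -> List[float]:
    """Return a list of length `size` with 1.0 at `index` and 0.0 elsewhere."""
    if not 0 <= index < size:
        raise ValueError(f"action index {index} out of range for {size} actions")
    return [1.0 if i == index else 0.0 for i in range(size)]


def build_recurrent_input(
    state_features: Sequence[float],
    prev_action: int,
    prev_reward: float,
    num_actions: int,
    use_prev_action_reward: bool = True,  # hypothetical switch, not a repo option
) -> List[float]:
    """Assemble the vector passed to the recurrent layer at one timestep.

    With the flag off, only the state features are passed through, which is
    the assumed "plain" variant without the extra inputs.
    """
    features = list(state_features)
    if not use_prev_action_reward:
        return features
    # Append the one-hot previous action, then the scalar previous reward.
    return features + one_hot(prev_action, num_actions) + [float(prev_reward)]


# Two state features, previous action 2 out of 3, previous reward 1.0.
x = build_recurrent_input([0.5, -1.0], prev_action=2, prev_reward=1.0, num_actions=3)
print(x)  # [0.5, -1.0, 0.0, 0.0, 1.0, 1.0]
```

Dropping the extra inputs shrinks the recurrent layer's input size by `num_actions + 1`, so the layer's input dimension would have to change along with it.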

Kaixhin commented 7 years ago

It still works without it (kind of, or at least the same as before).

[image: newplot]

jingweiz commented 7 years ago

Thanks a lot for the reply :)