jingweiz closed 7 years ago
Yep, I just kind of added it out of habit. I don't think it actually makes much of a difference, so I think I'll remove it and keep this as a canonical implementation of ACER (minus the "efficient" trust region bit; still thinking about how best to do that).
It still works (kind of, or the same as before at least).
Thanks a lot for the reply :)
Hey, nice work! One quick question: you mentioned "The agent also receives the previous action and reward", but that isn't part of the ACER algorithm itself, only something from the navigation paper, right?
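For anyone reading this later, here is a rough sketch of what "receives the previous action and reward" usually means in practice: the last action (one-hot encoded) and the last reward are appended to the current state before it is fed to the network. This is only an illustration in plain Python; `augment_observation` and its parameters are hypothetical names, not taken from this repo or from either paper.

```python
from typing import List, Sequence


def augment_observation(
    state: Sequence[float],
    prev_action: int,
    prev_reward: float,
    num_actions: int,
) -> List[float]:
    """Append a one-hot previous action and the previous reward to the state.

    Hypothetical helper for illustration; not the repo's actual code.
    """
    if not 0 <= prev_action < num_actions:
        raise ValueError("prev_action must be in [0, num_actions)")
    # One-hot encode the previous action.
    one_hot = [0.0] * num_actions
    one_hot[prev_action] = 1.0
    # Final layout: [state features..., one-hot action..., reward].
    return list(state) + one_hot + [float(prev_reward)]


obs = augment_observation([0.1, -0.2], prev_action=2, prev_reward=1.0, num_actions=3)
print(obs)  # [0.1, -0.2, 0.0, 0.0, 1.0, 1.0]
```

Dropping the extra input just means passing the raw state through unchanged, so the network's input size shrinks by `num_actions + 1`.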