Closed declanoller closed 5 years ago
There are several slight variations that it would make sense to test. For example, now, for discrete action spaces, I just use an argmax across the outputs. It's possible that a softmax would be more effective for some env's.
Similarly, we have a nonlinearity right now, but it's possible that's not necessary for some env's (see winning agents for LunarLander-v2 and CartPole-v0 here: https://www.declanoller.com/2019/01/25/beating-openai-games-with-neuroevolution-agents-pretty-neat/ ; completely linear).
LunarLander-v2
CartPole-v0
More broadly: make it so many variations can be tested for each.
Added with PR #6 , closing.
There are several slight variations that it would make sense to test. For example, now, for discrete action spaces, I just use an argmax across the outputs. It's possible that a softmax would be more effective for some env's.
Similarly, we have a nonlinearity right now, but it's possible that's not necessary for some env's (see winning agents for
LunarLander-v2
andCartPole-v0
here: https://www.declanoller.com/2019/01/25/beating-openai-games-with-neuroevolution-agents-pretty-neat/ ; completely linear).More broadly: make it so many variations can be tested for each.