Simple variations: softmax outputs, no nonlinearities, etc.

There are several slight variations that it would make sense to test. For example, now, for discrete action spaces, I just use an argmax across the outputs. It's possible that a softmax would be more effective for some env's.

Similarly, we have a nonlinearity right now, but it's possible that's not necessary for some env's (see winning agents for LunarLander-v2 and CartPole-v0 here: https://www.declanoller.com/2019/01/25/beating-openai-games-with-neuroevolution-agents-pretty-neat/ ; completely linear).

More broadly: make it so many variations can be tested for each.

declanoller / RWG_benchmarking

Simple variations: softmax outputs, no nonlinearities, etc. #2