JuliaReinforcementLearning / ReinforcementLearning.jl

A reinforcement learning package for Julia
https://juliareinforcementlearning.org

Unify common network architectures and patterns #139

Closed rbange closed 1 year ago

rbange commented 4 years ago

As mentioned here https://github.com/JuliaReinforcementLearning/ReinforcementLearningZoo.jl/pull/93#issuecomment-699647922, I'd like to write down some thoughts regarding the network handling in this framework. Maybe this is also relevant to https://github.com/JuliaReinforcementLearning/ReinforcementLearningCore.jl.

  1. I would like to have a small collection of commonly employed network styles in RL, like the GaussianNetwork (used in VPG, PPO, and SAC) or a Twin Q Network (as in TD3 and SAC). These could then be enhanced with basic structural integrity asserts (e.g. that the output sizes of the mu and sigma layers are identical) or convenience functions (e.g. returning a test or train action from a Gaussian network).
  2. I'm really unhappy with the definition of target networks. At the moment, these networks are commonly defined as a NeuralNetworkApproximator including a dedicated optimizer, even though they are never directly trained on. Maybe it would make sense to implement a TargetNetwork struct that can be constructed by simply passing the original network to it and that offers functions for e.g. Polyak averaging or hard updates (recommended in some MuJoCo environments). I have never seen an implementation in which target networks differ from their source ones...
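A hypothetical sketch of what such a TargetNetwork wrapper could look like (the struct and function names here are illustrative assumptions, not an existing API in this package):

```julia
using Flux

# Illustrative sketch only: a TargetNetwork wraps a source model and keeps a
# deep copy that is never trained on directly and carries no optimizer.
struct TargetNetwork{M}
    source::M
    target::M
end

# Construct the target as a deep copy of the source network.
TargetNetwork(source) = TargetNetwork(source, deepcopy(source))

# Polyak (soft) update: target ← ρ * target + (1 - ρ) * source
function polyak_update!(tn::TargetNetwork; ρ = 0.995f0)
    for (t, s) in zip(Flux.params(tn.target), Flux.params(tn.source))
        t .= ρ .* t .+ (1 - ρ) .* s
    end
end

# Hard update: copy the source parameters into the target unchanged.
function hard_update!(tn::TargetNetwork)
    Flux.loadparams!(tn.target, Flux.params(tn.source))
end
```

The point of the wrapper is that the target copy can only ever change through `polyak_update!` or `hard_update!`, so no dedicated optimizer has to be attached to it.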

I'm not sure whether it would be more reasonable to implement these changes in ReinforcementLearningCore.jl or ReinforcementLearningZoo.jl, as they are very DRL-specific.

Any thoughts on this?

findmyway commented 4 years ago

Thanks for your valuable comments!

I would like to have a small collection of commonly employed network styles in RL, like the GaussianNetwork (used in VPG, PPO, and SAC) or a Twin Q Network (as in TD3 and SAC). These could then be enhanced with basic structural integrity asserts (e.g. that the output sizes of the mu and sigma layers are identical) or convenience functions (e.g. returning a test or train action from a Gaussian network).

Agreed. These should be defined in RLCore so that many of the network definitions can be simplified.
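For the Gaussian case, a minimal sketch of how such a network could look in RLCore (the field names and the `test_action`/`train_action` helpers are assumptions for illustration, not the actual RLCore API):

```julia
using Flux

# Illustrative sketch: a GaussianNetwork with a shared trunk and two heads.
struct GaussianNetwork{P,M,S}
    pre::P    # shared feature trunk
    μ::M      # mean head
    logσ::S   # log-std head
end

function (m::GaussianNetwork)(state)
    h = m.pre(state)
    μ, logσ = m.μ(h), m.logσ(h)
    # Basic structural integrity check: both heads must emit the same shape.
    @assert size(μ) == size(logσ) "mean and log-std heads must have identical output sizes"
    μ, logσ
end

# Deterministic "test" action: just the mean of the policy distribution.
test_action(m::GaussianNetwork, state) = first(m(state))

# Stochastic "train" action: reparameterized sample μ + σ ⊙ ε, ε ~ N(0, I).
function train_action(m::GaussianNetwork, state)
    μ, logσ = m(state)
    μ .+ exp.(logσ) .* randn(Float32, size(μ))
end
```

With the shape assert living in the forward pass, a mismatched mu/sigma head fails loudly on the first call instead of producing a silently broken distribution.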

I'm really unhappy with the definition of target networks. At the moment, these networks are commonly defined as a NeuralNetworkApproximator including a dedicated optimizer, even though they are never directly trained on. Maybe it would make sense to implement a TargetNetwork struct that can be constructed by simply passing the original network to it and that offers functions for e.g. Polyak averaging or hard updates (recommended in some MuJoCo environments). I have never seen an implementation in which target networks differ from their source ones...

Yeah, I feel the same. Defining the target network as a NeuralNetworkApproximator is quite weird. I realized the problem just after implementing DQN. Back then I felt that some common networks were needed to handle the different cases, just like what you suggest above. However, I didn't know what other kinds of networks would look like at that time, so I simply made the optimizer in NeuralNetworkApproximator an optional keyword argument and treated it as a normal model (blame me 😥)

More on approximators: two other kinds of models have been on my mind, but I never got the time to implement them: