Closed. rbange closed this issue 1 year ago.

rbange:

As said here https://github.com/JuliaReinforcementLearning/ReinforcementLearningZoo.jl/pull/93#issuecomment-699647922, I'd like to write down some thoughts regarding the network handling in this framework. Maybe this is also relevant to https://github.com/JuliaReinforcementLearning/ReinforcementLearningCore.jl.

I would like to have a small collection of commonly employed network styles in RL, like the GaussianNetwork (used in VPG, PPO, and SAC) or a twin Q network (like in TD3 and SAC). These could then be enhanced with basic structural-integrity asserts (e.g. that the output sizes of the mu and sigma layers are identical) or with convenience functions (e.g. returning a test or train action from a GaussianNetwork).

I'm really unhappy with the definition of target networks. At the moment, these networks are commonly defined as a NeuralNetworkApproximator including a dedicated optimizer, even though they are never directly trained on. Maybe it would make sense to implement a TargetNetwork struct which can be constructed by just passing the original network to it and which offers functions for e.g. Polyak averaging or hard updates (recommended in some MuJoCo environments). I have never seen an implementation in which target networks differ from their source ones...

I'm not sure if it would be reasonable to implement these changes in ReinforcementLearningCore.jl or in ReinforcementLearningZoo.jl, as these are very DRL-related. Any thoughts on this?

Reply:
Thanks for your valuable comments!
> I would like to have a small collection of commonly employed network styles in RL, like the GaussianNetwork (used in VPG, PPO, and SAC) or a twin Q network (like in TD3 and SAC). ...
Agree. These should be defined in RLCore, so that a lot of the network-definition code in the algorithms can be simplified.
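To make this concrete, here is a minimal sketch of what such a GaussianNetwork could look like, with a structural-integrity assert and test/train action helpers. It assumes Flux.jl and heads built as Chains ending in a Dense layer; the field and function names (pre, mu, logsigma, test_action, train_action) are only illustrative, not an existing RLCore API:

```julia
using Flux

# Hypothetical sketch: a Gaussian policy network with a shared trunk and
# separate mean / log-std heads.
struct GaussianNetwork{P,M,S}
    pre::P        # shared feature extractor
    mu::M         # mean head
    logsigma::S   # log standard deviation head

    function GaussianNetwork(pre, mu, logsigma)
        # structural-integrity assert: both heads must produce outputs of the
        # same size (assumes each head is a Chain ending in a Dense layer)
        @assert size(mu[end].weight, 1) == size(logsigma[end].weight, 1) "mu and sigma output sizes differ"
        new{typeof(pre),typeof(mu),typeof(logsigma)}(pre, mu, logsigma)
    end
end

# deterministic "test" action: just the predicted mean
test_action(n::GaussianNetwork, s) = n.mu(n.pre(s))

# stochastic "train" action: sample from the predicted Gaussian
function train_action(n::GaussianNetwork, s)
    h = n.pre(s)
    μ, logσ = n.mu(h), n.logsigma(h)
    μ .+ exp.(logσ) .* randn(Float32, size(μ)...)
end

# usage, e.g. for a 4-dimensional observation and 2-dimensional action:
# net = GaussianNetwork(Chain(Dense(4, 64, relu)),
#                       Chain(Dense(64, 2)), Chain(Dense(64, 2)))
# a = train_action(net, rand(Float32, 4))
```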
> I'm really unhappy with the definition of target networks. At the moment, these networks are commonly defined as a NeuralNetworkApproximator including a dedicated optimizer, even though they are never directly trained on. ...
Yeah, I feel the same. Defining the TargetNetwork as a NeuralNetworkApproximator is quite weird. I realized the problem just after implementing DQN. Back then I felt that some common networks were needed to handle the different cases, just like what you suggest above. However, at that time I didn't know what the other kinds of networks would look like, so I simply made the optimizer in NeuralNetworkApproximator an optional keyword argument and treated the target network as a normal model (blame me 😥).
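As a rough sketch of the TargetNetwork idea from above, assuming Flux.jl (the type exists only as a proposal here, and the function names hard_update! and polyak_update! are hypothetical):

```julia
using Flux

# Hypothetical sketch of the proposed TargetNetwork wrapper; it carries no
# optimizer, since the target is never trained on directly.
struct TargetNetwork{M}
    source::M
    target::M
end

# construct by passing only the source network; the target starts as a copy
TargetNetwork(source) = TargetNetwork(source, deepcopy(source))

# hard update: copy all source parameters into the target
hard_update!(tn::TargetNetwork) = Flux.loadparams!(tn.target, Flux.params(tn.source))

# Polyak averaging: target ← ρ * target + (1 - ρ) * source
function polyak_update!(tn::TargetNetwork; ρ = 0.995f0)
    for (dest, src) in zip(Flux.params(tn.target), Flux.params(tn.source))
        dest .= ρ .* dest .+ (1 - ρ) .* src
    end
end
```

A nice side effect of such a wrapper is that hard_update! is just polyak_update! with ρ = 0, so both update styles fall out of one mechanism.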
Some more words on approximators: there are another two kinds of models in my head, but I never got the time to implement them: …