Implement the fixed Q targets idea, i.e. train towards a static Q function for a bunch of iterations, then copy the new weights over and repeat. Note that this will involve changing two things (a rough sketch follows the list):
- Change the ContinuousQLearning<StateType, ActionType> class to use two Q functions, and
- Make sure that each child class of ContinuousActionValueFunctor<StateType, ActionType> has a working copy constructor (hopefully the default one is good enough...).
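A minimal sketch of the two-Q-function structure, assuming the functor is copyable. `QFunctor` stands in for `ContinuousActionValueFunctor<StateType, ActionType>`, and the `Update(...)` / `operator()` signatures below are assumptions for illustration, not the repo's actual API:

```cpp
#include <cstddef>

// Sketch: keep an online Q functor that gets trained every step and a frozen
// target copy that supplies the bootstrap value; sync them periodically.
template <typename StateType, typename ActionType, typename QFunctor>
class FixedTargetQLearning {
 public:
  FixedTargetQLearning(const QFunctor& q, std::size_t copy_interval)
      : online_q_(q),                 // trained every step
        target_q_(q),                 // frozen copy; relies on the copy constructor
        copy_interval_(copy_interval) {}

  // One TD update: bootstrap from the frozen target, train the online copy.
  void Step(const StateType& state, const ActionType& action, double reward,
            const StateType& next_state, const ActionType& next_action,
            double gamma, double alpha) {
    const double target = reward + gamma * target_q_(next_state, next_action);
    online_q_.Update(state, action, target, alpha);  // hypothetical update signature

    // Every copy_interval_ steps, copy the online weights over to the target.
    if (++steps_ % copy_interval_ == 0)
      target_q_ = online_q_;          // relies on copy assignment as well
  }

 private:
  QFunctor online_q_;
  QFunctor target_q_;
  std::size_t copy_interval_;
  std::size_t steps_ = 0;
};
```

If the default copy constructor/assignment of each `ContinuousActionValueFunctor` child class already performs a deep copy of its parameters, the two lines marked above should work unchanged; otherwise those classes will need explicit copy semantics.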