frederikschubert closed 2 years ago
I still need to add some docs for DSAC and TD4, but the main code changes should be good to go! At some point we might also want to encapsulate things like computing the greedy action from a Q-function into their own methods 😅
@KristianHolsheimer I added a short description so the PR should be ready to be merged.
Rework of https://github.com/coax-dev/coax/pull/9
This PR enables the use of `StochasticQ` in `ClippedDoubleQLearning` and `SoftClippedDoubleQLearning`. It adds example implementations of TD4 (Distributional Twin Delayed DDPG) and DSAC (Distributional Soft Actor Critic) from https://arxiv.org/abs/2004.14547.
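For readers unfamiliar with the idea, here is a minimal, library-free sketch of what clipping can look like when the Q-functions return distributions instead of scalars. It is an illustration only and does not use the coax API: the helper names `expected_value` and `pick_pessimistic` are hypothetical, and selecting the target distribution with the lower mean is an assumption about one reasonable approach, not necessarily what this PR implements.

```python
from typing import List, Sequence


def expected_value(probs: Sequence[float], atoms: Sequence[float]) -> float:
    """Mean of a categorical distribution defined over a fixed support (atoms)."""
    return sum(p * z for p, z in zip(probs, atoms))


def pick_pessimistic(
    dist_a: Sequence[float], dist_b: Sequence[float], atoms: Sequence[float]
) -> List[float]:
    """Return the distribution with the lower expected value.

    This mirrors the min(Q1, Q2) step of scalar clipped double Q-learning,
    applied to the means of two categorical value distributions.
    """
    if expected_value(dist_a, atoms) <= expected_value(dist_b, atoms):
        return list(dist_a)
    return list(dist_b)


# Hypothetical toy data: three atoms and two target distributions.
atoms = [0.0, 1.0, 2.0]
q1 = [0.2, 0.3, 0.5]
q2 = [0.5, 0.3, 0.2]
chosen = pick_pessimistic(q1, q2, atoms)
```

In this toy case `q2` has the lower mean, so it would serve as the bootstrap target distribution.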