Implementation of StochasticQ in ClippedDoubleQLearning

coax-dev / coax

Modular framework for Reinforcement Learning in python

https://coax.readthedocs.io

MIT License


Implementation of StochasticQ in ClippedDoubleQLearning #18

Status: Closed (frederikschubert closed this 2 years ago)

frederikschubert commented 2 years ago

Rework of https://github.com/coax-dev/coax/pull/9

This PR enables the use of StochasticQ in ClippedDoubleQLearning and SoftClippedDoubleQLearning. It adds example implementations of TD4 (Distributional Twin Delayed DDPG) and DSAC (Distributional Soft Actor Critic) from https://arxiv.org/abs/2004.14547.
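For readers unfamiliar with the idea, here is a minimal sketch of what a clipped double-Q target could look like when the critics output distributions instead of scalars. It is plain Python and does not use the coax API; the function names, the shared categorical support `atoms`, and the rule "keep the distribution with the smaller expected value" are assumptions made for illustration, not necessarily what this PR implements.

```python
# Illustrative sketch only: plain Python, not the coax API.
# expected_value and clipped_double_target are hypothetical helper names.

def expected_value(probs, atoms):
    """Mean of a categorical distribution defined on the support `atoms`."""
    return sum(p * z for p, z in zip(probs, atoms))


def clipped_double_target(dist_a, dist_b, atoms):
    """Return whichever next-state distribution has the smaller mean.

    Assumed distributional analogue of min(Q1, Q2): instead of taking the
    minimum of two scalars, keep the whole distribution predicted by the
    more pessimistic critic.
    """
    if expected_value(dist_a, atoms) <= expected_value(dist_b, atoms):
        return dist_a
    return dist_b


atoms = [-1.0, 0.0, 1.0]   # fixed support shared by both critics (assumption)
dist_a = [0.1, 0.2, 0.7]   # critic 1: probabilities over the atoms
dist_b = [0.3, 0.4, 0.3]   # critic 2: probabilities over the atoms

# The critic with the lower expected value supplies the target distribution.
target = clipped_double_target(dist_a, dist_b, atoms)
```

Keeping the full distribution of the pessimistic critic, rather than mixing the two element-wise, means the target stays a valid probability distribution.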

frederikschubert commented 2 years ago

I still need to add some docs for DSAC and TD4, but the main code changes should be good to go! And at some point we might want to encapsulate things like computing the greedy action from a q-function in their own method 😅
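As a rough idea of what such a helper might look like, here is a sketch in plain Python. It assumes a discrete action space and is not the coax API; `greedy_action` and `greedy_action_stochastic` are hypothetical names, and the stochastic variant assumes each action's value is a categorical distribution over a shared support.

```python
# Illustrative sketch only: hypothetical helpers, not part of coax.

def greedy_action(q_values):
    """Index of the action with the largest Q-value (first one wins ties)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def greedy_action_stochastic(dists, atoms):
    """Greedy action for a distributional Q-function.

    `dists[a]` holds the probabilities over `atoms` for action `a`; the
    action with the largest expected value is chosen.
    """
    means = [sum(p * z for p, z in zip(probs, atoms)) for probs in dists]
    return greedy_action(means)


atoms = [-1.0, 0.0, 1.0]
dists = [
    [0.1, 0.2, 0.7],   # action 0
    [0.3, 0.4, 0.3],   # action 1
]
best = greedy_action_stochastic(dists, atoms)
```

Having both variants behind one method would let the update rules call a single function regardless of whether the critic is a scalar or a distributional q-function.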

frederikschubert commented 2 years ago

@KristianHolsheimer I added a short description so the PR should be ready to be merged.