An end-to-end training example project on a Gym environment

Thanks for open-sourcing this great library! I believe there are people interested in MuZero and its performance on Atari games who want to try it on Gym environments. Also, instead of using env.step() as the dynamics inside recurrent_fn, some people may be interested in using a neural network to learn the dynamics. I am one of them, and I have written some code that supports this on top of the mctx library. I have also shared an end-to-end training example on the Gym CartPole environment. Please check out my project, muax, and the CartPole example.

google-deepmind / mctx, issue #46