EdanToledo / Stoix

🏛️A research-friendly codebase for fast experimentation of single-agent reinforcement learning in JAX • End-to-End JAX RL
Apache License 2.0
160 stars, 12 forks

[FEATURE] Add stochastic muzero implementation #77

Open: ipsec opened this issue 2 months ago

ipsec commented 2 months ago

Add a Stochastic MuZero implementation, following the paper and its pseudocode.

With this extension of MuZero, Stoix would be able to train agents on stochastic environments such as the game 2048 and poker (e.g. Leduc poker).

EdanToledo commented 2 months ago

Hey, this is on the roadmap, but I don't have any immediate plans to implement it. If you'd like to give it a shot, I'd be more than happy to review it and help with development. Otherwise, it might be a while until this is implemented.

ipsec commented 2 months ago

I'll give it a try, then. I had some difficulty with the loss function, so it would be great if you could help me with that part.

ipsec commented 2 months ago

@EdanToledo I've opened PR #78. As I mentioned, I had difficulty with the loss function, so that part needs a careful review.
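Since the open question is the loss function, here is a minimal sketch of the chance-related terms that Stochastic MuZero adds on top of the usual MuZero policy/value/reward losses, based on my reading of the paper. It covers a cross-entropy between the predicted chance distribution (sigma, predicted from the afterstate) and the encoder's one-hot chance code, a loss on the afterstate value Q, and a VQ-VAE-style commitment term weighted by beta. All function and argument names, the MSE form of the Q loss (a simplification of MuZero's categorical value loss), the softmax on the encoder output, the stop-gradient placement and the beta value are assumptions for illustration. This is not the code in PR #78 or part of Stoix's API.

```python
from typing import Tuple

import jax
import jax.numpy as jnp


def straight_through_one_hot(encoder_logits: jnp.ndarray) -> Tuple[jnp.ndarray, jnp.ndarray]:
    """Quantize encoder outputs into one-hot chance codes (hypothetical helper).

    Forward pass: one-hot of the argmax. Backward pass: gradients flow to the
    softmaxed encoder output c_e (straight-through estimator).
    """
    c_e = jax.nn.softmax(encoder_logits, axis=-1)
    one_hot = jax.nn.one_hot(jnp.argmax(c_e, axis=-1), c_e.shape[-1])
    codes = c_e + jax.lax.stop_gradient(one_hot - c_e)
    return codes, c_e


def chance_loss(
    chance_logits: jnp.ndarray,   # [K, num_codes] predicted sigma logits from afterstates
    encoder_logits: jnp.ndarray,  # [K, num_codes] encoder outputs for the next observations
    q_values: jnp.ndarray,        # [K] predicted afterstate values Q
    value_targets: jnp.ndarray,   # [K] n-step value targets z
    beta: float = 0.25,           # hypothetical commitment weight
) -> jnp.ndarray:
    """Sketch of the extra Stochastic MuZero loss terms, averaged over unroll steps."""
    codes, c_e = straight_through_one_hot(encoder_logits)
    # Treat the quantized code as a fixed target (assumption: stop-gradient here).
    target_codes = jax.lax.stop_gradient(codes)

    # Cross-entropy between the encoder's one-hot code and predicted sigma.
    log_sigma = jax.nn.log_softmax(chance_logits, axis=-1)
    sigma_loss = -jnp.sum(target_codes * log_sigma, axis=-1)

    # Afterstate value loss (MSE as a simplification).
    q_loss = jnp.square(q_values - value_targets)

    # Commitment: pull the encoder output towards its one-hot code. Without the
    # stop-gradient on the code, the straight-through path would cancel this gradient.
    commitment = jnp.sum(jnp.square(target_codes - c_e), axis=-1)

    return jnp.mean(sigma_loss + q_loss + beta * commitment)
```

The straight-through `codes` are what would be fed to the dynamics network alongside the afterstate, so the encoder receives gradients from the downstream MuZero losses as well as from the commitment term.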

EdanToledo commented 1 month ago

Hey, I haven't forgotten about this. Sorry for the delay; it's an important PR and I'll hopefully get to it ASAP.