EdanToledo / Stoix

🏛️A research-friendly codebase for fast experimentation of single-agent reinforcement learning in JAX • End-to-End JAX RL
Apache License 2.0
238 stars 24 forks

[FEATURE] Add stochastic muzero implementation #77

Open ipsec opened 6 months ago

ipsec commented 6 months ago

Add a Stochastic MuZero implementation (see the paper and the pseudocode).

With this improved version of MuZero, Stoix would be able to train on stochastic environments such as the game 2048 and poker (Leduc poker).

EdanToledo commented 6 months ago

Hey, this is on the roadmap; however, I don't have any immediate plans to implement it. If you'd like to give it a shot, I'd be more than happy to review it and assist with development. Otherwise, it might be a while until this is implemented.

ipsec commented 6 months ago

Let me try, then. I had a little difficulty with the loss function. If you could help me with this part, that would be great.

ipsec commented 6 months ago

@EdanToledo PR #78 created. As I said, I had difficulty with the loss function, so a careful review is necessary.

EdanToledo commented 5 months ago

Hey, I haven't forgotten about this. Sorry, it's an important PR and I will hopefully get to it ASAP.

ipsec commented 2 months ago

Hey Edan, is there anything else I could help with to get this implemented?

Regards.

EdanToledo commented 2 months ago

Hey Fernando, I'm sorry about the delay; I just haven't had time to complete something like this. Stochastic MuZero is a non-trivial algorithm that I would need to understand well to ensure it is implemented correctly. Currently, I haven't had much time for non-priority features. I promise I will get around to this at some point, but I really don't have an ETA. Ideally, if there were more contributors and maintainers on this project, it would be easier.