Farama-Foundation / PettingZoo

An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities
https://pettingzoo.farama.org

Tutorial [RLlib: DQN for Simple Poker] does not work #1138

Closed: eyast closed this issue 10 months ago

eyast commented 10 months ago

Describe the bug

I've followed the tutorial on your page, DQN for Simple Poker, which is supposed to showcase how to use PettingZoo with Ray RLlib for an AEC environment that includes action masking. But the code always breaks at tune.run(). The exception it returns is TuneError: ('Trials did not complete', [DQN_leduc_holdem...]), and during the handling of that exception, another one occurred: ray.rllib.utils.error.UnsupportedSpaceException: Action space Dict('player_0': Discrete(4), 'player_1': Discrete(4)) is not supported for DQN.
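For context: the failing tune.run() call operates on a PettingZoo environment registered with RLlib. A rough sketch of that registration pattern is below, assuming Leduc Hold'em registered under the name "leduc_holdem"; this is an illustrative reconstruction, not the tutorial's verbatim code.

```python
# Illustrative sketch only, not the tutorial's verbatim code.
from pettingzoo.classic import leduc_holdem_v4
from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv
from ray.tune.registry import register_env


def env_creator(config):
    # PettingZooEnv adapts the turn-based AEC env to RLlib's multi-agent API;
    # each player has a Discrete(4) action space, which newer RLlib versions
    # appear to surface to DQN as the unsupported Dict space in the error above.
    return PettingZooEnv(leduc_holdem_v4.env())


register_env("leduc_holdem", env_creator)
```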

I believe the tutorial should be updated

Code example

Verbatim copy-pasted code from the tutorial page above.

System info

PettingZoo was installed using pip (pettingzoo[classic], then pettingzoo[butterfly]); Python 3.9.18 on Ubuntu 20; PettingZoo 1.24.2.

Additional context

No response

elliottower commented 10 months ago

Could you try different RLlib versions? It definitely worked with the version I last tried locally, which I believe was 2.7. We've wanted to re-enable CI tests so that the RLlib tutorials are tested automatically, but there have been some issues; I can try to take a look and see if that can be added again if you can't fix this by trying different RLlib versions.

eyast commented 10 months ago

Thanks for your reply. Yes, it doesn't raise an exception when I use Ray 2.7, but now it reports that some illegal moves are taken (I don't know if this is expected behaviour, or if something in the code is faulty). Could we possibly modify the documentation of the page in the interim? It says that ray[rllib]>=2.7 is required; we could change that to ray[rllib]==2.7.0 for now. And if you don't mind telling me whether illegal moves are expected or not, that would be helpful in my learning journey.

Either way, thanks a lot for your responsiveness, I appreciate it.

elliottower commented 10 months ago

I was going to say illegal moves are expected because it doesn't use action masking (I saw this while testing previously), but looking at the code it actually does contain action masking, so I think the logic must be messed up somehow. I'm not an expert on RLlib or action masking implementations, so I can't really help here, but I can ask another dev to help out.

About the version numbers: good catch, I'll make a PR to fix the version number. If you want to duplicate this issue and post it on Ray's repo itself, they may be able to better answer why these newer versions are causing issues. I can do the same as well, but since you've actually run both versions you're much better placed to explain things and troubleshoot any potential answers they find.

Achleshwar commented 10 months ago

Hey @eyast, thanks for opening this issue. I am also stuck here: illegal moves being made by the agent. Can you please let me know if you were able to figure this out? Thanks!

eyast commented 10 months ago

Hi @Achleshwar - for sure. I ended up going a bit deeper to explore how actions are masked, and how to 'tell' my neural network to stay away from them. If you look at the forward method of the tutorial's model, you can see how actions are determined through a pass to self.action_embed_model. This returns the logits of all actions, but it might also include a high logit for an illegal action. The next line handles that by building an inf_mask vector from the 0/1 action mask: taking the logarithm with torch.log(action_mask) gives -inf for the elements where action_mask is 0 and 0 for the elements where action_mask is 1. The code then applies a maximum operation with torch.tensor(torch.finfo(torch.float32).min), so that the minimum representable value of a 32-bit float is used instead of -inf for the masked-out elements.

So basically, anything that has a value of 0 in the action mask ends up with a logit of roughly torch.finfo(torch.float32).min, which is -3.4028235e+38.
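To make that arithmetic concrete, here is a minimal, self-contained sketch in plain PyTorch. The mask_logits helper is made up for illustration; in the tutorial's model this happens inside the forward method described above.

```python
import torch


def mask_logits(action_logits: torch.Tensor, action_mask: torch.Tensor) -> torch.Tensor:
    """Push the logits of illegal actions (mask == 0) down to the float32 minimum."""
    # log(1) = 0 leaves legal logits untouched; log(0) = -inf flags illegal ones.
    inf_mask = torch.log(action_mask)
    # Swap -inf for the smallest representable float32 so downstream ops stay finite.
    inf_mask = torch.max(inf_mask, torch.tensor(torch.finfo(torch.float32).min))
    return action_logits + inf_mask


# Example: 4 actions, actions 1 and 3 are illegal.
logits = torch.tensor([0.2, 1.5, -0.3, 0.9])
mask = torch.tensor([1.0, 0.0, 1.0, 0.0])
print(mask_logits(logits, mask))  # masked entries end up around -3.4028235e+38
```

torch.clamp(torch.log(action_mask), min=torch.finfo(torch.float32).min) achieves the same thing in a single call.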

Here's another example as well, where the action mask is crafted using cleaner code: https://clementbm.github.io/project/2023/03/29/reinforcement-learning-connect-four-rllib.html I hope it helps!

Achleshwar commented 10 months ago

@eyast thank you for the detailed response and for sharing Clement's blog. This helps!

elliottower commented 9 months ago

Thanks for reminding me about that blog post. I spoke with Clement in the past and he was interested in having it added as a proper tutorial, or at the very least linked on our website. I will try to add a link to it before the next release so others can see it.

eyast commented 9 months ago

Hey @elliottower - I've also had a few updates in the last few weeks. I've:

  • Understood action masking in RLlib
  • Experimented with creating policies and self-play in RLlib
  • Built a totally new game/environment in PettingZoo for a nim-style game, which I solved in RLlib.

Nothing earth-shattering in the world of MARL, but very important baby steps for anyone who's interested in getting started on their own. Happy to turn those into tutorials/blogs and share my code. If you support this I will submit a PR. Can you point me to where to put those files?

elliottower commented 9 months ago

That's awesome, I would love some updated tutorials and things like that. The custom environment isn't as easy to integrate, since we already have two custom env tutorials, but maybe you could make the tutorial "RLlib on a custom PettingZoo environment", because that's a common thing people want to do. If you want to send the files, I can make a branch (and give you access) with an initial PR with things formatted the way they should be; then you can add more text and fix any bugs that are found in CI (we test all tutorials in GitHub Actions to ensure they work properly).

elliottower commented 9 months ago

Also, if you'd like to chat more directly, my Discord is b3arodactyl and my email is elliot@elliottower.com.