Closed eyast closed 10 months ago
Could you try different RLlib versions? It definitely worked with the version I last tried locally, which I believe was 2.7. We've wanted to re-enable CI tests so that the RLlib tutorials are tested automatically, but there have been some issues. I can try to take a look and see if they can be added again, though, if you can't fix this by switching RLlib versions.
Thanks for your reply. Yes, it doesn't raise an exception when I use Ray 2.7, but now it reports that some illegal moves are taken (I don't know if this is expected behaviour, or if something in the code is faulty).
Could we modify the documentation of the page in the interim? It says that `ray[rllib]>=2.7` is required; we could change that to `ray[rllib]==2.7.0` for now.
And if you don't mind telling me whether illegal moves are expected or not, that would be helpful in my learning journey.
Either way, thanks a lot for your responsiveness, I appreciate it.
I was going to say illegal moves are expected because it doesn't use action masking (I saw this when testing previously), but looking at the code, it actually does contain action masking, so I think the logic must be off somehow. I'm not an expert on RLlib or action masking implementations, so I can't really help here, but I can ask another dev to help out.
About the version numbers: good catch, I'll make a PR to fix the version number. If you want to duplicate this issue and post it on Ray's repo itself, they may be better able to answer why these newer versions are causing issues. I can do the same as well, but since you've actually run both of the versions, you're much better placed to explain things and troubleshoot any potential answers they find.
Hey @eyast, thanks for opening this issue. I am also stuck here: illegal moves being made by the agent. Could you please let me know if you were able to figure this out? Thanks!
Hi @Achleshwar - for sure. I ended up going a bit deeper to explore how actions are masked, and how to 'tell' my neural networks to stay away from them. Here's the code block.

If you look at the forward method, you can see how actions are determined through a pass to self.action_embed_model. This returns the logits of all actions, but it might also include a high logit for an illegal action, which we handle in the next line by creating an inf_mask vector. It treats the 0/1 entries of the action mask as follows: taking the logarithm with torch.log(action_mask) results in -inf for the elements where action_mask is 0, and 0 for the elements where action_mask is 1. However, the code also applies a maximum operation with torch.tensor(torch.finfo(torch.float32).min) to ensure that the minimum representable value of a 32-bit float is used instead of -inf where action_mask is 0.

So basically, anything that has a value of 0 in the action mask ends up with a logit of torch.finfo(torch.float32).min, which is -3.4028235e+38.
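To make that masking step concrete, here's a minimal plain-Python sketch of the same logic. It mirrors what the torch code does with `torch.log` and the float32 minimum, but the function name `mask_logits` and the list-based inputs are my own for illustration; the real model operates on torch tensors inside `forward`.

```python
import math

# Stand-in for torch.finfo(torch.float32).min
FLOAT32_MIN = -3.4028235e38

def mask_logits(logits, action_mask):
    """Push the logits of illegal actions down to the float32 minimum.

    Mirrors the torch logic: inf_mask = max(log(action_mask), FLOAT32_MIN),
    then masked_logits = logits + inf_mask.
    """
    masked = []
    for logit, legal in zip(logits, action_mask):
        # log(1) = 0 leaves legal actions untouched; log(0) = -inf for
        # illegal ones, clamped to the minimum finite float32 value.
        inf_mask = max(math.log(legal) if legal > 0 else float("-inf"), FLOAT32_MIN)
        masked.append(logit + inf_mask)
    return masked

# Action 1 is illegal: its logit is pushed to roughly -3.4e38, so an
# argmax or softmax over the masked logits will never select it.
print(mask_logits([1.5, 2.0, 0.3], [1, 0, 1]))
```

Clamping to the float32 minimum instead of using -inf avoids NaNs later (e.g. -inf minus -inf during normalization), while still making the illegal action's probability effectively zero.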
Here's another example, where the action mask is crafted with cleaner code: https://clementbm.github.io/project/2023/03/29/reinforcement-learning-connect-four-rllib.html I hope it helps!
@eyast thank you for the detailed response and sharing Clement's blog. This helps!
Thanks for reminding me about that blog post. I spoke with Clement in the past and he was interested in having it added as a proper tutorial, or at the very least linked on our website. I will try to add a link to it before the next release so others can see it.
Hey @elliottower - I've also had a few updates in the last few weeks. I've:
- Understood action masking in RLlib
- Experimented with creating policies and self-play in RLlib
- Built a totally new game/environment in PettingZoo for a nim-style game, which I solved in RLlib.

Nothing earth-shattering in the world of MARL, but very important baby steps for anyone who's interested in getting started on their own as well. Happy to turn those into tutorials/blog posts and share my code. If you support this, I will submit a PR. Can you point me to where to put those files?
That's awesome, I would love some updated tutorials and things like that. The custom environment isn't as easy to integrate, since we already have two custom env tutorials, but maybe you could make the tutorial "RLlib on a custom PettingZoo environment", because that's a common thing people want to do. If you want to send the files, I can make a branch (and give you access) with an initial PR with things formatted the way they should be; then you can add more text and fix any bugs found in CI (we test all tutorials in GitHub Actions to ensure they work properly).
Also if you’d like to chat more directly my discord is b3arodactyl and my email is elliot@elliottower.com
Describe the bug
I've followed the tutorial on your page, DQN for Simple Poker, which is supposed to showcase how to use PettingZoo with Ray RLlib for an AEC environment that includes action masking. But the code always breaks at `tune.run()`. The exception it returns is: `TuneError: ('Trials did not complete', [DQN_leduc_holdem`
During the handling of this exception, another one occurred: `ray.rllib.utils.error.UnsupportedSpaceException: Action space Dict('player_0': Discrete(4), 'player_1': Discrete(4)) is not supported for DQN.`
I believe the tutorial should be updated.
Code example
System info
PettingZoo was installed using pip (pettingzoo[classic] and pettingzoo[butterfly]); Python 3.9.18 on Ubuntu 20; PettingZoo 1.24.2
Additional context
No response