facebookresearch / BenchMARL

A collection of MARL benchmarks based on TorchRL
https://benchmarl.readthedocs.io/
MIT License

Request for Example of AEC API Usage with Agent Masking in PettingZoo #76

Open wmn7 opened 2 months ago

wmn7 commented 2 months ago

I've been exploring the BenchMARL library and am impressed with its capabilities and design—great work!

I am currently interested in implementing a multi-agent reinforcement learning scenario using the AEC (Agent-Environment Cycle) API in PettingZoo, particularly for environments that require sequential, turn-based actions, such as Chess. In this context, I need to apply masking at the agent level rather than action masking.

Could you provide an example or guidance on how to adapt the AEC API for such a use case? Any examples of AEC API usage with agent masking in a Chess-like environment would be incredibly helpful.

Thank you for your assistance and for the excellent work on BenchMARL.

matteobettini commented 2 months ago

Hello! Thanks for the nice feedback!

BenchMARL does not currently support AEC turn-based environments, but it is something we have on our TODO list! (This is also because they are already available in TorchRL, and I will also make a tutorial in the future on how to train them in TorchRL.)

If you could convert your AEC env to a Parallel one using the PettingZoo conversion wrapper https://pettingzoo.farama.org/api/wrappers/pz_wrappers/#module-pettingzoo.utils.conversions that would be a naive workaround, but I understand that this is not always possible.
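For illustration, here is a minimal sketch of that conversion (using pistonball_v6 purely as a stand-in parallelizable env; a turn-based game like Chess generally cannot be converted this way):

```python
# Sketch: convert an AEC PettingZoo env to a Parallel one and wrap it for TorchRL.
# pistonball_v6 is only a stand-in env that is known to be parallelizable.
from pettingzoo.butterfly import pistonball_v6
from pettingzoo.utils.conversions import aec_to_parallel
from torchrl.envs.libs.pettingzoo import PettingZooWrapper

aec_env = pistonball_v6.env()              # AEC interface
parallel_env = aec_to_parallel(aec_env)    # PettingZoo conversion wrapper
env = PettingZooWrapper(env=parallel_env)  # TorchRL wrapper over the parallel env
td = env.reset()
```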

I'll pin this issue and update it when they become directly compatible.

wmn7 commented 2 months ago

Hello,

Thank you for the information. I've reviewed the implementation of agent masking in TorchRL's PettingZoo wrapper. It seems that setting the state and reward to zero is a straightforward approach to agent masking.

https://github.com/pytorch/rl/blob/6f1c38765f85389f75e259575163fff972173f07/torchrl/envs/libs/pettingzoo.py#L619-L638
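In other words (my own rough sketch of that idea, not the actual wrapper code), inactive agents get zeroed observations and rewards plus a boolean mask entry:

```python
# Illustrative only: hypothetical helper mirroring the idea in the linked snippet.
import torch

def fill_inactive(obs, reward, active):
    # obs: (n_agents, obs_dim), reward: (n_agents,), active: (n_agents,) booleans
    mask = torch.as_tensor(active, dtype=torch.bool)
    obs = torch.where(mask.unsqueeze(-1), obs, torch.zeros_like(obs))
    reward = torch.where(mask, reward, torch.zeros_like(reward))
    return obs, reward, mask
```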

However, when I add the agent mask to the env, all agents output identical actions. I think an example demonstrating proper agent masking with multi-agent methods (such as MAPPO or QMIX) would indeed be beneficial.

Best regards.

matteobettini commented 2 months ago

Hello,

What do you mean by "all the agents are outputting identical actions"? Are you trying to train the environment in BenchMARL or TorchRL?

Yes! A tutorial on training those envs in TorchRL is needed and we are looking into it as it is not straightforward.

A further example of using a turn-based env is this one from the tests, https://github.com/pytorch/rl/blob/6f1c38765f85389f75e259575163fff972173f07/test/test_libs.py#L3211, where we play tic-tac-toe.
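For completeness, a minimal sketch (based on the TorchRL docs rather than the linked test verbatim) of instantiating that turn-based env:

```python
# Turn-based tic-tac-toe through TorchRL's PettingZoo bindings.
# use_mask=True makes the env expose per-agent masks indicating whose turn it is.
from torchrl.envs import PettingZooEnv

env = PettingZooEnv(task="tictactoe_v3", parallel=False, use_mask=True)
td = env.reset()
print(env.group_map)  # how agents are grouped in the output tensordicts
print(td)             # per-group "mask" entries mark the agent whose turn it is
```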

But we have no training example for now.

wmn7 commented 2 months ago

Hello,

I used TorchRL for training and found that the reward curve converged during training, as shown in the figure below:

[figure: training reward curve (screenshot 2024-04-25)]

But during testing, I found that the agents chose the same action.

For example, the agent has a discrete action space (Discrete(2)), and action 0 is always selected during testing. I found that this is related to ExplorationType: during training it is ExplorationType.RANDOM, but during testing it is ExplorationType.MODE.
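For reference, a rough sketch of how the two exploration modes are typically switched in TorchRL (assuming `policy` is the trained probabilistic actor and `env` the wrapped environment):

```python
# Sketch: exploration type controls action selection in TorchRL rollouts.
import torch
from torchrl.envs.utils import ExplorationType, set_exploration_type

with torch.no_grad(), set_exploration_type(ExplorationType.RANDOM):
    train_like_rollout = env.rollout(100, policy)  # stochastic, as during collection

with torch.no_grad(), set_exploration_type(ExplorationType.MODE):
    eval_rollout = env.rollout(100, policy)        # deterministic mode of the distribution
```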

I'm not sure whether this is a problem with my environment or with the algorithm, so I hope there can be an example of agent masking.

wmn7 commented 2 months ago

Hello,

I've identified the issue: the state representation was not accurately reflecting the environment. After modifying the environment configuration, the problem was resolved.

However, I still require an example of how to implement an agent mask. In my simulation, which involves managing Connected Autonomous Vehicles (CAVs), there are instances when these vehicles exit the road network. I am considering applying an agent mask to handle these exiting vehicles. Given that the count of CAVs in my environment changes frequently, I'm uncertain whether this method is the best solution.

Lastly, I want to express my gratitude for your efforts. Your framework appears to be more debug-friendly compared to RLlib, which is greatly appreciated.

Thank you!

matteobettini commented 2 months ago

However, I still require an example of how to implement an agent mask. In my simulation, which involves managing Connected Autonomous Vehicles (CAVs), there are instances when these vehicles exit the road network. I am considering applying an agent mask to handle these exiting vehicles. Given that the count of CAVs in my environment changes frequently, I'm uncertain whether this method is the best solution.

A simple solution would be to just respawn the out-of-bounds vehicles in a reset state.

Alternatively, if you use a mask, you need to return it as part of your step output. Then, after collection, apply it to the data and feed it to the loss after having filtered out the invalid transitions.
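A rough sketch of what I mean (not an existing BenchMARL feature; it assumes your env writes a per-agent boolean ("agents", "mask") entry in its step output, `collector` is a TorchRL data collector over that env, and `loss_module` is a multi-agent loss such as a MAPPO-configured ClipPPOLoss):

```python
import torch

optimizer = torch.optim.Adam(loss_module.parameters(), lr=3e-4)

for data in collector:
    data = data.reshape(-1)                    # flat batch of transitions
    agent_mask = data.get(("agents", "mask"))  # hypothetical key written by the env
    valid = data[agent_mask.any(dim=-1)]       # filter out fully-inactive transitions
    loss_td = loss_module(valid)
    loss = sum(v for k, v in loss_td.items() if k.startswith("loss_"))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```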

This is currently not done in BenchMARL, and I know we need a tutorial for this in TorchRL. As soon as I get some time, I will work on it.

Lastly, I want to express my gratitude for your efforts. Your framework appears to be more debug-friendly compared to RLlib, which is greatly appreciated.

This is the best compliment you could give :) Thanks