google-deepmind / open_spiel

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.
Apache License 2.0

PPO and selfplay #1193

Closed drblallo closed 4 months ago

drblallo commented 6 months ago

I am trying to use PPO (which worked wonderfully out of the box, thank you very much for it) to learn a game that allows the same player to take multiple actions in a row; depending on which action is performed, the next action may belong to one player or another.

It is not clear how to do this, because an agent takes both the number of envs and the player index. So if I start 5 envs for a game with two PPO agents, and after an action the turn in two of the envs passes to a different player, I no longer have a full batch of envs to pass to either agent.
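To make the problem concrete, here is a hedged sketch of one possible workaround (everything here is hypothetical and uses no OpenSpiel API): each step, group the vectorized envs by their current player, so each agent only acts on the variable-sized sub-batch of envs it currently owns.

```python
# Illustrative sketch only; group_envs_by_player is a hypothetical helper,
# not part of OpenSpiel's PPO implementation.
from collections import defaultdict


def group_envs_by_player(current_players):
    """Map each player id to the list of env indices whose turn it is."""
    groups = defaultdict(list)
    for env_idx, player in enumerate(current_players):
        groups[player].append(env_idx)
    return dict(groups)


# Example: 5 envs where turn ownership has diverged after an action.
current_players = [0, 1, 0, 1, 1]
groups = group_envs_by_player(current_players)
# Agent 0 would step only envs [0, 2]; agent 1 only envs [1, 3, 4].
```

The catch is exactly the one described above: the sub-batches have variable size, while the PPO agent expects a fixed num_envs batch every step.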

I have looked around the repo but have not found any hint that this problem is already solved by some other mechanism. From what I understand, the alternatives to solve it are:

Do you have any suggestions about the correct way of addressing this issue?

Thank you in advance.

lanctot commented 6 months ago

Hi @drblallo,

I don't really understand the question, sorry.

But please note that the PPO implementation only supports the single-agent case: https://github.com/google-deepmind/open_spiel/blob/7bfca5fec2a635d8fea475ad93f65b210748879d/open_spiel/python/pytorch/ppo.py#L21

It was added for a specific use case and was never extended to the multiagent case.

So it has only been used and tested in single-agent settings, like Atari, or as a best response oracle.

I suspect this answers your question: basically, this code does not handle the situation you describe, since it was designed for the single-agent setting.
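A toy illustration of that single-agent assumption (all names here are made up for the example, not OpenSpiel API): the agent expects a full fixed-size batch of observations, one per env, on every step, which is precisely the contract that breaks once turn ownership diverges across envs.

```python
# Toy stand-in for a single-agent vectorized learner; not OpenSpiel code.
import random


class ToySingleAgent:
    """Always expects exactly one observation per env each step."""

    def __init__(self, num_envs):
        self.num_envs = num_envs

    def step(self, observations):
        if len(observations) != self.num_envs:
            raise ValueError("expected one observation per env")
        # Pick an arbitrary action for each env (placeholder policy).
        return [random.choice([0, 1]) for _ in observations]


agent = ToySingleAgent(num_envs=5)
actions = agent.step([None] * 5)  # fine: all 5 envs belong to this agent
# agent.step([None] * 3) would raise, because a partial sub-batch
# violates the single-agent, fixed-batch assumption.
```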

Hope this helps.

lanctot commented 4 months ago

Closing due to inactivity. Please re-open if you would like to follow up.