Hi @drblallo,
I don't really understand the question, sorry.
But please note that the PPO implementation only supports the single-agent case: https://github.com/google-deepmind/open_spiel/blob/7bfca5fec2a635d8fea475ad93f65b210748879d/open_spiel/python/pytorch/ppo.py#L21
It was added for a specific use case and was never extended to the multiagent case.
So it has only been used and tested in single-agent settings, such as Atari or as a best-response oracle.
I suspect this answers your question: this code does not handle the situation you describe, since it was designed for the single-agent setting.
Hope this helps.
Closing due to inactivity. Please re-open if you would like to follow up.
I am trying to use PPO (which worked wonderfully out of the box, thank you very much for it) to learn a game in which the same player can take multiple actions in a row, and, depending on which action is performed, the next action may belong to one player or another.
It is not clear how to do this, because an agent takes both the number of envs and the player index. So if I start 5 envs for a game with two PPO agents, and after an action the turn in two of the envs passes to a different player, I no longer have enough envs to pass to either agent.
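For illustration, this is the kind of per-env routing I have in mind: group the vectorized envs by the player whose turn it is and feed each group to that player's agent, so each agent receives a variable-sized batch per step. The names (`agents`, `time_steps`, `agent.step`) are made up for the sketch and are not the actual OpenSpiel PPO interface.

```python
from collections import defaultdict

def step_all(agents, time_steps):
    """agents: one agent per player; time_steps: one TimeStep per env."""
    actions = [None] * len(time_steps)

    # Bucket env indices by the player to move in that env.
    by_player = defaultdict(list)
    for env_idx, ts in enumerate(time_steps):
        by_player[ts.observations["current_player"]].append(env_idx)

    # Each agent only sees the subset of envs where it is to move, so the
    # batch size it receives changes from step to step.
    for player, env_indices in by_player.items():
        batch = [time_steps[i] for i in env_indices]
        agent_actions = agents[player].step(batch)  # hypothetical batched step
        for i, a in zip(env_indices, agent_actions):
            actions[i] = a

    return actions
```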
I have looked around the repo but have not found any hint that this problem is already solved by some other mechanism. From what I understand, the alternatives for solving it are:
Do you have any suggestions about the correct way to address this issue?
Thank you in advance.