DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

How does stable-baselines work with a multi-agent pettingzoo environment? #1878

Closed AnastasiaPsarou closed 3 months ago

AnastasiaPsarou commented 3 months ago

❓ Question

I have created a custom PettingZoo ParallelEnv environment with 600 agents, following this tutorial; the environment itself is based on the ParallelEnv tutorial. I see that 600 different environments are created (one for each agent) and that they are somehow concatenated into a vectorized environment. Only one neural network is created (or two or three, depending on the RL algorithm), but I would expect 600 different networks, since each agent should supposedly have its own knowledge. The output of that single network also looks as if it is addressed to one agent. Can someone explain how the policy maps to each agent's actions and how this works? Thanks in advance!


araffin commented 3 months ago

Hello, this issue belongs to the PettingZoo repo.
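
For readers who land here with the same question: the behaviour described above is consistent with parameter sharing, assuming the environment was vectorized with SuperSuit wrappers such as `pettingzoo_env_to_vec_env_v1`, as the PettingZoo SB3 tutorial does (the thread itself does not confirm which wrappers were used). In that setup each agent appears to Stable-Baselines3 as one sub-environment of a vectorized env, so a single policy receives a batch with one observation row per agent and returns one action per row. The sketch below illustrates the idea with plain NumPy. It is not SB3 or PettingZoo code, and `OBS_DIM`, `N_ACTIONS`, `shared_policy`, and the linear "network" are hypothetical stand-ins.

```python
import numpy as np

N_AGENTS = 600   # number of agents, taken from the question
OBS_DIM = 4      # hypothetical observation size
N_ACTIONS = 3    # hypothetical number of discrete actions

rng = np.random.default_rng(0)

# One set of weights shared by every agent: a toy stand-in for the
# single policy network that the question observes.
W = rng.normal(size=(OBS_DIM, N_ACTIONS))
b = np.zeros(N_ACTIONS)


def shared_policy(obs_batch):
    """Map observations of shape (n, OBS_DIM) to one greedy action per row."""
    logits = obs_batch @ W + b
    return logits.argmax(axis=1)


# A parallel multi-agent env returns one observation per agent name.
agent_names = [f"agent_{i}" for i in range(N_AGENTS)]
obs_dict = {name: rng.normal(size=OBS_DIM) for name in agent_names}

# Vectorization stacks the per-agent observations into one batch,
# so each agent looks like one sub-environment.
obs_batch = np.stack([obs_dict[name] for name in agent_names])

# A single forward pass yields 600 actions, one per agent.
actions = shared_policy(obs_batch)

# Row i of the batch corresponds to agent i, so the actions can be
# mapped back to agent names before stepping the environment.
action_dict = dict(zip(agent_names, actions))
```

Under this reading, the network's output looks as if it is addressed to one agent because its output layer is sized for a single agent's action space; the 600 actions come from the batch dimension. Agents can still act differently because they see different observations, but they do not hold separate weights. Training genuinely separate networks per agent would need a different setup, which is a PettingZoo-side question.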