How does Stable-Baselines3 work with a multi-agent PettingZoo environment?

❓ Question

I have created a custom PettingZoo ParallelEnv with 600 agents, following this tutorial. The env is based on the ParallelEnv tutorial page. As far as I can tell, 600 environments are created (one per agent) and then concatenated into a single vectorized environment. However, only one neural network is created (or two or three, depending on the RL algorithm), whereas I would expect 600 separate networks, since each agent should supposedly have its own knowledge. The network's output also looks as if it were addressed to a single agent. Can someone explain how the policy's output maps to each agent's action and how this setup works? Thanks in advance!
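
To make the observed behavior concrete, here is a minimal sketch. It assumes the environment is wrapped so that each agent occupies one slot of a vectorized environment (as SuperSuit's `pettingzoo_env_to_vec_env_v1` does). It uses plain NumPy, not the Stable-Baselines3 API, and the linear policy, `OBS_DIM`, `N_ACTIONS`, and `shared_policy` are hypothetical names chosen for illustration. It only shows how one shared network can produce one action per agent from a batched observation.

```python
import numpy as np

N_AGENTS = 600   # number of agents, as in the question
OBS_DIM = 8      # hypothetical per-agent observation size
N_ACTIONS = 4    # hypothetical number of discrete actions

rng = np.random.default_rng(0)

# A single set of weights shared by every agent (hypothetical linear policy).
W = rng.normal(size=(OBS_DIM, N_ACTIONS))
b = np.zeros(N_ACTIONS)


def shared_policy(obs_batch):
    """Map a batch of observations (n, OBS_DIM) to greedy actions (n,)."""
    logits = obs_batch @ W + b
    return logits.argmax(axis=1)


# The vectorized env stacks one observation per agent into one batch,
# so the network sees 600 rows and returns 600 actions in a single call.
obs_batch = rng.normal(size=(N_AGENTS, OBS_DIM))
actions = shared_policy(obs_batch)

print(actions.shape)
```

Under this assumption, the network's output layer is sized for one agent's action space, and row `i` of the batch is the action for agent `i`. All agents share the same weights and differ only in what they observe.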

Checklist

[X] I have checked that there is no similar issue in the repo
[X] I have read the documentation
[X] If code there is, it is minimal and working
[X] If code there is, it is formatted using the markdown code blocks for both code and stack traces.

DLR-RM / stable-baselines3

How does Stable-Baselines3 work with a multi-agent PettingZoo environment? #1878