google-deepmind / meltingpot

A suite of test scenarios for multi-agent reinforcement learning.
Apache License 2.0

Symmetric Observations #166

Closed mgerstgrasser closed 12 months ago

mgerstgrasser commented 1 year ago

It seems that many (but not all?) substrates have "asymmetric" observations in the sense that e.g. different players control different colors. That means that it's not straightforward to use (a naive form of) parameter sharing, as one player would learn to e.g. maximize the red tiles, and another one the green tiles, etc.

jzleibo commented 1 year ago

We typically shuffle the connection between neural networks and player slots every episode. So all substrates are then symmetric in the way you describe (every neural network learns how to control all avatar colors), except for the small minority which have specific roles. Those are:

  1. predator_prey__*
  2. bach_or_stravinsky_in_the_matrix__*
  3. daycare
  4. fruit_market__concentric_rivers

mgerstgrasser commented 1 year ago

Got it, randomly permuting the player-policy map every episode was my plan B, good to know that that's a sensible idea. Is there a wrapper or setting for that included by any chance?

And presumably there is no way of making the environments symmetric without this shuffling then, is that correct? I have a couple of experiments in mind where that wouldn't work (but I can also find a different environment for those if needed).

jzleibo commented 1 year ago

The pattern we designed around is one where you never know in advance which player slot a particular neural network will connect to, and that the slot assignments could change from episode to episode. You don't even need to have the same number of players from episode to episode.

I think the setup is very general once you consider that you can also define custom roles. I bet that whatever you're trying to do is supported. If you give us a bit more information about what you're trying to do then we could probably help you get it running.
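As a rough illustration of the pattern described above (not Melting Pot's own code), here is a minimal sketch of redrawing the network-to-slot assignment every episode, with a player count that can differ between episodes. The function name `assign_networks_to_slots` and the network names are hypothetical, and sampling without replacement is an assumption made for the sketch.

```python
import random
from typing import Dict, Sequence


def assign_networks_to_slots(
    network_names: Sequence[str],
    num_slots: int,
    rng: random.Random,
) -> Dict[int, str]:
    """Draws a fresh slot -> network assignment for one episode.

    Hypothetical helper: each slot gets a distinct network, chosen
    uniformly at random, so no network can rely on a fixed slot.
    """
    if num_slots > len(network_names):
        raise ValueError("Need at least as many networks as player slots.")
    # random.sample draws without replacement, in random order.
    chosen = rng.sample(list(network_names), k=num_slots)
    return dict(enumerate(chosen))


rng = random.Random(0)
networks = ["net_a", "net_b", "net_c", "net_d"]
# The number of player slots may differ from episode to episode.
for episode, num_slots in enumerate([4, 2, 3]):
    assignment = assign_networks_to_slots(networks, num_slots, rng)
    print(f"episode {episode}: {assignment}")
```

Because the assignment is redrawn at every episode boundary, each network ends up controlling every avatar color over the course of training.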

mgerstgrasser commented 1 year ago

Imagine something like having separate roles as in the environments you listed above, or in collaborative_cooking__forced, where you wouldn't want to shuffle agent assignments. But then I'd want to give agent 1 an observation from agent 2 and ask it "what would you do in this situation", and still get a reasonable answer. Admittedly this is a bit of an edge case though. Or I'd be interested in an ablation to see if there's a difference whether an agent can just "overfit" to one particular colour, compared to having to learn to figure out which colour it's controlling in a given episode. Mostly I'm just asking out of curiosity though.

If I just want to do the shuffling, is that already supported out of the box, or should I write my own wrapper for that?

And thank you so much for the prompt responses!

jagapiou commented 1 year ago

Population handles the shuffling currently, see evaluation.py for how it can be used for a focal population. It's always used for scenarios' background populations.

jzleibo commented 1 year ago

Yeah, as John said, take a look at Population to see how we do the shuffling.

As for the other configurations you are talking about, they should all be pretty easy to get working.

> something like having separate roles as in the environments you listed above, or in collaborative_cooking__forced, where you wouldn't want to shuffle agent assignments.

I'm not understanding this question. I believe collaborative_cooking__forced is already almost symmetric. The agents all see themselves as the same color. From a quick glance at the code, it looks like the only asymmetry is in how they see the other player in their environment (the player they don't control).

Anyway, you can create custom 'player_0' and 'player_1' roles if you like, and map them to specific avatar ids. Look at how the roles vector gets used [here](https://github.com/deepmind/meltingpot/blob/dfff4d2784ebab1f9caba299b4099c6092f9fd70/meltingpot/configs/substrates/collaborative_cooking.py#L925) (the roles will be shuffled before ending up here).

> then I'd want to give agent 1 an observation from agent 2 and ask it "what would you do in this situation",

I think this is a question about your agent training framework, not about Melting Pot substrates. You can take any observation out of the substrate and map it to any agent you like.
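To make that concrete, here is a minimal sketch on the training-framework side, assuming each policy can be treated as a stateless callable from an observation to an action. The names `cross_query`, `agent_1`, `player_2` and the toy observations are hypothetical; real policies with recurrent state would also need that state passed in.

```python
from typing import Any, Callable, Dict

# Assumption for this sketch: a policy is a plain function obs -> action.
Policy = Callable[[Any], Any]


def cross_query(
    policies: Dict[str, Policy],
    observations: Dict[str, Any],
) -> Dict[str, Dict[str, Any]]:
    """Asks every policy what it would do given every player's observation.

    The result is indexed as result[policy_name][player_name].
    """
    return {
        policy_name: {
            player: policy(obs) for player, obs in observations.items()
        }
        for policy_name, policy in policies.items()
    }


# Toy stand-ins for trained policies and substrate observations.
policies = {
    "agent_1": lambda obs: obs["x"] + 1,
    "agent_2": lambda obs: obs["x"] * 2,
}
observations = {"player_1": {"x": 1}, "player_2": {"x": 5}}

answers = cross_query(policies, observations)
# What agent_1 would do if shown player_2's observation.
what_if = answers["agent_1"]["player_2"]
```

Whether the answer is "reasonable" then depends only on whether the observations look alike across players, which is the symmetry question discussed above.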

Hope this helps!

mgerstgrasser commented 1 year ago

Thank you, both!

For the shuffling: After thinking about it some more, I've implemented this inside RLlib's policy_mapping_fn, since that already defines a map from policies to agents in the environment. That should do the same thing as using Population, right? Or do either of you see something that would go wrong doing it this way?
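For reference, here is a minimal sketch of that idea, not a statement about how Population behaves. It assumes the mapping function is called separately for each agent with `(agent_id, episode, ...)`, and that the episode object exposes an `episode_id` (or `id_`) attribute; both depend on the RLlib version. The class name `ShuffledPolicyMapping` and the `player_N` / `policy_N` ids are hypothetical.

```python
import random
from types import SimpleNamespace
from typing import Any, List, Sequence


class ShuffledPolicyMapping:
    """Maps agents to policies via a permutation that is fixed per episode."""

    def __init__(
        self,
        agent_ids: Sequence[str],
        policy_ids: Sequence[str],
        seed: int = 0,
    ) -> None:
        if len(agent_ids) != len(policy_ids):
            raise ValueError("Expected one policy per agent.")
        self._agent_ids: List[str] = list(agent_ids)
        self._policy_ids: List[str] = list(policy_ids)
        self._seed = seed

    def __call__(self, agent_id: str, episode: Any, *args, **kwargs) -> str:
        # Assumption: the episode carries a unique id under one of these names.
        episode_key = getattr(episode, "episode_id", None)
        if episode_key is None:
            episode_key = getattr(episode, "id_", None)
        # Seeding from the episode id gives every per-agent call within one
        # episode the same permutation, so no two agents share a policy.
        rng = random.Random(f"{self._seed}:{episode_key}")
        shuffled = list(self._policy_ids)
        rng.shuffle(shuffled)
        return shuffled[self._agent_ids.index(agent_id)]


agents = ["player_0", "player_1", "player_2"]
mapping_fn = ShuffledPolicyMapping(
    agents, ["policy_0", "policy_1", "policy_2"], seed=1)

# Stand-in for an RLlib episode object, for illustration only.
fake_episode = SimpleNamespace(episode_id=7)
assigned = [mapping_fn(agent, fake_episode) for agent in agents]
```

The main thing to watch for is that an independent random draw per call would not be a permutation: two agents could land on the same policy within one episode.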

For the other questions: Oh, interesting, I didn't check the collaborative cooking env in enough detail then, but you're right, this one already does this! I'll have a look at the code in more detail, it seems using the roles I should be able to do this in other environments as well then, e.g. change territory__rooms so that all agents see themselves as red, right?

Thank you so much for all the pointers! Yes, this has been super helpful.

jzleibo commented 1 year ago

For territory__rooms you would have to also change the resources (walls) to be paintable in the symmetric color. That would be a bit more involved to set up. Let me know if you really need that, if so I can walk you through it.