google-deepmind / meltingpot

A suite of test scenarios for multi-agent reinforcement learning.
Apache License 2.0

Observation colour filters request #191

Closed willis-richard closed 8 months ago

willis-richard commented 9 months ago

I understand that you are able to colour an agent's perspective differently, and that this is used so that an agent always sees itself as the same colour.

Would you be able to do this for other observations too? Here is my specific example. For allelopathic harvest, I want to train only one policy to play all the roles. However, half of the agents prefer green and half of the agents prefer blue. I don't think my policy is able to tell which role it is playing. What I would like is a filter that may map blue<->green. This could be applied to the agents that prefer blue. This means that each agent will see the colour that it prefers as green, and it will see as blue a berry that it does not prefer.

duenez commented 9 months ago

Hi,

Yes, in principle this is as simple as passing the correct spriteMap when creating the Avatar objects so that the right colours are used for the players.

However, in general we don't require that an agent plays on both types of roles. We assume the agent "knows" which role it has, and consider the role a property of the agent, not the player. When we sample episodes, we just make sure to sample the right amount of each role. Even at evaluation, the scenarios are run on a population that separates agents by role, so an agent that was trained to prefer, say, green, will be guaranteed to be in that role in the scenario (if such a role is available to the focal population.)
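As a rough illustration of that idea (this is not the actual substrate API; avatar configs differ per substrate, and the sprite names "BerryBlue" / "BerryGreen" here are hypothetical placeholders), a role-dependent spriteMap could be chosen when building each avatar's config:

```python
def build_avatar_config(player_idx, prefers_blue):
    """Sketch of a role-dependent avatar config.

    For players who prefer blue, remap the berry sprites so their
    preferred colour renders as green in their own view, and vice
    versa. Players who prefer green get no remapping.
    """
    sprite_map = {}
    if prefers_blue:
        # Hypothetical sprite names; use the ones defined by the substrate.
        sprite_map = {"BerryBlue": "BerryGreen", "BerryGreen": "BerryBlue"}
    return {
        "component": "Avatar",
        "kwargs": {
            "index": player_idx,
            "spriteMap": sprite_map,
        },
    }
```

The real kwargs the Avatar component accepts come from the substrate's own config; the point is only that the spriteMap passed in can differ per player according to role.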

Does that make sense?

jzleibo commented 9 months ago

> I understand that you are able to colour an agent's perspective differently, and that this is used so that an agent always sees itself as the same colour.

We are indeed able to set colors independently for each agent's perspective. We configured these settings on a substrate-by-substrate basis. For the vast majority of substrates it is as you say: agents see themselves as always having the same color (usually blue). However, Allelopathic Harvest is one of the substrates where we made a different choice. In Allelopathic Harvest all players change color whenever they plant a berry, and these color changes apply the same way in everyone's perspective. All players start out grey; when they plant a red berry they turn red for some time, or until they eat a berry, after which they turn back to grey. Ditto when they plant a green berry or a blue berry. The idea is that while planting the berry they end up getting berry juice on themselves. This has the effect of making visible which equilibrium a player is actively supporting by planting. Free riders who eat but never plant tend to spend more time colored grey than any of the other colors.

> Would you be able to do this for other observations too? Here is my specific example. For allelopathic harvest, I want to train only one policy to play all the roles. However, half of the agents prefer green and half of the agents prefer blue.

The default for allelopathic harvest has half the players preferring red and half preferring green, none prefer blue. It sounds like you've changed the colors. I'm just pointing that out since it's salient to me. It should not matter.

> I don't think my policy is able to tell which role it is playing. What I would like is a filter that may map blue<->green. This could be applied to the agents that prefer blue. This means that each agent will see the colour that it prefers as green, and it will see as blue a berry that it does not prefer.

This should be easy to implement. Edgar explained how to do it with the spriteMap. You can look at some of the other substrates where spriteMap is used for examples to build on.

Note: in allelopathic harvest the roles only affect the rewards. Agents that don't directly observe their rewards have no way to ascertain their role. The roles are basically a "distractor" in allelopathic harvest. The right thing to do to solve the problem is to build a monoculture, i.e. plant berries to make the map entirely one color. This is the right thing to do regardless of which berry you pick. It's best not to pick the color that no one likes, though, of course (this is blue in the default color scheme).

If for some reason you do need agents to have a debug-only observation of their role assignment, then one is available and can easily be enabled. All you have to do is look at line 47 of the config and change it to _ENABLE_DEBUG_OBSERVATIONS = True. Once you have done that, there should be a debug observation available called 'MOST_TASTY_BERRY_ID'. This observation will contain the player's role. To make it visible to an agent you probably also need to add the string 'MOST_TASTY_BERRY_ID' to the list config.individual_observation_names (line 960).
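A sketch of those two edits (the module-level flag and the appended observation name, using the line numbers mentioned above; the helper function here is hypothetical, just to show the list change):

```python
# 1. Around line 47 of the allelopathic_harvest config:
_ENABLE_DEBUG_OBSERVATIONS = True


# 2. Around line 960, expose the debug observation to agents. A small
# helper makes the change explicit and idempotent:
def add_role_observation(observation_names):
    """Return the observation names with 'MOST_TASTY_BERRY_ID' appended."""
    names = list(observation_names)
    if "MOST_TASTY_BERRY_ID" not in names:
        names.append("MOST_TASTY_BERRY_ID")
    return names


# e.g. inside the config-building code:
# config.individual_observation_names = add_role_observation(
#     config.individual_observation_names)
```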

willis-richard commented 9 months ago

Thank you for the replies. I understand that a monoculture is optimal, but that the agents have a preference for which berry is to be chosen. Would it be 'cheating' if an agent knew that it preferred a given colour?

For example, if I train a single policy, with the following filter logic:

```
if role == player_who_likes_red:
    apply a sprite map that swaps red and green
elif role == player_who_likes_green:
    pass  # no remapping needed
```

In this case, each agent can be sure that they prefer (what appears to them as) green, and they have prior experience with agents that prefer (what appears to them as) red.

If that is permissible, then I have two tasks:

  1. Configure a sprite map with colour swapping for all pixels in the observation.
  2. Implement role-dependent Avatar creation.

It looks as though the substrate daycare is a good example to follow.
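For task 1, if the swap is applied as a post-hoc filter on the RGB observation rather than via the spriteMap, a minimal sketch might look like the following. Note the assumptions: the palette values below are made up (look up the actual berry colours in the substrate's sprite definitions), and an exact-match palette swap only works because sprites use flat colours.

```python
import numpy as np

# Hypothetical palette entries; replace with the substrate's real colours.
GREEN = np.array([61, 204, 61], dtype=np.uint8)
BLUE = np.array([61, 61, 204], dtype=np.uint8)


def swap_palette(rgb, colour_a, colour_b):
    """Return a copy of an (H, W, 3) image with colour_a and colour_b swapped.

    Every pixel exactly equal to colour_a is recoloured as colour_b,
    and vice versa; all other pixels are left untouched.
    """
    out = rgb.copy()
    mask_a = np.all(rgb == colour_a, axis=-1)
    mask_b = np.all(rgb == colour_b, axis=-1)
    out[mask_a] = colour_b
    out[mask_b] = colour_a
    return out
```

This could then be applied only to the observations of players whose role prefers blue, per the filter logic above.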

jzleibo commented 9 months ago

Sure, well it depends what you want to do. I believe you're not asking about the Melting Pot Challenge (where it would not be permissible to use the preferred berry color as an observation), but that you are rather using the substrate for a different research purpose. In that case it would be fine to use the preferred berry color observation. I believe we used it in some of our earlier papers on allelopathic harvest as well.

Conceptually, I think either setup makes sense. It's prima facie reasonable to model agents that know their own preference. But it's perhaps more neurobiologically plausible to have a model where agents must discover their preferences through living in the world. Either story makes sense and seems consistent with the intention behind the substrate.

duenez commented 8 months ago

Closing as the original point has been addressed. Feel free to reach out again if there are more questions.