google-deepmind / meltingpot

A suite of test scenarios for multi-agent reinforcement learning.

Collaborative cooking pseudoreward #161

Closed. mgerstgrasser closed this issue 1 year ago.

mgerstgrasser commented 1 year ago

As mentioned in #39, it is difficult to get learning off the ground in collaborative cooking without pseudorewards. I can see how to set the cooking pot pseudoreward to a non-zero value by editing the default config: https://github.com/deepmind/meltingpot/blob/f7905f0d41e4c351913c8b9af446ef4aaf12f680/meltingpot/configs/substrates/collaborative_cooking.py#L908
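(For reference, the setting defaults to zero, so the edit is a one-line change along these lines, with an illustrative value:)

config.cooking_pot_pseudoreward = 0.1  # illustrative value; default is 0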

Two questions though:

  1. Is it possible to set this value without editing files inside the meltingpot package? E.g., can I pass it in through an environment configuration dictionary or similar?
  2. Is it possible to change this value after the environment has been created? That would be useful, for instance, for implementing curriculum learning via an RLlib callback.

Thanks so much!

duenez commented 1 year ago

TL;DR We don't want to expose parameterised configs, but there are easy ways to get what you want.

Unfortunately, with the current way we do things, this is not directly possible. The reason is that our substrate factories don't forward arguments to the individual substrates' get_config functions. We could certainly offer a parameterised collaborative cooking get_config, but it would not be used by the factory.

There's a good reason for this level of indirection, though: we want it to be very hard to get the wrong config if you are using our factories. With nothing to configure, you can be sure you have the canonical config and that all comparisons of results are valid.

If you know what you are doing, we still expose everything, so you can use the configs and substrates directly for research purposes.

I suggest you create a custom function that bypasses the factory and overrides the pseudoreward value directly. That way, everything is completely explicit. For example:

from meltingpot.configs.substrates import collaborative_cooking__ring
from meltingpot import substrate

def get_collab_cooking_with_pseudorewards():
  # Grab the canonical config directly, bypassing the factory.
  config = collaborative_cooking__ring.get_config()
  with config.unlocked():
    # Override the pseudoreward explicitly.
    config.cooking_pot_pseudoreward = 1.0
    # Attach the settings builder that the factory would normally provide;
    # build_from_config needs it.
    config.lab2d_settings_builder = collaborative_cooking__ring.build
  return config

config = get_collab_cooking_with_pseudorewards()
roles = config.default_player_roles
env = substrate.build_from_config(config, roles=roles)
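If you want to pass the value in programmatically (your question 1), you can parameterise the same recipe. A minimal sketch reusing the imports above; build_collab_cooking is just an illustrative name:

def build_collab_cooking(pseudoreward=1.0):
  # Same recipe as above, but with the pseudoreward as an argument.
  config = collaborative_cooking__ring.get_config()
  with config.unlocked():
    config.cooking_pot_pseudoreward = pseudoreward
    config.lab2d_settings_builder = collaborative_cooking__ring.build
  return substrate.build_from_config(config, roles=config.default_player_roles)

env = build_collab_cooking(pseudoreward=0.5)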

Hope that helps.

mgerstgrasser commented 1 year ago

Yes, that does help, thank you! Is there a way of changing the pseudoreward after the environment has been created?

duenez commented 1 year ago

No. Once the environment is created, the game objects that calculate the rewards (and pseudorewards) have been finalised. In principle it would be possible to use the properties library to change this in Lua from Python, but I would strongly advise against it.

mgerstgrasser commented 1 year ago

Got it, thank you! Not a huge problem; we can just periodically re-create the environment during training if we want to decay the pseudoreward.
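Concretely, I'm thinking of something roughly like this (a sketch only: train_for_steps is a placeholder for our actual training loop, and build_collab_cooking is the parameterised helper from above):

# Hypothetical curriculum: decay the pseudoreward by rebuilding the
# substrate between phases, since it cannot be changed in place.
PSEUDOREWARD_SCHEDULE = [1.0, 0.5, 0.25, 0.0]

for value in PSEUDOREWARD_SCHEDULE:
  env = build_collab_cooking(pseudoreward=value)
  train_for_steps(env, num_steps=1_000_000)  # placeholder training loop
  env.close()  # release the environment before rebuilding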