google-deepmind / meltingpot

A suite of test scenarios for multi-agent reinforcement learning.
Apache License 2.0

Issue 25: Extend rllib examples to save, load, and render bots #31

Closed: willis-richard closed this 2 years ago

willis-richard commented 2 years ago

This patch uses ray.tune to save the trained models from self_play_train.py, and lets you view their behaviour in a pygame window by running view_model.py.
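
For context, the save/restore flow is roughly the following. This is a minimal sketch against Ray 1.11; the config dict and trial selection are simplified placeholders, not the exact code in self_play_train.py:

```python
import ray
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

# Placeholder config: self_play_train.py builds the real multi-agent
# config (env registration, policy mapping, etc.) before training.
config = {"env": "meltingpot", "framework": "tf"}

# Train with periodic checkpoints saved by tune.
analysis = tune.run(
    "PPO",
    config=config,
    stop={"training_iteration": 100},
    checkpoint_freq=10,
    checkpoint_at_end=True,
)

# Restore the best checkpoint into a fresh Trainer for evaluation.
checkpoint = analysis.get_best_checkpoint(
    analysis.trials[0], metric="episode_reward_mean", mode="max")
trainer = PPOTrainer(config=config)
trainer.restore(checkpoint)
```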

Disclaimer: I have only ~1 month of familiarity with RLlib and Melting Pot.

Note: It is unfortunate that I needed to pass RayModelPolicy a full rllib.agents.trainer.Trainer instance rather than just an rllib.policy.policy.Policy. This is because rllib.agents.trainer does some pre-processing of the inputs before forwarding them to the policy (see lines 1452-1457 in Ray 1.11.0).
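
To make the issue concrete, here is a stripped-down sketch of what the wrapper ends up looking like. The method names follow Melting Pot's bot-policy interface as I understand it, and the details are illustrative rather than a copy of the patch:

```python
import dm_env
from ray.rllib.agents.trainer import Trainer


class RayModelPolicy:
  """Wraps a full rllib Trainer so it can act as a Melting Pot bot policy.

  We hold the Trainer (not just its Policy) because
  Trainer.compute_single_action applies the observation pre-processing
  that the raw Policy expects to have already happened.
  """

  def __init__(self, trainer: Trainer, policy_id: str = "default_policy"):
    self._trainer = trainer
    self._policy_id = policy_id

  def initial_state(self):
    return self._trainer.get_policy(self._policy_id).get_initial_state()

  def step(self, timestep: dm_env.TimeStep, prev_state):
    # compute_single_action runs the pre-processing (the lines referenced
    # above) before forwarding to the underlying Policy.
    action, state, _ = self._trainer.compute_single_action(
        timestep.observation,
        state=prev_state,
        policy_id=self._policy_id,
        explore=False,
    )
    return action, state

  def close(self):
    pass
```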

Also, although it is against the spirit of the library, I wanted to demonstrate that the RayModelPolicy could play against the PermissiveModelPolicy. However, I was not able to: replacing some of the bots in view_model.py with bots created via bot_factory.build(bot_factory.get_config("ah3gs_bot_finding_berry_two_the_most_tasty_0")) resulted in errors about the observations they received. Therefore RLlib bots and the existing Melting Pot bots cannot be mixed at the moment.

Finally, would you be open to answering some questions I have about the use of this repo? I intend to make some modifications to Melting Pot for my experiments, and I would appreciate it if I could run my proposed changes by you.

duenez commented 2 years ago

Regarding using our saved-model bots for evaluation: it would work if you instead instantiate a scenario rather than the substrate. Then the appropriate wrappers are put around the bots, and you get an environment to which you connect only your RLlib policies. Then you can just render WORLD.RGB, which shows all players. Does that make sense?
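
Something like the following rough sketch; the scenario name and exact factory calls are illustrative and may differ by version, and it assumes WORLD.RGB is exposed in the focal observations:

```python
import pygame
from meltingpot.python import scenario

# Building a scenario (rather than a substrate) wraps the saved bots in
# automatically, leaving only the focal player slots to be controlled.
env = scenario.build(scenario.get_config("allelopathic_harvest_0"))

timestep = env.reset()
frame = timestep.observation[0]["WORLD.RGB"]

pygame.init()
screen = pygame.display.set_mode((frame.shape[1], frame.shape[0]))

while not timestep.last():
  # Actions for the focal players only, e.g. from restored RLlib
  # policies; no-ops here as a stand-in.
  actions = [0] * len(timestep.observation)
  timestep = env.step(actions)
  frame = timestep.observation[0]["WORLD.RGB"]
  # pygame surfaces are (width, height), so swap the first two axes.
  surface = pygame.surfarray.make_surface(frame.transpose(1, 0, 2))
  screen.blit(surface, (0, 0))
  pygame.display.update()
```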

willis-richard commented 2 years ago

Thank you for the pointer. I think it makes sense, though I will leave this as future work for now. (First I need to successfully train some bots!)

jzleibo commented 2 years ago

Hi Richard!

I somehow lost track of this thread. I remembered that someone had said they were planning to make some modifications to Melting Pot for their experiments and wanted to run the changes by us, and I was even actively searching for this message thread to reply, but I couldn't for the life of me find it. I concluded I must have hallucinated the original message. Anyway, I've found it now :). Let me know if you still want to chat sometime; I'm happy to schedule a call.

willis-richard commented 2 years ago

@jzleibo thank you, that would be great. I shall send an email to the address given on your personal blog.