Loading pre-trained agent through code

Fabien-Couthouis commented 4 years ago

Hi,

First, thanks for your work, this is an amazing project!

I am trying to load the PPO pre-trained agent via a Python script, but I cannot find anything in the documentation that explains how to do that. I looked at the play_game.py file, but it did not help me. What I am trying to do is something like:

import gfootball.env as football_env
from gfootball.env.players.ppo2_cnn import Player

env = football_env.create_environment(env_name='11_vs_11_stochastic', render=True)
#this line is not working
player_config = {"checkpoint": "11_vs_11_easy_stochastic_v2"} 
#I do not know what to put in env_config
player = Player(player_config=player_config, env_config={}) 

env.reset()
done = False
while not done:
    action = player.take_action()
    observation, reward, done, info = env.step(action)

May I have some clues on how to do this please? Thanks!

AntonRaichuk commented 4 years ago

It depends on what you are trying to do. One option is to use the extra_players argument of create_environment.

The other option is indeed something along the lines of what you are doing. What exactly is not working in your setup?

A few notes:

- take_action takes the observation as its argument.
- "env_config" is not used by the ppo2_cnn agent, so it does not matter what you pass.
- 'checkpoint' is supposed to be a path to a saved checkpoint file.
- You probably also need to pass 'index': 0 in player_config.
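
To make the notes above concrete, here is a minimal sketch of the control loop they describe. It does not import gfootball: `StubEnv` and `StubPlayer` are hypothetical stand-ins written only for this illustration, and the assumption that `take_action` receives a one-element list of observations and returns a list of actions is part of the sketch, not a statement about the library's API.

```python
class StubEnv:
    """Hypothetical stand-in for the football environment (not the real API)."""

    def __init__(self, episode_length=5):
        self._episode_length = episode_length
        self._t = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self._t = 0
        return {"step": self._t}

    def step(self, action):
        # Advance one step; the episode ends after episode_length steps.
        self._t += 1
        observation = {"step": self._t}
        reward = 0.0
        done = self._t >= self._episode_length
        return observation, reward, done, {}


class StubPlayer:
    """Hypothetical stand-in for a pre-trained player."""

    def __init__(self, player_config, env_config):
        # 'index' and 'checkpoint' mirror the keys mentioned in the notes;
        # env_config is accepted but ignored, as described for ppo2_cnn.
        self._index = player_config["index"]
        self._checkpoint = player_config["checkpoint"]

    def reset(self):
        pass

    def take_action(self, observations):
        # Assumption of this sketch: one observation per controlled player.
        assert len(observations) == 1
        return [0]  # always the same placeholder action


def run_episode(env, player):
    """Run one episode, feeding each observation to the player."""
    observation = env.reset()
    player.reset()
    done = False
    steps = 0
    while not done:
        action = player.take_action([observation])
        observation, reward, done, info = env.step(action)
        steps += 1
    return steps
```

The key difference from the snippet in the question is that the observation returned by `reset()` and `step()` is kept and handed to `take_action` on every iteration.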

Fabien-Couthouis commented 4 years ago

Thanks for the answer, things worked with:

import gfootball.env as football_env
from gfootball.env.players.ppo2_cnn import Player

# Extra player: the pre-trained PPO agent controls one right-side player.
players = ["ppo2_cnn:right_players=1,policy=gfootball_impala_cnn,checkpoint=CP_11_vs_11_easy_stochastic_v2"]
env = football_env.create_environment(env_name='11_vs_11_stochastic', render=True,
                                      stacked=False, number_of_left_players_agent_controls=1, extra_players=players)

# Agent driven from this script: controls one left-side player.
player_config = {'index': 0, 'left_players': 1, 'right_players': 0,
                 'policy': 'gfootball_impala_cnn', 'stacked': True, 'checkpoint': 'CP_11_vs_11_easy_stochastic_v2'}
agent = Player(player_config, env_config={})

observation = env.reset()
agent.reset()
done = False
while not done:
    # The observation is passed to take_action wrapped in a list.
    action = agent.take_action([observation])
    observation, reward, done, info = env.step(action)
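
The extra_players entry above follows a `name:key=value,key=value` pattern. As a small convenience sketch, the string can be assembled from keyword arguments; `build_player_definition` is a hypothetical helper written for this example and is not part of gfootball.

```python
def build_player_definition(name, **options):
    """Build a 'name:key=value,key=value' string (hypothetical helper)."""
    if not options:
        return name
    # Keyword arguments keep their insertion order in Python 3.7+.
    joined = ",".join(f"{key}={value}" for key, value in options.items())
    return f"{name}:{joined}"


players = [
    build_player_definition(
        "ppo2_cnn",
        right_players=1,
        policy="gfootball_impala_cnn",
        checkpoint="CP_11_vs_11_easy_stochastic_v2",
    )
]
```

The resulting list can then be passed as `extra_players`, which avoids typos when the same options are reused across several experiments.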

google-research / football

Loading pre-trained agent through code #107