google-research / football

Check out the new game server:
https://research-football.dev
Apache License 2.0

Training against bots of increasing difficulty (curriculum learning) #139

Closed yutaizhou closed 4 years ago

yutaizhou commented 4 years ago

Hello,

I would like to utilize curriculum learning to bootstrap self-play in a 5 vs 5 multi-agent setting, by playing against bots of increasing difficulty. I have some questions.

  1. What is the proper way to adjust difficulty?

In gfootball/scenarios, I see files such as 11_vs_11_{difficulty}_stochastic.py, where difficulty is a value in [0,1]. Each one defines a whole new scenario (and, in turn, a whole new environment, since the scenario name gets passed into create_environment()) for a given difficulty parameter.

So, in order to do curriculum learning, is it correct to create a new environment (ie. scenario script) for each difficulty level I choose for the curriculum? If so, when I call env.step(), does the gfootball environment simply sample an action from each of the 5 bots on the right side team?

  2. What is the use case of env.players?

I saw in #107 that you can do something along the lines of directly importing a player (in my use case, a bot of a specific difficulty) to use within an environment. If so, how do you create a bot with a given difficulty level and use it directly? And is this route preferred over the method I asked about in question 1?

  3. What is the use case of extra_players?

Is extra_players used exclusively for playing against a checkpoint? What makes the players "extra" such that it warrants a different arg within create_environment() on top of number_of_{left, right}_players_agent_controls? Can I have an example where this argument is absolutely required?

qstanczyk commented 4 years ago
  1. If you go with option 1, you can either create a number of scenario config files that vary by the difficulty parameter, or you can propagate a configurable difficulty to the scenario builder function (one hacky way is via an environment variable; see the scenario sketch after this list). That way you can have some logic, based on recent scores or similar, that dynamically adjusts scenario complexity. Multiple calls to env.step() execute within the same scenario, so in order to "reload" the complexity you would have to re-construct the scenario each time. Another option is to change the API of the reset call so that it takes difficulty as a param and then overrides the complexity in the game engine (but this requires going through a few code layers).

  2. If you want to train against pre-trained agents rather than the built-in game AI, you can use the extra_players param. The difference between number_of_left_players_agent_controls and extra_players is that number_of_left_players_agent_controls defines how many players on the left team are controlled by the agent being trained (ie. env.step(actions) expects a list of actions for these players and returns observations for them). extra_players lets you control specific players with the provided player modules: https://github.com/google-research/football/tree/master/gfootball/env/players - there are hand-crafted players, lazy players, players that load an existing checkpoint... and you can create your own player module. Hence it is up to you whether you prefer the extra_players feature or number_of_{left, right}_players_agent_controls, in which case you distribute the observations to the appropriate modules yourself and collect back the actions.

  3. See 2
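For illustration, here is a minimal sketch of the env-variable hack from point 1. The file name, the GFOOTBALL_DIFFICULTY variable and the exact config attributes are assumptions (modelled on how the stock 11_vs_11_*_stochastic.py scenarios set right_team_difficulty), not a definitive recipe:

```python
# gfootball/scenarios/5_vs_5_curriculum.py  (hypothetical file)
# Sketch of a 5 vs 5 scenario whose opponent difficulty is read from an
# environment variable instead of being hard-coded.
import os

from . import *


def build_scenario(builder):
  builder.config().game_duration = 3000
  builder.config().deterministic = False
  builder.config().end_episode_on_score = True
  # Left team (the learning agents) stays at full strength; the right team
  # (built-in AI) gets the current curriculum difficulty, defaulting to easy.
  builder.config().left_team_difficulty = 1.0
  builder.config().right_team_difficulty = float(
      os.environ.get('GFOOTBALL_DIFFICULTY', '0.05'))

  builder.SetBallPosition(0.0, 0.0)

  builder.SetTeam(Team.e_Left)
  builder.AddPlayer(-1.0, 0.0, e_PlayerRole_GK)
  builder.AddPlayer(-0.5, 0.1, e_PlayerRole_CB)
  builder.AddPlayer(-0.5, -0.1, e_PlayerRole_CB)
  builder.AddPlayer(-0.3, 0.1, e_PlayerRole_CM)
  builder.AddPlayer(-0.3, -0.1, e_PlayerRole_CF)

  builder.SetTeam(Team.e_Right)
  builder.AddPlayer(-1.0, 0.0, e_PlayerRole_GK)
  builder.AddPlayer(-0.5, 0.1, e_PlayerRole_CB)
  builder.AddPlayer(-0.5, -0.1, e_PlayerRole_CB)
  builder.AddPlayer(-0.3, 0.1, e_PlayerRole_CM)
  builder.AddPlayer(-0.3, -0.1, e_PlayerRole_CF)
```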

qstanczyk commented 4 years ago

Please reopen if you have further questions.

yutaizhou commented 4 years ago

@qstanczyk Thank you so much for the detailed reply. I think your explanations are very important and should be documented in football/gfootball/doc/. I don't have reopening rights, since I am not a collaborator and a collaborator (you) closed the issue. Here are some follow-up questions (apologies in advance for asking so many!)

  1. Configuring Difficulty

Multiple calls to env.step() execute within the same scenario, so in order to "reload" the complexity you would have to re-construct the scenario each time. Another option is to change the API of the reset call so that it takes difficulty as a param and then overrides the complexity in the game engine (but this requires going through a few code layers).

My understanding from this is that there are two options of configuring difficulty. Please let me know if this is correct.

  2. Player control
  3. Bonus question

I see on the League Server that tournament date has been delayed, presumably due to covid-19. Is there an update on that?

qstanczyk commented 4 years ago
  • The first option of reloading complexity means a fresh game at a new difficulty level is started each time I run some training/playing script, and that difficulty persists throughout the runtime. Dynamically changing the difficulty would have to be done at the shell level, through env variables for example.
  • The second option of modifying the API means I can adjust the difficulty level during runtime, but it would have to be in between calls to env.reset()? E.g. initialize the game with difficulty = 0.1, the episode ends, env.reset(difficulty = 0.2), etc. Dynamically changing the difficulty is thus done at the thread level.

In both cases you can do the control at the thread level. You can set environment variables from within the main Python loop. But it doesn't have to be an environment variable; it could be a Python global variable.
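A rough sketch of what the loop-side control might look like, under the same assumption as the earlier scenario sketch (a hypothetical 5_vs_5_curriculum scenario that reads GFOOTBALL_DIFFICULTY in build_scenario(); the promotion rule below is made up):

```python
import os

import gfootball.env as football_env


def make_env(difficulty):
  # Re-creating the environment re-runs build_scenario(), which picks up the
  # new value of the (hypothetical) GFOOTBALL_DIFFICULTY variable.
  os.environ['GFOOTBALL_DIFFICULTY'] = str(difficulty)
  return football_env.create_environment(
      env_name='5_vs_5_curriculum',  # the hypothetical scenario sketched above
      representation='simple115v2',
      number_of_left_players_agent_controls=4)


difficulty = 0.05
env = make_env(difficulty)
for episode in range(1000):
  obs = env.reset()
  done, episode_reward = False, 0.0
  while not done:
    # Random actions stand in for the learning agent here.
    obs, reward, done, info = env.step(env.action_space.sample())
    episode_reward += float(reward.sum())
  if episode_reward > 0:  # crude promotion rule, purely illustrative
    difficulty = min(1.0, difficulty + 0.05)
    env.close()
    env = make_env(difficulty)
```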

  2. Player control
  • So essentially number_of_left_players_agent_controls and extra_players differ in the granularity of control? The former allows you to specify control for multiple players at once, whereas the latter makes you specify one at a time?

You can specify multiple players for control with both APIs; it's just that number_of_left_players_agent_controls is the GYM-like API, while extra_players is our API, which allows you to connect multiple independent agents (for a multi-agent setup). The GYM-like API can't do that, as the control loop is on the client side.

  • If you use number_of_left_players_agent_controls, how can you tell/specify which players (roles) are being controlled?

You don't - controlled player(s) are auto-switched the same way as in other football games.

  • And if number_of_left_players_agent_controls < number_of_left_players_total, are the rest controlled by bots (non-lazy)?

The rest are controlled by the built-in AI by default (assuming you don't use the extra_players feature).

Since a bot cannot control multiple players like an agent can, there would be number_of_left_players_total - number_of_left_players_agent_controls bots for the left team, yes?

There are 3 types of players - agent (number_of_left_players_agent_controls), bots (extra_players), and built-in AI, which is the default (and controlled by the difficulty parameter).

  • For playing against checkpoint agents (e.g. self-play), is it fine to set number_of_{left, right}_players_agent_controls to a constant rather than using number_of_left_players_agent_controls to represent current agent and extra_players to represent old agent to play against?

I don't understand this question.

  • For my purpose of playing against simple bots (I am on the League Play multiagent track), I can just set number_of_left_players_agent_controls to 4, and my opponent will be a team of bots by default?

When you play over the internet using the League server, your opponent is selected by the server; you don't have control over it.

  3. Bonus question

I see on the League Server that tournament date has been delayed, presumably due to covid-19. Is there an update on that?

We are in active discussions, hopefully we will be able to provide details in ~2 weeks.

yutaizhou commented 4 years ago

@qstanczyk

You can specify multiple players for control with both APIs; it's just that number_of_left_players_agent_controls is the GYM-like API, while extra_players is our API, which allows you to connect multiple independent agents (for a multi-agent setup). The GYM-like API can't do that, as the control loop is on the client side.

I see. My team has been setting both number_of_left_players_agent_controls and number_of_left_players_agent_controls = 4. This means that instead of 4 independent learning agents each side playing against each other (8 agents), we are really doing just one learning agent having control over 4 players each side (2 agents), playing against each other?

There are 3 types of players - agent (number_of_left_players_agent_controls), bots (extra_players), and built-in AI, which is the default (and controlled by the difficulty parameter).

What is the difference between bots and built-in AI? I have always thought they both referred to rule-based players within the game. Sorry but could you explain the differences between {agent, bot, built-in AI, gamepad/keyboard}? My take: agent is whatever you are training but I am not sure if that includes loaded checkpoint; gamepad/keyboard is manual human input for real-time gameplay; but still confused on built-in AI vs. bot.

I don't understand this question.

Sorry I think I confused myself there. I should have the answer from the questions below.

When you play over the internet using the League server, your opponent is selected by the server; you don't have control over it.

Sorry, let me rephrase that question. I was just explaining my goal outside the context of that question. If I set number_of_left_players_agent_controls = 4 during training and do nothing else, then I would have one agent controlling 4 players on my side, playing against 4 players each controlled by the built-in AI (or a bot; again, I am unsure of the definition from the above point)? How would I change this if I wanted 4 independent learning agents on my team rather than one centralized learning agent?

qstanczyk commented 4 years ago

I see. My team has been setting both number_of_left_players_agent_controls and number_of_left_players_agent_controls = 4. This means that instead of 4 independent learning agents each side playing against each other (8 agents), we are really doing just one learning agent having control over 4 players each side (2 agents), playing against each other?

I guess you mean number_of_left_players_agent_controls and number_of_right_players_agent_controls. In a way, yes - you hook up a single "agent" that way, which controls 4 players on the left and 4 players on the right. But it's up to the implementation details of that agent - if it internally splits those players into two sets and processes them independently, then you still have 2 independent agents playing against each other. You could also split things into 8 sets (1 player each) in order to train each player independently.
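As a rough illustration of that splitting (the per-player policy is a placeholder, and the 5_vs_5 scenario name and the left-players-first observation layout are assumptions):

```python
import gfootball.env as football_env

# Sketch: 4 + 4 players exposed through the single GYM-like API. Each row of
# the stacked observation is handed to its own placeholder policy, i.e. the
# "8 independent sets (1 player each)" case.
env = football_env.create_environment(
    env_name='5_vs_5',
    representation='simple115v2',
    number_of_left_players_agent_controls=4,
    number_of_right_players_agent_controls=4)


def policy(player_index, player_obs):
  # Placeholder for one independent model per controlled player.
  return env.action_space.sample()[player_index]


obs = env.reset()
done = False
while not done:
  actions = [policy(i, obs[i]) for i in range(len(obs))]
  obs, reward, done, info = env.step(actions)
```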

There are 3 types of players - agent (number_of_left_players_agent_controls), bots (extra_players), and built-in AI, which is the default (and controlled by the difficulty parameter).

What is the difference between bots and built-in AI? I have always thought they both referred to rule-based players within the game. Sorry but could you explain the differences between {agent, bot, built-in AI, gamepad/keyboard}? My take: agent is whatever you are training but I am not sure if that includes loaded checkpoint; gamepad/keyboard is manual human input for real-time gameplay; but still confused on built-in AI vs. bot.

Yeah, it might be a bit confusing. Built-in AI is the game's native player logic; it is implemented inside the game engine. Bots might not be the best name, but we mean all implementations of players from https://github.com/google-research/football/tree/master/gfootball/env/players. The (training) agent is whatever is hooked up to the main GYM API (it executes the step() calls).

Sorry, let me rephrase that question. I was just explaining my goal outside the context of that question. If I set number_of_left_players_agent_controls = 4 during training and do nothing else, then I would have one agent controlling 4 players on my side, playing against 4 players each controlled by the built-in AI (or a bot; again, I am unsure of the definition from the above point)? How would I change this if I wanted 4 independent learning agents on my team rather than one centralized learning agent?

Yes, you would play against players controlled by the built-in AI. If you want to train 4 independent agents, you need to split the observations provided by the step() call and train 4 independent models, or you can specify the 4 independent bots you want to train via the extra_players param (note you will most likely need to add the implementation of those bots to https://github.com/google-research/football/tree/master/gfootball/env/players).
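If you go the extra_players route, a custom player module is roughly a Player class with a take_action() method, placed next to the existing modules. A hedged skeleton (the module name is made up; check bot.py or lazy.py for the exact base-class interface):

```python
# gfootball/env/players/my_trainable.py  (hypothetical module name)
from gfootball.env import football_action_set
from gfootball.env import player_base


class Player(player_base.PlayerBase):
  """Skeleton of a player module that could be plugged in via extra_players."""

  def __init__(self, player_config, env_config):
    player_base.PlayerBase.__init__(self, player_config)
    # Build or load your model here, e.g. from a path passed in player_config.

  def take_action(self, observation):
    # One observation per player this module controls; return one action each.
    return [football_action_set.action_idle] * len(observation)
```

Such a module would then be referenced from extra_players with a config string along the lines of 'my_trainable:right_players=4' (the exact string format follows the existing player modules).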

yutaizhou commented 4 years ago

I guess you mean number_of_left_players_agent_controls and number_of_right_players_agent_controls. In a way, yes - you hook up a single "agent" that way, which controls 4 players on the left and 4 players on the right. But it's up to the implementation details of that agent - if it internally splits those players into two sets and processes them independently, then you still have 2 independent agents playing against each other. You could also split things into 8 sets (1 player each) in order to train each player independently.

Yes, sorry, that is what I meant. So in a sense, "agent" in this context is somewhat decoupled from the meaning of a learning agent in the traditional AI sense. The "agent" is really an orchestrator, a user-facing API that takes env info and passes it to the players it controls, and takes action input from the players and passes it to the env. The degree of centralization for both acting (joint vs. factored action space) and learning (e.g. param sharing or not) is determined solely by how the orchestrator splits up the info. It could be 8 independent sets for 4 vs 4 with decentralized execution for each team, or 2 sets for 4 vs 4 with centralized execution for each team. What would be the use case for 1 independent set?

Yeah, it might be a bit confusing. Built-in AI is the game's native player logic; it is implemented inside the game engine. Bots might not be the best name, but we mean all implementations of players from https://github.com/google-research/football/tree/master/gfootball/env/players. The (training) agent is whatever is hooked up to the main GYM API (it executes the step() calls).

That makes sense! Bot was especially confusing, as gfootball/env/players/bot.py contains a bunch of logic for computing distances, determining the best player to pass the ball to, etc., so I thought it was synonymous with built-in AI. As for the built-in AI, since the logic is implemented within the game engine, you wouldn't pass player-generated actions into env.step() like you do with external bots (including gamepad, learning agent, etc.). You would instead pass in actions generated by env.action_space.sample()? In other words, pass in env-generated actions, not player-generated ones.

Additionally, so agent represents whatever is trained AND whatever checkpoint is loaded then?

yutaizhou commented 4 years ago

@qstanczyk I think you accidentally wrote your answer by editing my post and replacing my question with your answer. Do you mind undoing that and pasting your answer as another reply within this thread? It makes it easier for others to read.

You could also have one network and train sequentially with the left and then the right team (observations are always provided as if the player plays left-to-right)... does this setup make sense? This is more of a research question.

Yes, I agree that the cumulative reward = 0 setup is broken. The second setup does make sense, as you are essentially improving both sides at the same rate, but, like you said, a lot more research-oriented things can be said about that approach (game-theoretic, similar to how one trains a discriminator and a generator in turn in GANs).

That is all my questions for now. This thread has been immensely helpful, and I think some of what you answered should be documented, especially regarding the role of extra_players and how bot != built-in AI. Thank you very much!

qstanczyk commented 4 years ago

Response to https://github.com/google-research/football/issues/139#issuecomment-610947752:

The degree of centralization for both acting (joint vs. factored action space) and learning (e.g. param sharing or not) is determined solely by how the orchestrator splits up the info. It could be 8 independent sets for 4 vs 4 with decentralized execution for each team, or 2 sets for 4 vs 4 with centralized execution for each team. What would be the use case for 1 independent set?

It depends on the setup of the network. If you pass all players from both teams into a single network, then what is the reward? The cumulative reward for both teams is 0 (when one scores, the other loses), so this is broken. You could also have one network and train sequentially with the left and then the right team (observations are always provided as if the player plays left-to-right)... does this setup make sense? This is more of a research question.

As for the built-in AI, since the logic is implemented within the game engine, you wouldn't pass player-generated actions into env.step() like you do with external bots (including gamepad, learning agent, etc.). You would instead pass in actions generated by env.action_space.sample()? In other words, pass in env-generated actions, not player-generated ones.

The built-in AI doesn't accept any actions; it just runs "some" logic to decide what to do, and it has nothing to do with the Python-level API. Think of playing FIFA against the computer - you can use a keyboard or joystick, or play over the network, but you control a single player while the rest is played by the game.

qstanczyk commented 4 years ago

@qstanczyk I think you accidentally wrote your answer by editing my post and replacing my question with your answer. Do you mind undoing that and pasting your answer as another reply within this thread? It makes it easier for others to read.

Looks like it, reverted.