google-deepmind / open_spiel

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.

Using agents in Monte Carlo Tree Search evaluators #896

Closed: AntzenSb closed this issue 2 years ago

AntzenSb commented 2 years ago

I am attempting to use the Monte Carlo Tree Search (MCTS) algorithm in mcts.py to determine the policy of an agent, and I want to use an Evaluator that evaluates a game state using the policy of some other agent. Specifically, I would have something similar to RandomRolloutEvaluator, but instead of randomly choosing an action at each step of the rollout, I would select an action according to a given agent's policy. How would I implement this, given that an Agent takes TimeSteps but an Evaluator takes a State? Should I just load a new game environment for each rollout so that the Evaluator only deals with TimeSteps?

import numpy as np

from open_spiel.python import rl_agent
from open_spiel.python.algorithms import mcts

class AgentRolloutEvaluator(mcts.Evaluator):

    def __init__(self, agents, n_rollouts=10):
        self.agents = agents
        self.n_rollouts = n_rollouts

    def evaluate(self, state):
        """Returns evaluation on a given state for a two-player deterministic turn-based game."""
        result = None
        for _ in range(self.n_rollouts):
            # WORKAROUND 1: Load a new environment with game state set to |state| ?
            curr_state = state.clone()
            while not curr_state.is_terminal():
                current_agent = self.agents[curr_state.current_player()]
                # WORKAROUND 2: create time step from state here ?
                # time_step = ???? state
                agent_output = current_agent.step(time_step, is_evaluation=True)
                chosen_action = agent_output.action  # randomly chosen based on agent's policy
                curr_state.apply_action(chosen_action)
            # Accumulate the returns of each rollout and average them at the end.
            returns = np.array(curr_state.returns())
            result = returns if result is None else result + returns
        return result / self.n_rollouts

    def prior(self, state):
        # Uniform prior over legal actions, same as RandomRolloutEvaluator.
        legal_actions = state.legal_actions(state.current_player())
        return [(action, 1.0 / len(legal_actions)) for action in legal_actions]

mcts_bot = mcts.MCTSBot(game, uct_c, max_simulations, AgentRolloutEvaluator(agents))
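
For reference, the bot above would be used roughly as follows (a sketch only; the game name, uct_c, and max_simulations values are illustrative, and agents is assumed to be a list of already-trained RL agents):

import pyspiel

# Illustrative values only; |agents| is assumed to be a list of trained rl_agent agents.
game = pyspiel.load_game("tic_tac_toe")
mcts_bot = mcts.MCTSBot(game, uct_c=2.0, max_simulations=100,
                        evaluator=AgentRolloutEvaluator(agents))

state = game.new_initial_state()
action = mcts_bot.step(state)  # runs MCTS from |state| using the agent-based rollouts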

General Note: I've noticed that OpenSpiel seems to have two ways to represent a game state: (1) pyspiel.State, which can be used for a simple manipulation of gameplay as seen in example.py, and (2) rl_environment.TimeStep, which is used for interacting with an Environment as seen in rl_example.py (as well as many other game examples where agents are trained). There are some minute differences between the two (e.g. TimeStep holds RL-specific info such as the rewards given to agents so far), but they both share the common purpose of holding info about the current game state, and can be used to progress a game step by step (again, see examples referenced above). Is there a general way to convert between these two types of objects?
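
For reference, a minimal sketch contrasting the two interfaces (tic_tac_toe and uniformly random moves are only placeholders):

import random

import pyspiel
from open_spiel.python import rl_environment

# (1) pyspiel.State: manipulate the game directly.
game = pyspiel.load_game("tic_tac_toe")
state = game.new_initial_state()
while not state.is_terminal():
    state.apply_action(random.choice(state.legal_actions()))
print("returns:", state.returns())

# (2) rl_environment.TimeStep: step through an Environment wrapper.
env = rl_environment.Environment("tic_tac_toe")
time_step = env.reset()
while not time_step.last():
    player = time_step.observations["current_player"]
    action = random.choice(time_step.observations["legal_actions"][player])
    time_step = env.step([action])
print("rewards:", time_step.rewards)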

rezunli96 commented 2 years ago

Hi @AntzenSb. I think you can just build up a time_step object by calling the member functions of state. For example, you can let time_step = TimeStep(observations=state.observation_tensor(), rewards=0, discounts=0, step_type=StepType.MID). Please see rl_environment.py.

Regarding your general note: yes, I think the confusion here comes from the notion of state.

There is some discrepancy in the notion of "state" between the game-theory and RL communities. But what's cool about OpenSpiel is that it kind of accommodates both by using a unified state class that provides access to all of these representations! The RL "state" feels more "recurrent", while the game-theory state is more "transient".
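
For illustration, a small sketch of the different views the same pyspiel.State exposes (kuhn_poker is just an example imperfect-information game):

import pyspiel

game = pyspiel.load_game("kuhn_poker")
state = game.new_initial_state()
state.apply_action(0)  # chance node: deal card 0 to player 0
state.apply_action(1)  # chance node: deal card 1 to player 1
player = state.current_player()

print(state.history())                         # "transient" view: the full action history
print(state.information_state_string(player))  # the player's information state
print(state.observation_tensor(player))        # tensor view that RL agents typically consume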

AntzenSb commented 2 years ago

Hi @rezunli96, thanks for the quick response and explanation! I think differentiating between "state" in game theory and "state" in RL is an interesting idea, although it seems like many of the properties traditionally assigned to one can still be applied to the other (e.g. imperfect information can also be modeled in RL contexts), and thus both can be united under one "state" object. But that's just from my current beginner perspective, and since I'm working at the intersection of game theory and RL, this interfacing between the two concepts of "state" is probably not too common an issue.

I did try your suggestion for creating a TimeStep, but it unfortunately did not work, as the structure of the information stored in TimeStep.observations is a little more complex. However, after tinkering around with it some more, I did find a way to get it to work. For future reference:

from open_spiel.python import rl_environment

# Let |state| be of type pyspiel.State
num_players = state.get_game().num_players()
player_id = state.current_player()
# For this use case, every entry holds the current player's view.
legal_actions = [state.legal_actions(player_id) for _ in range(num_players)]
info_state = [state.observation_tensor(player_id) for _ in range(num_players)]
step_type = rl_environment.StepType.LAST if state.is_terminal() else rl_environment.StepType.MID

time_step = rl_environment.TimeStep(
    observations={'info_state': info_state, 'legal_actions': legal_actions, 'current_player': player_id},
    rewards=state.rewards(), discounts=[1.0, 1.0], step_type=step_type)

Note that time_step might be missing some info here, such as time_step.observations['serialized_state'], although for my MCTS use case described above, this code suffices.
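
If an agent does need the serialized state, one possible addition (a sketch, mirroring what rl_environment.Environment does when constructed with include_full_state=True) is:

import pyspiel

# Sketch: attach the serialized game+state to the observations dict.
time_step.observations['serialized_state'] = pyspiel.serialize_game_and_state(
    state.get_game(), state)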

rezunli96 commented 2 years ago

Yes, the current fix makes sense. TimeStep is really just a wrapper class that can contain any information your agent needs. I guess the problem is solved now? :)