google-deepmind / open_spiel

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.
Apache License 2.0

Negotiation Game deterministic state #1276

Open Linwenye opened 2 months ago

Linwenye commented 2 months ago

Hi! For the Negotiation game, I was expecting the initial game state to be random. However, after running this code:

    import numpy as np
    import pyspiel

    game = pyspiel.load_game("negotiation")
    state = game.new_initial_state()
    if state.is_chance_node():
        # Sample a chance event outcome.
        outcomes_with_probs = state.chance_outcomes()
        action_list, prob_list = zip(*outcomes_with_probs)
        action = np.random.choice(action_list, p=prob_list)
        state.apply_action(action)
        print(state)

It always returns the same state. Why does this happen?

lanctot commented 2 months ago

Ah yes, that's because the game has chance mode kSampledStochastic, which is explained in a comment here: https://github.com/google-deepmind/open_spiel/blob/77a03df63f6e6ddbf15ccf22e7f1ca5ff841dd2e/open_spiel/spiel.h#L81

Essentially it means that there is a single chance outcome and the randomness is applied when you apply the action (so the RNG is stored internally in the game object).

If you don't pass in the rng_seed parameter, it defaults to -1 (see negotiation.h), and the game then uses the default seed for the Mersenne Twister, see here: https://github.com/google-deepmind/open_spiel/blob/77a03df63f6e6ddbf15ccf22e7f1ca5ff841dd2e/open_spiel/games/negotiation/negotiation.cc#L512 (note: the documentation in the header should be updated, I'll fix that).

As a result, you get the same sequence of samples each time the game is loaded. Note that you don't need to load the game every time you want to create a new state; you should load it only once.
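To make the effect concrete, here is a rough toy model in plain Python. It is not OpenSpiel's implementation: `ToySampledGame`, `first_draw`, and `DEFAULT_SEED` are made-up names, and the seed value is only a placeholder for whatever default the real game uses. It just mimics the idea that the game object owns the RNG, exposes a single chance outcome, and does the sampling when the action is applied.

```python
import random

# Hypothetical placeholder for the fixed default seed used when rng_seed is unset.
DEFAULT_SEED = 5489


class ToySampledGame:
    """Toy stand-in for a sampled-stochastic game: the game object owns the RNG."""

    def __init__(self, rng_seed=-1):
        # A negative seed means "not set", so fall back to the fixed default.
        seed = DEFAULT_SEED if rng_seed < 0 else rng_seed
        self._rng = random.Random(seed)

    def chance_outcomes(self):
        # A single chance outcome with probability 1.
        return [(0, 1.0)]

    def apply_chance_action(self, action):
        # The randomness is applied here, drawn from the game's own RNG.
        return self._rng.randrange(2**32)


def first_draw(game):
    """Applies the single chance action and returns the sampled value."""
    action, _prob = game.chance_outcomes()[0]
    return game.apply_chance_action(action)


# Reloading the game resets the RNG, so the first draw repeats every time.
reloaded = [first_draw(ToySampledGame()) for _ in range(3)]

# Loading once and reusing the game lets the RNG advance between states.
game = ToySampledGame()
reused = [first_draw(game) for _ in range(3)]

print("reloaded each time:", reloaded)
print("loaded once:       ", reused)
```

In this toy, the "reloaded" list repeats one value, while the "loaded once" list keeps changing, which mirrors the behaviour described above.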

One fix is to set the seed yourself, e.g. pyspiel.load_game("negotiation(rng_seed=3279011)"). If you want the usual explicit stochastic chance mode, there is another similar game called "bargaining" which uses it.
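A minimal sketch of the seed fix, assuming you want a different seed on each run of your script. The helper `negotiation_game_string` is a hypothetical name, not part of pyspiel; only `load_game`, `new_initial_state`, `is_chance_node`, `chance_outcomes`, and `apply_action` are taken from the snippet in the question. The pyspiel part is guarded so the helper can be tried without OpenSpiel installed.

```python
import random


def negotiation_game_string(seed=None):
    """Builds a load string for the negotiation game with an explicit rng_seed.

    Hypothetical helper for illustration; not part of pyspiel.
    """
    if seed is None:
        # Draw a fresh non-negative seed from the OS entropy source.
        seed = random.SystemRandom().randrange(2**31 - 1)
    return f"negotiation(rng_seed={seed})"


game_string = negotiation_game_string()

try:
    import pyspiel  # only needed to actually load the game
except ImportError:
    pyspiel = None

if pyspiel is not None:
    # Load the game once, then create as many initial states as needed.
    game = pyspiel.load_game(game_string)
    for _ in range(3):
        state = game.new_initial_state()
        if state.is_chance_node():
            # There is a single chance outcome; the sampling happens inside
            # apply_action, using the RNG stored in the game object.
            action, _prob = state.chance_outcomes()[0]
            state.apply_action(action)
        print(state)
```

Passing a fixed value, e.g. `negotiation_game_string(3279011)`, gives the reproducible variant from the comment above.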