google-deepmind / open_spiel

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.
Apache License 2.0

PSRO V2 rectified training (using joint) code bug #269

Closed: qmaai closed this issue 4 years ago.

qmaai commented 4 years ago

In https://github.com/deepmind/open_spiel/blob/master/open_spiel/python/algorithms/psro_v2/utils.py, the function `sample_random_tensor_index(probabilities_of_index_tensor)` fails when running `psro_v2_example.py` with the `--rectified=True` flag, or in any other case where strategies are sampled jointly (`sample_strategy_joint`). The functions could be modified as follows:

```python
import numpy as np

# `random_choice(outcomes, probabilities)` is assumed to be the sampling helper
# already defined in psro_v2/utils.py; `outcomes` is passed as a list of flat
# indices so the helper can index into it.


def sample_random_tensor_index(probabilities_of_index_tensor, shape=None):
  """Samples an index from a probability tensor.

  If `shape` is given, the probabilities are treated as a flattened tensor of
  that shape (needed when joint probabilities arrive as a 1-D vector).
  """
  probabilities_of_index_tensor = np.asarray(probabilities_of_index_tensor)
  shape = probabilities_of_index_tensor.shape if shape is None else shape
  reshaped_probas = probabilities_of_index_tensor.reshape(-1)
  strat_list = list(range(len(reshaped_probas)))
  chosen_index = random_choice(strat_list, reshaped_probas)
  return np.unravel_index(chosen_index, shape)


def sample_strategy_joint(total_policies, probabilities_of_playing_policies):
  """Samples a single joint strategy given joint probabilities.

  Args:
    total_policies: A list, each element a list of each player's policies.
    probabilities_of_playing_policies: This is a list of play probabilities of
      the joint policies specified by total_policies.

  Returns:
    sampled_policies: A list specifying a single sampled joint strategy.
  """
  shape = tuple(len(ele) for ele in total_policies)
  sampled_index = sample_random_tensor_index(probabilities_of_playing_policies,
                                             shape)
  sampled_policies = []
  for player in range(len(sampled_index)):
    ind = sampled_index[player]
    sampled_policies.append(total_policies[player][ind])
  return sampled_policies
```

Here an extra `shape` argument is passed into `sample_random_tensor_index` because, in the joint case, `probabilities_of_index_tensor` is a one-dimensional vector, so its own shape cannot be used to unravel the sampled flat index into one index per player.
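To make the unravelling problem concrete, here is a minimal standalone sketch in plain NumPy (not OpenSpiel code) for two players with 2 and 3 policies; the policy names and the `sample_joint` helper are placeholders invented for illustration. With the vector's own shape `(6,)`, `np.unravel_index` returns a single index, while the per-player shape `(2, 3)` yields one index per player.

```python
import numpy as np

# Two players with 2 and 3 policies; the policy names are placeholders.
total_policies = [["a0", "a1"], ["b0", "b1", "b2"]]
shape = tuple(len(player_policies) for player_policies in total_policies)  # (2, 3)

# Joint play probabilities stored as a flat vector of length 6.
joint_probs = np.array([0.1, 0.0, 0.2, 0.3, 0.0, 0.4])

flat_index = 5  # suppose the sampler picked the last joint strategy

# Unravelling with the vector's own shape (6,) yields a single index, so a
# per-player loop sees only "player 0" and total_policies[0][5] is out of range.
bad = np.unravel_index(flat_index, joint_probs.shape)

# Unravelling with the per-player shape yields one index per player: (1, 2).
good = np.unravel_index(flat_index, shape)


def sample_joint(total_policies, joint_probs, rng):
  """Samples one joint strategy from flat joint probabilities (standalone sketch)."""
  shape = tuple(len(player_policies) for player_policies in total_policies)
  flat = np.asarray(joint_probs, dtype=float).reshape(-1)
  flat_index = rng.choice(flat.size, p=flat / flat.sum())
  indices = np.unravel_index(flat_index, shape)
  return [total_policies[player][i] for player, i in enumerate(indices)]


rng = np.random.default_rng(0)
samples = [sample_joint(total_policies, joint_probs, rng) for _ in range(1000)]
```

Because both joint strategies involving `"b1"` have probability 0, no sample ever contains it, which is a quick sanity check that the flat index is being mapped back to the right per-player policies.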

lanctot commented 4 years ago

Thanks @qmaai. @paulfmmuller, any ideas?

PaulFMMuller commented 4 years ago

Hey @qmaai,

Taking a look into it. Just to minimize time spent searching, can you confirm that this error occurs when running the example with `--rectifier="rectified"`?

I don't believe it's possible to use joint probabilities in v2's example for the moment; would you by any chance have been running v1? We are planning to ensure and validate support for joint probabilities in the future, so it may not be very robust yet.
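For context on the joint vs. per-player distinction: a profile of per-player (marginal) distributions can always be turned into a joint tensor by taking the outer product, but a general joint distribution need not factor that way. The standalone NumPy sketch below illustrates this; `marginals_to_joint` and `joint_to_marginals` are hypothetical helpers, not OpenSpiel functions.

```python
import functools

import numpy as np


def marginals_to_joint(marginals):
  """Outer product of per-player distributions -> joint probability tensor.

  This is the joint distribution implied by players sampling independently.
  """
  arrays = [np.asarray(m, dtype=float) for m in marginals]
  return functools.reduce(np.multiply.outer, arrays)


def joint_to_marginals(joint):
  """Sums a joint tensor over all other players' axes, one marginal per player."""
  joint = np.asarray(joint, dtype=float)
  axes = range(joint.ndim)
  return [joint.sum(axis=tuple(a for a in axes if a != p)) for p in axes]


# Independent play: the joint tensor has shape (2, 3), one axis per player.
independent = marginals_to_joint([[0.25, 0.75], [0.5, 0.0, 0.5]])

# A correlated joint distribution: both players always pick the same index.
correlated = np.array([[0.5, 0.0], [0.0, 0.5]])
# Its marginals are uniform, but their outer product puts 0.25 on every cell,
# so this joint distribution cannot be represented by per-player marginals.
rebuilt = marginals_to_joint(joint_to_marginals(correlated))
```

In both cases the joint tensor naturally has one axis per player, which is the shape a flat joint vector must be unravelled with.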

qmaai commented 4 years ago

Thank you, Paul. Yup, I was using v2 with the `--rectifier=rectified` flag.

lanctot commented 4 years ago

@qmaai did you see Paul's response that joint probabilities are not yet possible in the v2 example?

Shall we close this, or is there continued work being done on it? @PaulFMMuller, shall we raise an exception to inform the user in this case?
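One possible shape for such an exception is a validation step before unravelling. The sketch below is hypothetical: `check_joint_probabilities` is a name invented for illustration and is not part of OpenSpiel. It raises a `ValueError` with an explicit message when the joint probabilities do not match the per-player policy counts, instead of letting sampling fail later with an `IndexError`.

```python
import numpy as np


def check_joint_probabilities(total_policies, joint_probs):
  """Hypothetical validation helper (not part of OpenSpiel).

  Raises ValueError with an explicit message instead of letting joint sampling
  fail later with an IndexError. Returns the probabilities reshaped to one
  axis per player.
  """
  shape = tuple(len(player_policies) for player_policies in total_policies)
  num_joint = int(np.prod(shape))
  probs = np.asarray(joint_probs, dtype=float)
  if probs.size != num_joint:
    raise ValueError(
        f"Got {probs.size} joint probabilities, but the policy sets imply "
        f"{num_joint} joint strategies (shape {shape}).")
  if np.any(probs < 0) or not np.isclose(probs.sum(), 1.0):
    raise ValueError("Joint probabilities must be non-negative and sum to 1.")
  return probs.reshape(shape)


# A flat vector of 6 probabilities for 2 x 3 policies is accepted and reshaped.
ok = check_joint_probabilities([["a0", "a1"], ["b0", "b1", "b2"]],
                               [0.1, 0.0, 0.2, 0.3, 0.0, 0.4])
```

Returning the reshaped tensor means callers can unravel with `ok.shape` directly, so the shape mismatch described in the issue cannot reach the sampler silently.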

PaulFMMuller commented 4 years ago

Hey @qmaai, I ran the example with `--rectifier="rectified"` and didn't encounter an issue; did you add any other flag, or do something else? If so, and it's not supported yet, please don't hesitate to make a contribution :) Otherwise, if there's something I've missed, please send the entire launch command so I can reproduce the bug.

lanctot commented 4 years ago

Hi @qmaai, just wondering whether this has been resolved for you. If so, I am inclined to close this issue for now. Let us know, thanks.