davidADSP / SIMPLE

Selfplay In MultiPlayer Environments
GNU General Public License v3.0

Issue when legal actions mask is dependent on current player #39

Open AdamLang96 opened 1 year ago

AdamLang96 commented 1 year ago

I have a custom environment where the legal actions depend on both the state of the board and the current player. When I try to train my first agent, the legal_actions mask isn't computed correctly for the agent, but it is for the opponent. I'm guessing the issue comes from the code below (found in SelfPlayWrapper). Since legal_actions depends on current_player_num, and agent_player_num != current_player_num, it cannot calculate the correct mask for the agent. Please let me know if you have any ideas on how to fix this.

    def continue_game(self):
        observation = None
        reward = None
        done = None
        while self.current_player_num != self.agent_player_num:
            action = self.current_agent.choose_action(self, choose_best_action = False, mask_invalid_actions = True)
            observation, reward, done, _ = super(SelfPlayEnv, self).step(action)
            logger.debug(f'Rewards: {reward}')
            logger.debug(f'Done: {done}')
            if done:
                break

        return observation, reward, done, None
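To make the failure mode concrete, here is a minimal sketch of it. This is not my actual environment and not SIMPLE's code: `ToyEnv`, its parity rule, and the `legal_actions_for` helper are all hypothetical names made up for illustration. It shows that a `legal_actions` property tied to `current_player_num` describes whoever is to move, so reading it while it is the opponent's turn gives the opponent's mask. One possible workaround, assuming the environment can be changed, is to compute the mask for an explicit player.

```python
class ToyEnv:
    """Hypothetical two-player env: player 0 may only take even cells,
    player 1 may only take odd cells, and only empty cells are legal."""

    def __init__(self, n_cells=6):
        self.n_cells = n_cells
        self.board = [None] * n_cells  # None = empty, otherwise a player number
        self.current_player_num = 0

    def legal_actions_for(self, player_num):
        # Hypothetical helper: mask for an explicit player,
        # independent of whose turn it currently is.
        return [
            1 if self.board[i] is None and i % 2 == player_num else 0
            for i in range(self.n_cells)
        ]

    @property
    def legal_actions(self):
        # Convention described in this issue: the mask follows current_player_num.
        return self.legal_actions_for(self.current_player_num)

    def step(self, action):
        if not self.legal_actions[action]:
            raise ValueError(f"illegal action {action}")
        self.board[action] = self.current_player_num
        # Advance the turn before anything reads legal_actions again.
        self.current_player_num = (self.current_player_num + 1) % 2


env = ToyEnv()
agent_player_num = 1

# It is player 0's turn, so the property describes player 0, not the agent.
mask_seen = env.legal_actions
mask_agent = env.legal_actions_for(agent_player_num)
```

In this sketch the property only matches the agent's mask once `step` has advanced `current_player_num` to the agent, so the order in which the turn is updated and the mask (or observation) is read matters.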
laymelek commented 11 months ago

Did you find a solution to this? I have the same problem when running Test. On the other hand, while running Train, my agent doesn't care about legal_actions whatsoever: it doesn't call it at all and just chooses a random action number.
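For reference, this is roughly what I would expect masking to do when an action is chosen. It is only a sketch under my own assumptions, not SIMPLE's implementation: `mask_probs` and `sample_action` are hypothetical names, and the fallback to a uniform choice over legal actions is my own design choice.

```python
import random


def mask_probs(action_probs, legal_actions):
    """Zero out illegal actions and renormalise the remaining probabilities."""
    masked = [p * m for p, m in zip(action_probs, legal_actions)]
    total = sum(masked)
    if total == 0:
        # The policy put all its mass on illegal actions:
        # fall back to a uniform distribution over the legal ones.
        n_legal = sum(legal_actions)
        if n_legal == 0:
            raise ValueError("no legal actions available")
        return [m / n_legal for m in legal_actions]
    return [p / total for p in masked]


def sample_action(action_probs, legal_actions, rng=random):
    """Sample an action index from the masked distribution."""
    probs = mask_probs(action_probs, legal_actions)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

If the mask passed in belongs to the wrong player, this still returns a "valid" index, which would look exactly like the agent picking a random action.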

AdamLang96 commented 11 months ago

Yeah, this is my exact issue. Haven't found a solution yet.