Farama-Foundation / PettingZoo

An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities
https://pettingzoo.farama.org

[Bug Report] Inconsistency between terminal behavior of Connect4 and Tick Tack Toe #996

Closed by natebade 1 year ago

natebade commented 1 year ago

### Describe the bug

I noticed an inconsistency in how Connect4 and Tic-Tac-Toe handle `agent_selection` on the terminal step. Connect4 leaves the selection on the agent that just moved when the game ends:

        next_agent = self._agent_selector.next()

        winner = self.check_for_winner()

        # check if there is a winner
        if winner:
            self.rewards[self.agent_selection] += 1
            self.rewards[next_agent] -= 1
            self.terminations = {i: True for i in self.agents}
        # check if there is a tie
        elif all(x in [1, 2] for x in self.board):
            # once either play wins or there is a draw, game over, both players are done
            self.terminations = {i: True for i in self.agents}
        else:
            # no winner yet
            self.agent_selection = next_agent

Tic-Tac-Toe always switches the selection to the next agent, even when the game is over:

        next_agent = self._agent_selector.next()

        if self.board.check_game_over():
            winner = self.board.check_for_winner()

            if winner == -1:
                # tie
                pass
            elif winner == 1:
                # agent 0 won
                self.rewards[self.agents[0]] += 1
                self.rewards[self.agents[1]] -= 1
            else:
                # agent 1 won
                self.rewards[self.agents[1]] += 1
                self.rewards[self.agents[0]] -= 1

            # once either play wins or there is a draw, game over, both players are done
            self.terminations = {i: True for i in self.agents}

        # Switch selection to next agents
        self._cumulative_rewards[self.agent_selection] = 0
        self.agent_selection = next_agent

### Code example

### Why is this an issue? 

Many pieces of code step through the env with a loop like the following:

    rewards = {"player_0": 0, "player_1": 0}
    term = False

    while not term:
        env.step(action)  # `action` is chosen by the caller
        observation, rew, term, trunc, info = env.last()
        # only the currently selected agent's reward is recorded
        rewards[env.agent_selection] = rew

    return rewards

With Tic-Tac-Toe, the code above would return the correct reward, but with Connect4 the losing player's reward would never be recorded, because `agent_selection` still points at the winner when `env.last()` is called after the final move.

Now, clearly this is not how you should write these loops; you should instead read the reward for each agent outside the loop once a done is hit. But some systems don't give you much ability to check how their loops are written, and since this is a silent error that completely breaks training, it might be good to either make the behavior consistent or flag that `last()` may not return the score you expect.
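
The safer pattern mentioned above can be sketched roughly as follows. This is only an illustration: `ToyTurnEnv` and `final_rewards` are hypothetical names and not part of PettingZoo. The toy class imitates just the pieces seen in the snippets (`agent_selection`, `rewards`, `terminations`, `step`, `last`), and its `switch_on_terminal` flag is an assumption used to mimic the two terminal behaviors. The point is that reading the `rewards` dict for every agent after the loop gives the same answer regardless of which agent is selected at the end.

```python
class ToyTurnEnv:
    """Hypothetical two-player, turn-based stand-in (NOT a PettingZoo env).

    The game ends after `win_after` moves and whoever made the last move wins.
    `switch_on_terminal=False` keeps the selection on the mover at game over
    (like the first snippet); `True` always advances it (like the second).
    """

    def __init__(self, switch_on_terminal, win_after=3):
        self.agents = ["player_0", "player_1"]
        self.switch_on_terminal = switch_on_terminal
        self.win_after = win_after
        self.reset()

    def reset(self):
        self.agent_selection = self.agents[0]
        self.rewards = {a: 0 for a in self.agents}
        self.terminations = {a: False for a in self.agents}
        self.moves = 0

    def step(self, action):
        mover = self.agent_selection
        other = self.agents[1 - self.agents.index(mover)]
        self.moves += 1
        game_over = self.moves >= self.win_after
        if game_over:
            self.rewards[mover] += 1
            self.rewards[other] -= 1
            self.terminations = {a: True for a in self.agents}
        # The only difference between the two behaviors is this switch.
        if not game_over or self.switch_on_terminal:
            self.agent_selection = other

    def last(self):
        agent = self.agent_selection
        return None, self.rewards[agent], self.terminations[agent], False, {}


def final_rewards(env):
    """Play until termination, then read every agent's reward at once."""
    env.reset()
    terminated = False
    while not terminated:
        env.step(0)  # dummy action; the toy env ignores it
        _, _, terminated, _, _ = env.last()
    # Read all agents' rewards after the loop, not only the selected agent's.
    return dict(env.rewards)


for switch in (False, True):
    print(switch, final_rewards(ToyTurnEnv(switch_on_terminal=switch)))
```

In this toy setup both settings report `player_0: 1` and `player_1: -1`, even though the selected agent at termination differs between them.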



### System info

Installed using pip
Version 1.23.1
Ubuntu
Ubuntu 22.04.2 LTS

### Additional context

You guys are doing great work, thank you for it!

### Checklist

- [X] I have checked that there is no similar [issue](https://github.com/Farama-Foundation/PettingZoo/issues) in the repo

elliottower commented 1 year ago

Thanks for catching this, would you be interested in making a PR? Otherwise I can take a look

natebade commented 1 year ago

Sure, I can make one.

elliottower commented 1 year ago

@natebade any update on this? I can do it real quick if you'd like

natebade commented 1 year ago

Hi Elliot,

Apologies, work got busy! Why don't you go ahead and do it, thanks!

-Nate



elliottower commented 1 year ago

No worries, appreciate you catching it in the first place regardless.