Closed by natebade 1 year ago
Thanks for catching this. Would you be interested in making a PR? Otherwise I can take a look.
Sure, I can make one.
@natebade any update on this? I can do it real quick if you'd like
Hi Elliot,
Apologies, work got busy! Why don't you go ahead and do it, thanks!
-Nate
From: Elliot Tower
Sent: Thursday, July 6, 2023 1:12 PM
To: Farama-Foundation/PettingZoo
Cc: Nathaniel Bade; Mention
Subject: Re: [Farama-Foundation/PettingZoo] [Bug Report] Inconsistency between terminal behavior of Connect4 and Tick Tack Toe (Issue #996)
No worries, appreciate you catching it in the first place regardless.
Describe the bug
I noticed a strange behavioral inconsistency between Connect4 and Tic Tac Toe:
- When Connect4 reaches a terminal state in the env.step function, it does not update env.agent_selection to the next player.
- When Tic Tac Toe reaches a terminal state, env.step does update env.agent_selection to the next player.
Code example
In the case of Tic Tac Toe, the code above would return the correct reward, but in the case of Connect4 the losing player would receive no reward at all.
Now, clearly this is not how you should write these loops; you should instead get the proper reward for each agent outside the loop when a done is hit. But some systems don't give you much ability to check how their loops are computed, and since this is an invisible error that completely screws up training, it might be good to either make the behavior consistent or flag that last() may not get you the score you think it will.
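To make the difference concrete, here is a minimal, self-contained sketch. It does not use PettingZoo itself: ToyTurnGame is a hypothetical stand-in whose attribute names (agent_selection, rewards, terminations, step, last) only loosely mirror the AEC interface, and the advance_on_terminal flag is an assumption introduced purely to toggle between the two behaviors described above. The naive loop trusts only what last() reports for the currently selected agent, while the second helper reads every agent's reward after the loop ends.

```python
class ToyTurnGame:
    """Hypothetical two-player game: the first move by player_0 wins immediately."""

    def __init__(self, advance_on_terminal):
        self.agents = ["player_0", "player_1"]
        # Assumption for illustration: toggles whether step() moves
        # agent_selection to the next player when the game ends.
        self.advance_on_terminal = advance_on_terminal
        self.agent_selection = "player_0"
        self.rewards = {agent: 0 for agent in self.agents}
        self.terminations = {agent: False for agent in self.agents}

    def step(self, action):
        winner = self.agent_selection
        loser = next(agent for agent in self.agents if agent != winner)
        self.rewards = {winner: 1, loser: -1}
        self.terminations = {agent: True for agent in self.agents}
        if self.advance_on_terminal:
            self.agent_selection = loser

    def last(self):
        # Reports reward and termination for the selected agent only.
        agent = self.agent_selection
        return self.rewards[agent], self.terminations[agent]


def naive_rollout(env):
    """Record only what last() reports, stopping at the first done."""
    seen = {}
    while True:
        agent = env.agent_selection
        reward, done = env.last()
        seen[agent] = reward
        if done:
            break
        env.step(0)  # the action value is irrelevant in this toy game
    return seen


def rollout_then_read_all(env):
    """Run the same loop, then read every agent's reward outside it."""
    naive_rollout(env)
    return dict(env.rewards)


print(naive_rollout(ToyTurnGame(advance_on_terminal=True)))
print(naive_rollout(ToyTurnGame(advance_on_terminal=False)))
print(rollout_then_read_all(ToyTurnGame(advance_on_terminal=False)))
```

In this toy, the naive loop never records an entry for the losing player when the selection is not advanced, and it records a stale reward for the winner when it is. Reading the full rewards mapping after the loop gives the same answer in both variants, which is the pattern suggested above.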