To speed up training and avoid simulating hands that serve no purpose once the learning agent can no longer learn anything, it would help a great deal to be able to end the episode early as soon as every shell player's stack is down to zero.
Given a parameter enabling shortcut ending is set to true
When a hand has ended and a player is designated the winner
Then the environment should check whether any shell player still has a stack > 0
And if none does, the "step" method should return True for its done return value
And all other reward/observation calculations should be unaffected.
One way to achieve this: when end_game_check is performed, look up every shell player index, and if player_status[shell_player_idx] == False for all of them, set done = True.
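A minimal sketch of that check follows. It assumes player_status is a list of booleans indexed by player, where False means the player is out, and it introduces hypothetical names (shortcut_end for the enabling parameter, shell_player_idxs for the shell player indices, shortcut_done as a helper) that are not part of any existing code. Only the done flag is touched, so reward and observation calculations stay as they are.

```python
from typing import Sequence


def shortcut_done(
    shortcut_end: bool,
    player_status: Sequence[bool],
    shell_player_idxs: Sequence[int],
) -> bool:
    # Hypothetical helper: True only when the shortcut parameter is enabled
    # and every shell player is out (player_status[idx] is False for all).
    # An empty shell player list never triggers the shortcut.
    if not shortcut_end or not shell_player_idxs:
        return False
    return all(not player_status[idx] for idx in shell_player_idxs)


def end_game_check(
    done: bool,
    shortcut_end: bool,
    player_status: Sequence[bool],
    shell_player_idxs: Sequence[int],
) -> bool:
    # Sketch of the extra step inside end_game_check: keep the existing
    # done value and additionally end the episode when all shell players
    # are out. Rewards and observations are computed elsewhere, unchanged.
    return done or shortcut_done(shortcut_end, player_status, shell_player_idxs)


if __name__ == "__main__":
    # Assumed layout: index 0 is the learning agent, 1 and 2 are shell players.
    status = [True, False, False]
    print(end_game_check(False, True, status, [1, 2]))   # shortcut fires
    print(end_game_check(False, False, status, [1, 2]))  # parameter disabled
```

Keeping the check behind the parameter means existing training runs behave exactly as before unless shortcut ending is explicitly turned on.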