Shortcut ending when all shell players have a zero stack

To speed up training and avoid simulating hands the learning agent can no longer learn from, the environment should be able to end the episode early once every shell player's stack is down to zero.

Given the parameter that enables shortcut ending is set to true
When a hand has ended and a player has been designated the winner
Then the environment should check whether any shell player still has a stack > 0
And if none does, the "step" method should return True for its done return value
And all other reward and observation calculations should be unaffected
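
A minimal sketch of the check described above, not the project's actual code. The `Player` class, its `is_shell` flag, the `shortcut_ending` parameter name, and both helper functions are hypothetical names chosen for illustration; the real environment may track players and stacks differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Player:
    """Hypothetical stand-in for a seat at the table."""
    name: str
    stack: int
    is_shell: bool  # True for scripted opponents, False for the learning agent


def all_shell_players_busted(players: List[Player]) -> bool:
    """Return True when at least one shell player exists and none has chips left."""
    shells = [p for p in players if p.is_shell]
    return bool(shells) and all(p.stack <= 0 for p in shells)


def episode_done(players: List[Player], hand_ended: bool,
                 normal_done: bool, shortcut_ending: bool = False) -> bool:
    """Combine the environment's usual done flag with the optional shortcut.

    Only the done flag is affected; rewards and observations are assumed
    to be computed elsewhere and are left untouched.
    """
    if shortcut_ending and hand_ended and all_shell_players_busted(players):
        return True
    return normal_done
```

In this sketch, "step" would call `episode_done(...)` after the winner of a hand has been determined and return the result as its done value. With the parameter disabled, the function simply passes through the usual done flag, so existing behaviour is unchanged.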

CodeAnimal / neuron_poker

Shortcut ending when all shell players have a zero stack #11