In the classic games, when a player takes an illegal move, that player is penalized and the game ends. This convention comes from games like chess and Go; however, a more appropriate approach for reinforcement learning algorithms may be to penalize the illegal move without terminating the game.
This may make games easier to solve, because play continues even after an illegal move, which produces more diverse observations and rewards.
It should also make it easier to reason about reward structure in games like Hanabi, where reward is allocated at many steps as the game progresses.
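To make the two policies concrete, here is a minimal sketch of a toy two-player game that supports both. Everything in it is hypothetical and not part of any library: the `ToyGame` class, its rules (players alternately pick an unused number from 0 to n-1), the `terminate_on_illegal` flag, and the penalty value of -1. In the penalize-and-continue mode it also assumes the offending player keeps the turn and the board is left unchanged.

```python
from dataclasses import dataclass, field

# Hypothetical penalty for an illegal move; the value is an assumption.
ILLEGAL_MOVE_PENALTY = -1.0


@dataclass
class ToyGame:
    """Hypothetical game: two players alternately pick an unused number in 0..n-1."""

    n: int = 5
    terminate_on_illegal: bool = True  # True = classic style, False = penalize and continue
    used: set = field(default_factory=set)
    current_player: int = 0
    done: bool = False

    def legal_moves(self):
        return [a for a in range(self.n) if a not in self.used]

    def step(self, action):
        """Apply `action` for the current player and return (rewards per player, done)."""
        if self.done:
            raise RuntimeError("the game is already over")
        rewards = [0.0, 0.0]
        if action not in self.legal_moves():
            # In both modes, the player who made the illegal move is penalized.
            rewards[self.current_player] = ILLEGAL_MOVE_PENALTY
            if self.terminate_on_illegal:
                # Classic style: the illegal move also ends the game.
                self.done = True
            # Penalize-and-continue style: the state is unchanged and the
            # same player moves again (an assumed design choice).
            return rewards, self.done
        self.used.add(action)
        if not self.legal_moves():
            self.done = True
        self.current_player = 1 - self.current_player
        return rewards, self.done


classic = ToyGame(terminate_on_illegal=True)
classic.step(0)  # ([0.0, 0.0], False)
classic.step(0)  # ([0.0, -1.0], True): player 1 is penalized and the game ends

lenient = ToyGame(terminate_on_illegal=False)
lenient.step(0)  # ([0.0, 0.0], False)
lenient.step(0)  # ([0.0, -1.0], False): player 1 is penalized but keeps playing
lenient.step(1)  # ([0.0, 0.0], False): the game continues normally
```

In the second mode the episode keeps producing transitions after the mistake, so a learner also sees observations and rewards from later positions instead of an episode that stops at the first illegal move. Whether the turn should stay with the offending player or pass to the opponent is a separate design decision that this sketch does not settle.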