Previously this was only done if the state was partially observable. Therefore the fully-observed version led to in-place modifications of the observations when the state of the environment changed (board in state and board in next_state are would be the same)
Previously this was only done if the state was partially observable. Therefore the fully-observed version led to in-place modifications of the observations when the state of the environment changed (board in state and board in next_state are would be the same)