count of frames : number of frames to keep in memory, default 4
game type : whether the starting pieces are placed only in the center or on the corners as well; base templates need to be defined in the code, and this cannot be changed until a new env object is created
this function is public since it is called during creation of the object
[x] reset
takes no parameters, sets everything to default
initialize the deque for storing frames
initialize the board array here, along with the initial state defined in init
returns the starting state of the board, legal moves and whose turn it is
board contains three planes, one for black coins, one for white, and one denoting which player will play the next turn (all 1s for white and all 0s for black)
this function is public and should be called immediately after the object creation, must also be called when the game has ended and needs to be "reset"
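a minimal sketch of the reset bookkeeping described above, assuming an 8x8 board with the center-only opening. `SketchEnv`, `starting_board` and `frame_count` are hypothetical names, not the ones used in the repository, and the legal-move computation is left out to keep the example short.

```python
from collections import deque

BOARD_SIZE = 8        # assumed: standard 8x8 board
BLACK, WHITE = 0, 1   # values used to fill the turn plane


def starting_board(first_player=BLACK):
    """Build the three planes [black, white, turn] as nested 0/1 lists."""
    black = [[0] * BOARD_SIZE for _ in range(BOARD_SIZE)]
    white = [[0] * BOARD_SIZE for _ in range(BOARD_SIZE)]
    mid = BOARD_SIZE // 2
    # Center-only opening; a corner variant would need its own template.
    white[mid - 1][mid - 1] = white[mid][mid] = 1
    black[mid - 1][mid] = black[mid][mid - 1] = 1
    # Turn plane: all 0s when black is to move, all 1s when white is.
    turn = [[first_player] * BOARD_SIZE for _ in range(BOARD_SIZE)]
    return [black, white, turn]


class SketchEnv:
    """Hypothetical environment showing only the reset bookkeeping."""

    def __init__(self, frame_count=4):
        self.frame_count = frame_count
        self.frames = deque(maxlen=frame_count)

    def reset(self):
        # A fresh deque drops frames left over from the previous game.
        self.frames = deque(maxlen=self.frame_count)
        board = starting_board()
        self.frames.append(board)
        return board, BLACK
```

because the deque is created with `maxlen`, older frames fall off automatically once more than `frame_count` boards have been appended.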
[x] step
takes action as the input
returns reward, next state, an information dict, whether the game has ended, legal moves for next state
is public, and is called whenever an action needs to be taken in the game
has the logic to perform the move and update the board accordingly
can have private functions to handle a variety of logic for updating the board
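the move logic above could look roughly like the sketch below. it works on a single square grid of cell values rather than the three planes, and the helper names (`flips_for_move`, `legal_moves`), the contents of the info dict and the reward scheme (0 during play, sign of the final coin difference at the end) are all assumptions made for illustration. the pass and termination handling is one possible reading of the rules, not necessarily what the repository does.

```python
EMPTY, BLACK, WHITE = 0, 1, 2
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def flips_for_move(grid, row, col, player):
    """Return the opponent cells captured if `player` plays at (row, col)."""
    if grid[row][col] != EMPTY:
        return []
    opponent = BLACK + WHITE - player
    size = len(grid)
    captured = []
    for dr, dc in DIRECTIONS:
        run = []
        r, c = row + dr, col + dc
        while 0 <= r < size and 0 <= c < size and grid[r][c] == opponent:
            run.append((r, c))
            r, c = r + dr, c + dc
        # A run counts only when one of the player's own coins closes it.
        if run and 0 <= r < size and 0 <= c < size and grid[r][c] == player:
            captured.extend(run)
    return captured


def legal_moves(grid, player):
    """List every empty cell where `player` would capture at least one coin."""
    size = len(grid)
    return [(r, c) for r in range(size) for c in range(size)
            if flips_for_move(grid, r, c, player)]


def step(grid, player, action):
    """Apply `action`; return (reward, next_grid, info, done, next_legal)."""
    row, col = action
    captured = flips_for_move(grid, row, col, player)
    if not captured:
        raise ValueError("illegal move")
    next_grid = [list(line) for line in grid]  # copy, input stays untouched
    for r, c in captured + [(row, col)]:
        next_grid[r][c] = player
    opponent = BLACK + WHITE - player
    next_player, passed = opponent, False
    next_legal = legal_moves(next_grid, opponent)
    if not next_legal:
        # Opponent must pass; the game ends if the mover is stuck as well.
        next_player, passed = player, True
        next_legal = legal_moves(next_grid, player)
    done = not next_legal
    reward = 0
    if done:
        # Assumed reward: sign of the final coin difference for the mover.
        mine = sum(line.count(player) for line in next_grid)
        theirs = sum(line.count(opponent) for line in next_grid)
        reward = (mine > theirs) - (mine < theirs)
    info = {"next_player": next_player, "passed": passed}
    return reward, next_grid, info, done, next_legal
```

since `step` copies the grid and keeps no state of its own, the same function could also serve a memoryless simulation environment.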
[x] simulation environment
a separate environment can be created with the facility to run simulations such as monte carlo rollouts
this environment will take the current board state and an action as input and return the next board state, and hence will be memoryless
should implement logic similar to the base environment, but can be a separate inherited class
[x] a game class (separate from environment)
a better idea could be to create a game class that handles all aspects of the game, like remembering which player is to play, keeping track of points, the history of board states etc
a separate light environment will be created that encompasses the rules and, given a board, the player to move and an action, executes the move
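one way to picture this split is sketched below: the game class only does bookkeeping, and the rules are passed in as a stateless callable. `Game`, `rules_step` and `toy_rules` are hypothetical names. `toy_rules` is a stand-in that just places a coin on a flat tuple board and is not real Othello logic; it is there only so the example runs on its own.

```python
def toy_rules(board, player, action):
    """Stand-in rules: place a coin, no flipping. NOT real Othello logic."""
    cells = list(board)
    cells[action] = player
    next_board = tuple(cells)
    # Players are 1 and 2; the game ends when no empty (0) cell is left.
    return next_board, 3 - player, 0 not in next_board


class Game:
    """Hypothetical bookkeeping wrapper around a stateless rules function."""

    def __init__(self, rules_step, start_board, first_player=1):
        self.rules_step = rules_step   # the light, memoryless environment
        self.board = start_board
        self.player = first_player
        self.history = [start_board]   # every board state seen so far
        self.moves = []                # (player, action) pairs
        self.done = False

    def play(self, action):
        if self.done:
            raise RuntimeError("game is over; create a new Game")
        board, player, done = self.rules_step(self.board, self.player, action)
        self.moves.append((self.player, action))
        self.history.append(board)
        self.board, self.player, self.done = board, player, done
        return board

    def points(self):
        # Points are simply coin counts on the current board.
        return {1: self.board.count(1), 2: self.board.count(2)}
```

with this layout a rollout can call the rules function directly on any board without touching the history kept by the game object.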
Added simple memoryless environment with init, reset and step functions 77f2e0b9ee9d5fb759b3c2bc9bd06fc036bcb8b0
To do
[x] check logic in terminal conditions
[x] add logic for situations like passing of turn, termination
Added bitboard environment c3f83a4e0e57cacc528abe601fb7f292dafb492c
To do
[x] optimize runtime of bitboard environment
partial optimization in 8db56f771685b510db3cb0a8855df428e335e847 and slightly more optimization in 0ac8950bcf7729e2c4ac66035563bc48b370820b
Further optimization will require converting the entire code to C which will make interacting with all agents very difficult
[ ] optimize board conversion if possible
[x] add function for board augmentation b1f3fce
since running a game is expensive, an alternative to playing a large number of games is to prepare augmentations of board states, which are simply rotated and mirrored versions of the board (including actions and next states). bitboards support these via bit operations.
[ ] modify the board augmentation function to add boards with inverted colors because inverting colors will change the first player and boards observed subsequently will never happen in reality.
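the rotated and mirrored augmentations mentioned above can be sketched with the usual 64-bit swap tricks. this assumes each color is stored as one 64-bit integer with bit index `row * 8 + col`; the function names are hypothetical and the repository's own layout and helpers may differ.

```python
def flip_vertical(bb):
    """Reverse the order of the 8 rows (each row is one byte)."""
    k1 = 0x00FF00FF00FF00FF
    k2 = 0x0000FFFF0000FFFF
    bb = ((bb >> 8) & k1) | ((bb & k1) << 8)
    bb = ((bb >> 16) & k2) | ((bb & k2) << 16)
    return (bb >> 32) | ((bb & 0xFFFFFFFF) << 32)


def mirror_horizontal(bb):
    """Reverse the 8 columns inside every row."""
    k1 = 0x5555555555555555
    k2 = 0x3333333333333333
    k4 = 0x0F0F0F0F0F0F0F0F
    bb = ((bb >> 1) & k1) | ((bb & k1) << 1)
    bb = ((bb >> 2) & k2) | ((bb & k2) << 2)
    return ((bb >> 4) & k4) | ((bb & k4) << 4)


def transpose(bb):
    """Swap rows and columns (flip about the main diagonal)."""
    k1 = 0x5500550055005500
    k2 = 0x3333000033330000
    k4 = 0x0F0F0F0F00000000
    t = k4 & (bb ^ (bb << 28))
    bb ^= t ^ (t >> 28)
    t = k2 & (bb ^ (bb << 14))
    bb ^= t ^ (t >> 14)
    t = k1 & (bb ^ (bb << 7))
    bb ^= t ^ (t >> 7)
    return bb


def augmentations(bb):
    """Return the 8 rotated/mirrored versions of a 64-bit board."""
    out = []
    for base in (bb, transpose(bb)):
        out.append(base)
        out.append(flip_vertical(base))
        out.append(mirror_horizontal(base))
        out.append(flip_vertical(mirror_horizontal(base)))
    return out
```

an action stored as a single set bit can go through the same functions, so applying the same index of `augmentations` to the state, the action and the next state keeps each augmented transition consistent.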