neelr closed this issue 3 years ago.
The value head needs to have only 1 output node, whereas yours has 2. It outputs the expected value of the game from the point of view of the current player.
That is, change it to `vf = dense(y, 1, batch_norm=False, activation='tanh', name='vf')`.
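To illustrate why a single unit is enough, here is a minimal pure-Python sketch. It does not reproduce the repo's `dense` helper; `fully_connected` and `value_head` are hypothetical names. The value head is a fully connected layer with one output unit followed by `tanh`, so it yields one scalar in (-1, 1). Reading +1 as a win and -1 as a loss for the current player is an assumption about the reward convention.

```python
import math
import random

def fully_connected(inputs, weights, bias):
    """One dense layer: `weights` holds one weight column per output unit."""
    return [sum(x * w for x, w in zip(inputs, col)) + b
            for col, b in zip(weights, bias)]

def value_head(features, weights, bias):
    """Hypothetical value head: exactly one output unit squashed by tanh.

    Returns a single scalar in (-1, 1), interpreted (by assumption) as the
    expected outcome of the game for the player to move.
    """
    if len(weights) != 1 or len(bias) != 1:
        raise ValueError("value head must have exactly 1 output unit")
    (logit,) = fully_connected(features, weights, bias)
    return math.tanh(logit)

# Usage: random 8-dimensional features from some shared network body.
random.seed(0)
features = [random.uniform(-1, 1) for _ in range(8)]
w = [[random.uniform(-1, 1) for _ in range(8)]]  # one column -> one output
v = value_head(features, w, [0.0])
print(v)
```

With two output units there would be two numbers and no single "value of the position", which is why the 2-node head does not match what the training loop expects.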
Thanks! Another quick question: how long were the pretrained best_models in the repo trained? I just want to get a sense of how long training should take for me.
There seems to be an error when I try to use the module with a custom env. It occurs after the first iteration:
What could be causing this? I defined my action space as discrete with 11 possible actions, and the observation as 2 values, each with 100 possible discrete values. I repurposed the Tic Tac Toe model, with the few changes below:
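As a quick sanity check on the shapes implied by that env, here is a minimal sketch. It assumes the network receives a one-hot encoding of the two observation values, which may not match the Tic Tac Toe model's actual input format. `encode_observation` and `head_sizes` are hypothetical helpers, not part of the repo. Under these assumptions the input is 2 × 100 = 200 units, the policy head needs one output per action (11), and the value head needs a single output.

```python
NUM_ACTIONS = 11          # discrete action space with 11 actions
OBS_VALUES = (100, 100)   # two observation components, 100 values each

def encode_observation(obs):
    """One-hot encode each discrete component and concatenate (hypothetical)."""
    if len(obs) != len(OBS_VALUES):
        raise ValueError(f"expected {len(OBS_VALUES)} values, got {len(obs)}")
    encoded = []
    for value, n in zip(obs, OBS_VALUES):
        if not 0 <= value < n:
            raise ValueError(f"value {value} outside range [0, {n})")
        one_hot = [0.0] * n
        one_hot[value] = 1.0
        encoded.extend(one_hot)
    return encoded

def head_sizes():
    """Output sizes the network heads would need for this env."""
    return {"policy": NUM_ACTIONS, "value": 1}

x = encode_observation((3, 97))
print(len(x), head_sizes())
```

A mismatch between any of these sizes and the layer sizes left over from the Tic Tac Toe model is one thing worth checking first.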