I appreciate that you tried a population method as well, but make_strategy_evol is not defined, so it crashes. Nice that you tweaked the parameters; it would have been even better if you had added some kind of simple crossover.
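To make the crossover suggestion concrete, here is a minimal sketch of a uniform crossover. It assumes each individual is a flat list of numeric parameters; the function name and the example genomes are hypothetical and not taken from your code.

```python
import random

def uniform_crossover(parent_a, parent_b, rng=random):
    """Build a child by taking each gene from one of the two parents at random."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same number of genes")
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]

# Hypothetical genomes, e.g. the parameters that the mutation step tweaks.
parent_a = [0.1, 0.5, 0.9]
parent_b = [0.3, 0.4, 0.2]
child = uniform_crossover(parent_a, parent_b)
```

The child can then go through the same parameter tweak you already have, so crossover and mutation work together.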
Your tournament selection is just a random choice of an individual; you should also look at their fitness when selecting them.
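A fitness-aware tournament could look roughly like the sketch below. It assumes the population is a list and that a fitness function is available; the names tournament_select, fitness and k are hypothetical.

```python
import random

def tournament_select(population, fitness, k=3, rng=random):
    """Sample k individuals at random and return the fittest of them."""
    contenders = rng.sample(population, k=min(k, len(population)))
    return max(contenders, key=fitness)

# Toy usage: individuals are numbers and fitness is the number itself.
population = [1, 5, 3, 9, 2]
winner = tournament_select(population, fitness=lambda x: x, k=2)
```

With k=1 this degenerates to the purely random choice you have now; larger k increases the selection pressure.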
MinMax
👌
MinMax with alpha-beta pruning does not beat the optimal strategy the way the normal MinMax does (which already does some pruning with if(val==-1 and player==0):)
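A correct alpha-beta search must return the same value as plain MinMax, so comparing the two on small boards is a quick way to find the bug. Below is a minimal negamax-style sketch, not your implementation. It assumes that rows is a tuple of heap sizes and that the player who takes the last object wins; all function names are hypothetical.

```python
from functools import lru_cache

def moves(rows):
    """Yield every position reachable by removing 1+ objects from one row."""
    for i, n in enumerate(rows):
        for take in range(1, n + 1):
            yield rows[:i] + (n - take,) + rows[i + 1:]

@lru_cache(maxsize=None)
def minimax(rows):
    """Value for the player to move: +1 win, -1 loss."""
    if sum(rows) == 0:
        return -1  # the previous player took the last object, so the mover lost
    return max(-minimax(nxt) for nxt in moves(rows))

def alphabeta(rows, alpha=-1, beta=1):
    """Same value as minimax, but stops once a branch cannot change the result."""
    if sum(rows) == 0:
        return -1
    best = -1
    for nxt in moves(rows):
        best = max(best, -alphabeta(nxt, -beta, -alpha))
        alpha = max(alpha, best)
        if alpha >= beta:
            break  # cutoff: a winning move was found, no need to look further
    return best

# Sanity check on small boards: both searches must agree.
for board in [(1, 2, 3), (1, 3, 5), (2, 2)]:
    assert minimax(board) == alphabeta(board)
```

If the two functions disagree on any board, the pruning condition or the alpha/beta bookkeeping is cutting a branch that still matters.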
RL
👌
Nice that you initialize self._q just in time
Nice that the agent is trained to play first and last
Nice that you give a negative reward to the opponent's mistakes
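For other readers, the just-in-time initialization can be sketched as follows. This assumes a tabular Q-learning agent keyed by (state, action) pairs; the class name, method names and hyperparameters are hypothetical, and only self._q comes from the reviewed code.

```python
import random
from collections import defaultdict

class QAgent:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # Entries are created on first access, so states need not be enumerated.
        self._q = defaultdict(float)

    def choose(self, state, actions, rng=random):
        """Epsilon-greedy choice among the legal actions."""
        if rng.random() < self.epsilon:
            return rng.choice(actions)
        return max(actions, key=lambda a: self._q[(state, a)])

    def update(self, state, action, reward, next_state, next_actions):
        """One Q-learning step; a terminal state has no next actions."""
        best_next = max((self._q[(next_state, a)] for a in next_actions), default=0.0)
        target = reward + self.gamma * best_next
        self._q[(state, action)] += self.alpha * (target - self._q[(state, action)])
```

A negative reward is then just a call to update with reward below zero for the move that led to the bad outcome.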
Hardcoded player
Evolved strategies
cooked["completion"] = sum(o for o in state.rows) / state.total_elements should always be 1 because of how total_elements is defined, so checking data["completion"]==1 is useless.