Open AdamLang96 opened 1 year ago
Did you find a solution to this? I have the same problem when running Test. On the other hand, while running Train, my agent does not care about the legal_actions whatsoever... it doesn't call it at all and just chooses a random action num
Yeah, this is my exact issue. Haven't found a solution yet.
I have a custom environment where the legal actions depend on the state of the board and the current player, and when I try to train my first agent the `legal_actions` mask isn't computed correctly for the agent, but it is for the opponent. I'm guessing the issue comes from the code below (found in SelfPlayWrapper). Since the `legal_actions` depend on `current_player_num`, and `agent_player_num != current_player_num` at the point where the mask is computed, it cannot calculate the correct mask for the agent. Please let me know if you have any ideas on how to fix this.
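To make the suspected failure mode concrete, here is a minimal sketch, assuming a toy environment rather than the real SelfPlayWrapper. `ToyEnv` and `legal_actions_for` are hypothetical names invented for illustration; only `legal_actions`, `current_player_num` and `agent_player_num` come from the description above. It shows how a mask that follows `current_player_num` ends up describing the opponent once the turn has passed, and how pinning the mask to an explicit player number might avoid that.

```python
class ToyEnv:
    """Hypothetical two-player environment, not the real SelfPlayWrapper.

    Player 0 may only play cells 0-1 and player 1 only cells 2-3,
    so the legal-action mask depends on which player is asked about.
    """

    n_players = 2
    n_cells = 4

    def __init__(self):
        self.board = [None] * self.n_cells  # None means the cell is empty
        self.current_player_num = 0

    def legal_actions_for(self, player_num):
        """Mask for an explicitly named player (hypothetical helper)."""
        own_cells = range(2 * player_num, 2 * player_num + 2)
        return [
            1 if i in own_cells and self.board[i] is None else 0
            for i in range(self.n_cells)
        ]

    @property
    def legal_actions(self):
        """Mask for whichever player is to move right now."""
        return self.legal_actions_for(self.current_player_num)

    def step(self, action):
        """Place the current player's mark and pass the turn."""
        self.board[action] = self.current_player_num
        self.current_player_num = (self.current_player_num + 1) % self.n_players


env = ToyEnv()
agent_player_num = 0
env.step(0)  # the agent moves, so the turn passes to the opponent

stale_mask = env.legal_actions                        # follows current_player_num
agent_mask = env.legal_actions_for(agent_player_num)  # pinned to the agent
print(stale_mask, agent_mask)  # [0, 0, 1, 1] [0, 1, 0, 0]
```

If the real wrapper reads `legal_actions` while `current_player_num` still points at the opponent, the agent would receive something like `stale_mask` here. Whether that is what actually happens in SelfPlayWrapper would need to be checked against its code.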