RasmusBrostroem / ConnectFourRL


New `select_action` failed when illegal moves not allowed #67

Open jbirkesteen opened 1 year ago

jbirkesteen commented 1 year ago

The new implementation failed when the DirectPolicyAgent chose an illegal column in an environment where illegal moves are not allowed. All entries in probs were 0 after the list comprehension. The method was commented out and replaced with the old approach (random choice) in bbe6c62.
We didn't really figure out what went wrong.
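
For reference, here is a minimal sketch of how the all-zero case could be guarded against. It is not a confirmed diagnosis: it assumes the zeros appear because the policy put all of its probability mass on illegal columns, so masking leaves nothing to sample from. The names `masked_probs` and `legal_cols` are hypothetical, and `random.choices` stands in for the categorical draw.

```python
import random


def masked_probs(probs, legal_cols):
    """Zero out illegal columns and renormalise the rest.

    Falls back to a uniform distribution over the legal columns when
    masking leaves no probability mass at all.
    """
    if not legal_cols:
        raise ValueError("no legal columns left to choose from")
    masked = [p if col in legal_cols else 0.0 for col, p in enumerate(probs)]
    total = sum(masked)
    if total <= 0.0:
        # Assumed failure mode: every legal column had probability 0.
        uniform = 1.0 / len(legal_cols)
        return [uniform if col in legal_cols else 0.0 for col in range(len(probs))]
    return [p / total for p in masked]


def select_action(probs, legal_cols, rng=random):
    """Sample a column index from the masked distribution."""
    weights = masked_probs(probs, legal_cols)
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]
```

With this guard, an input such as `probs = [1.0, 0, 0, 0, 0, 0, 0]` with legal columns `{1, 2}` yields a uniform draw over columns 1 and 2 instead of an error.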

An alternative to a purely random choice: we could keep drawing from probs with categorical (for instance, at most 5 times) until it draws a legal column by chance. Of course, this only makes sense before the algorithm converges to very high probabilities.
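
A rough sketch of that retry idea, under some assumptions: `draw_legal_column` and `legal_cols` are hypothetical names, `random.choices` stands in for the categorical draw, and the fallback after the last failed attempt is the old random choice among legal columns.

```python
import random


def draw_legal_column(probs, legal_cols, max_tries=5, rng=random):
    """Draw from the unmasked policy up to max_tries times.

    Returns the first legal column drawn. If every attempt lands on an
    illegal column, falls back to a uniformly random legal column.
    """
    columns = range(len(probs))
    for _ in range(max_tries):
        col = rng.choices(columns, weights=probs, k=1)[0]
        if col in legal_cols:
            return col
    # Fallback: the old approach (random choice among legal columns).
    return rng.choice(sorted(legal_cols))
```

Because the draws come from the unmasked distribution, a policy that has put nearly all of its mass on a full column will almost always exhaust the retries and end up in the random fallback.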

RasmusBrostroem commented 1 year ago

Also, for HumanPlayer, it is not possible to move the game while choosing a column, and it crashed my program when I tried to choose a column.

jbirkesteen commented 1 year ago

Could the issue be rooted in Env.step()? In this method, we check if a move is legal and then place it no matter what. Just reading it, it seems like it would give an error, but I haven't tested it yet.
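
To make the question concrete, here is a toy sketch of a step method that only places a piece after the legality check passes. `MiniEnv` is a hypothetical stand-in, not the project's Env: the attribute names, the return value, and the choice to raise `ValueError` are all assumptions for illustration.

```python
class MiniEnv:
    """Toy stand-in for the environment; tracks only column heights."""

    def __init__(self, rows=6, cols=7, allow_illegal_moves=False):
        self.rows = rows
        self.cols = cols
        self.allow_illegal_moves = allow_illegal_moves
        self.heights = [0] * cols  # number of pieces in each column

    def is_legal(self, col):
        """A move is legal if the column exists and is not full."""
        return 0 <= col < self.cols and self.heights[col] < self.rows

    def step(self, col):
        """Place a piece only when the move is legal.

        Returns True if a piece was placed, and False if an illegal move
        was tolerated without placing anything.
        """
        if not self.is_legal(col):
            if not self.allow_illegal_moves:
                raise ValueError(f"illegal move: column {col}")
            return False  # illegal move tolerated, board left unchanged
        self.heights[col] += 1
        return True
```

The point of the early return is that the placement never runs for an illegal column, whichever mode the environment is in.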