IntelLabs / coach

Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
https://intellabs.github.io/coach/
Apache License 2.0
2.32k stars 460 forks

Masking illegal actions #390

Open davidADSP opened 4 years ago

davidADSP commented 4 years ago

Is there any interface for masking illegal actions?

Ideally, I'd like the agent network to apply the softmax only over the set of legal moves (which can be calculated as a function of the current state) and set all other action probabilities to zero (e.g. you cannot play on top of an existing marker in tic-tac-toe).
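To make the request concrete, here is a minimal sketch of a masked softmax in plain Python. It is not part of Coach; the function names `masked_softmax` and `legal_moves_tic_tac_toe` and the board encoding (0 for an empty cell, 1 and -1 for the two players' markers) are assumptions made for illustration only.

```python
import math
from typing import List, Sequence


def legal_moves_tic_tac_toe(board: Sequence[int]) -> List[bool]:
    """Hypothetical helper: a cell is a legal move only if it is empty (0)."""
    return [cell == 0 for cell in board]


def masked_softmax(logits: Sequence[float], legal: Sequence[bool]) -> List[float]:
    """Softmax restricted to legal actions; illegal actions get probability 0."""
    if len(logits) != len(legal):
        raise ValueError("logits and legal must have the same length")
    legal_logits = [x for x, ok in zip(logits, legal) if ok]
    if not legal_logits:
        raise ValueError("at least one action must be legal")
    # Subtract the largest legal logit for numerical stability.
    shift = max(legal_logits)
    # exp() is only evaluated for legal entries, so a huge illegal logit is harmless.
    exps = [math.exp(x - shift) if ok else 0.0 for x, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: cells 0, 2, 4 and 8 are occupied, so they must receive probability 0.
board = [1, 0, -1, 0, 1, 0, 0, 0, -1]
logits = [0.0] * 9  # stand-in for the network's raw outputs
probs = masked_softmax(logits, legal_moves_tic_tac_toe(board))
```

With equal logits, the probability mass is split evenly across the five empty cells. In a real network the same effect is often achieved by adding a large negative value to the illegal logits before the softmax, but that is a design choice rather than something this thread prescribes.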

gal-leibovich commented 4 years ago

Hi @davidADSP,

Currently there is no such interface, although it would definitely be useful and could be implemented for some of the agents. We might add such an interface in the future.

davidADSP commented 4 years ago

OK, thanks for the info. For now, I can try adding an extra list of valid_actions to the state dictionary returned by the environment. This list will then be available in the agent's curr_state property, which is passed on to the choose_action method. The logic for masking actions can then be written in the choose_action method.
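A rough sketch of that workaround, written as a standalone function rather than as a real Coach agent: the `valid_actions` key and the `curr_state` name come from the description above, while `choose_masked_action`, its signature, the `action_values` argument and the epsilon-greedy exploration are assumptions for illustration and do not reflect Coach's actual `choose_action` signature.

```python
import random
from typing import Any, Dict, Optional, Sequence


def choose_masked_action(curr_state: Dict[str, Any],
                         action_values: Sequence[float],
                         epsilon: float = 0.0,
                         rng: Optional[random.Random] = None) -> int:
    """Pick an action index, considering only the indices in curr_state['valid_actions'].

    Hypothetical helper: action_values stands in for whatever per-action scores
    the agent's network produces (e.g. Q-values).
    """
    valid_actions = list(curr_state["valid_actions"])
    if not valid_actions:
        raise ValueError("the environment reported no valid actions")
    rng = rng or random.Random()
    # Explore: sample uniformly among legal actions only.
    if rng.random() < epsilon:
        return rng.choice(valid_actions)
    # Exploit: best-scoring legal action; illegal actions are never candidates.
    return max(valid_actions, key=lambda a: action_values[a])


# Usage: action 0 has the highest score but is illegal, so action 3 is chosen.
state = {"observation": [1, 0, -1, 0], "valid_actions": [1, 3]}
scores = [9.0, 0.2, 5.0, 0.7]
greedy_action = choose_masked_action(state, scores)
```

In an actual agent, this logic would sit inside the overridden choose_action method, reading the list from the state the environment returned.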

It would be great to have a proper interface for this, though. Most multi-player games have some form of move illegality baked into the environment, which the agent should use at action time to mask its output.