google-deepmind / open_spiel

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.

RNaD - Multiple policy heads implementation #1249

Closed: frvls closed this issue 3 weeks ago

frvls commented 1 month ago

I have a custom toy game with two distinct phases. For example, in the first phase only actions [0, 1, 2] are valid (with some additional masking within the phase), while in the second phase only actions [3, 4, 5] are valid. The agent improves well in the first phase of the game, but barely improves beyond random behavior in the second, which is the slightly more complicated part of the game.

So far, I've used the RNaD implementation as-is from the repo, but I was wondering whether I could get better results in the second phase by adding a second policy head (the default network only has one).

Currently, I'm training the agent by providing the phase of the game as one of the inputs to the model (0 for the first phase, 1 for the second) and masking all actions that don't belong to the current phase.
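
Concretely, my current setup looks roughly like this (a simplified sketch, not the actual rnad.py code; `phase_mask` and `masked_policy` are illustrative names):

```python
import jax
import jax.numpy as jnp

NUM_ACTIONS = 6  # actions [0, 1, 2] in phase 0, [3, 4, 5] in phase 1

def phase_mask(phase):
    # 1.0 where the action is legal in the given phase, 0.0 otherwise.
    first = jnp.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
    second = jnp.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
    return jnp.where(phase == 0, first, second)

def masked_policy(logits, legal_mask):
    # Push illegal logits to -inf so softmax assigns them ~zero probability.
    masked = jnp.where(legal_mask > 0, logits, -jnp.inf)
    return jax.nn.softmax(masked)

# The phase flag is appended to the observation before it enters the network.
obs = jnp.ones(10)            # placeholder observation
phase = jnp.array(1)          # 0 = first phase, 1 = second phase
net_input = jnp.concatenate([obs, phase.astype(jnp.float32)[None]])

logits = jnp.zeros(NUM_ACTIONS)  # stand-in for the network's policy output
probs = masked_policy(logits, phase_mask(phase))
```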

Will this approach lead to a well-performing model with only a single policy head shared across both phases of the game?

What kinds of adjustments would be needed to implement a second policy head? I'm not sure which parts of the codebase would have to change to make this work. I'd appreciate any help with this.
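
For concreteness, this is the kind of structure I have in mind (a sketch in plain JAX, not tied to the actual rnad.py network; all names here are illustrative):

```python
import jax
import jax.numpy as jnp

def init_params(rng, obs_dim, hidden=64, actions_per_phase=3):
    k1, k2, k3 = jax.random.split(rng, 3)
    scale = 0.1
    return {
        "torso": scale * jax.random.normal(k1, (obs_dim, hidden)),
        "head_phase0": scale * jax.random.normal(k2, (hidden, actions_per_phase)),
        "head_phase1": scale * jax.random.normal(k3, (hidden, actions_per_phase)),
    }

def policy_logits(params, obs, phase):
    # Shared torso, then one linear head per game phase.
    h = jax.nn.relu(obs @ params["torso"])
    logits0 = h @ params["head_phase0"]  # logits for actions [0, 1, 2]
    logits1 = h @ params["head_phase1"]  # logits for actions [3, 4, 5]
    # Place each head's logits in its own slice of the full action space,
    # so downstream code still sees one vector of NUM_ACTIONS logits.
    neg_inf = jnp.full_like(logits0, -jnp.inf)
    full0 = jnp.concatenate([logits0, neg_inf])
    full1 = jnp.concatenate([neg_inf, logits1])
    return jnp.where(phase == 0, full0, full1)

params = init_params(jax.random.PRNGKey(0), obs_dim=10)
probs = jax.nn.softmax(policy_logits(params, jnp.ones(10), jnp.array(1)))
```

I assume the training loss and any value-head handling would also need matching phase-aware changes, which is the part I'm least clear on.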

#1116 is similar to this, but I don't think it provides clear guidance on implementation.

lanctot commented 1 month ago

Hi @frvls,

I agree that this separation would likely help.

I'm sorry, but we cannot help with it, as we currently don't even have the resources to maintain the current version of R-NaD itself. It would be great to have some help from the community on this.