google-deepmind / open_spiel

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.

RNaD - Multiple policy heads implementation #1249

Closed: frvls closed this issue 3 weeks ago

frvls commented 1 month ago

I have a custom toy game with two distinct phases. For example, in the first phase only actions [0, 1, 2] are valid (with some additional masking within the phase), while in the second phase only actions [3, 4, 5] are valid. The agent improves well in the first phase of the game, but barely improves beyond random behavior in the second, which is the slightly more complicated part of the game.

So far, I've used the RNaD implementation as-is from the repo, but I was wondering whether I could get better results in the second phase by adding a second policy head (the default network only has one).

Currently, I'm training the agent by providing the phase of the game as one of the inputs to the model (0 for the first phase, 1 for the second) and masking all actions that don't belong to the current phase.
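
Concretely, my current setup looks roughly like this (a simplified sketch, not the actual rnad.py code; `phase_mask` and `masked_policy` are illustrative names):

```python
import jax
import jax.numpy as jnp

NUM_ACTIONS = 6  # actions [0, 1, 2] in phase 0, [3, 4, 5] in phase 1

def phase_mask(phase):
    # 1.0 where the action is legal in the given phase, 0.0 otherwise.
    first = jnp.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
    second = jnp.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
    return jnp.where(phase == 0, first, second)

def masked_policy(logits, legal_mask):
    # Push illegal logits to -inf so softmax assigns them ~zero probability.
    masked = jnp.where(legal_mask > 0, logits, -jnp.inf)
    return jax.nn.softmax(masked)

# The phase flag is appended to the observation before it enters the network.
obs = jnp.ones(10)            # placeholder observation
phase = jnp.array(1)          # 0 = first phase, 1 = second phase
net_input = jnp.concatenate([obs, phase.astype(jnp.float32)[None]])

logits = jnp.zeros(NUM_ACTIONS)  # stand-in for the network's policy output
probs = masked_policy(logits, phase_mask(phase))
```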

Will this approach lead to a well-performing model with only a single policy head shared across both phases of the game?

What kinds of adjustments would be needed to implement a second policy head? I'm not sure which parts of the codebase would have to change to make this work. I'd appreciate any help with this.
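
For concreteness, this is the kind of structure I have in mind (a sketch in plain JAX, not tied to the actual rnad.py network; all names here are illustrative):

```python
import jax
import jax.numpy as jnp

def init_params(rng, obs_dim, hidden=64, actions_per_phase=3):
    k1, k2, k3 = jax.random.split(rng, 3)
    scale = 0.1
    return {
        "torso": scale * jax.random.normal(k1, (obs_dim, hidden)),
        "head_phase0": scale * jax.random.normal(k2, (hidden, actions_per_phase)),
        "head_phase1": scale * jax.random.normal(k3, (hidden, actions_per_phase)),
    }

def policy_logits(params, obs, phase):
    # Shared torso, then one linear head per game phase.
    h = jax.nn.relu(obs @ params["torso"])
    logits0 = h @ params["head_phase0"]  # logits for actions [0, 1, 2]
    logits1 = h @ params["head_phase1"]  # logits for actions [3, 4, 5]
    # Place each head's logits in its own slice of the full action space,
    # so downstream code still sees one vector of NUM_ACTIONS logits.
    neg_inf = jnp.full_like(logits0, -jnp.inf)
    full0 = jnp.concatenate([logits0, neg_inf])
    full1 = jnp.concatenate([neg_inf, logits1])
    return jnp.where(phase == 0, full0, full1)

params = init_params(jax.random.PRNGKey(0), obs_dim=10)
probs = jax.nn.softmax(policy_logits(params, jnp.ones(10), jnp.array(1)))
```

I assume the training loss and any value-head handling would also need matching phase-aware changes, which is the part I'm least clear on.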

#1116 is similar to this, but I don't think it provides clear guidance on implementation.

lanctot commented 1 month ago

Hi @frvls,

I agree that this separation would likely help.

I'm sorry, but we cannot help with it, as we currently don't even have the resources to maintain the current version of R-NaD itself. It would be great to have some help from the community on this.