In each iteration, 80% of moves are played by the latest version of the agent and 20% by a mixture of old agents. This is based on OA5's league, and it's intended to suppress cyclic behaviours. Right now, though, playing the challengers is really slow: each challenger requires its own invocation of the net, and because each one only plays on a handful of envs, the dispatch overhead exceeds the GPU runtime.
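To make the "handful of envs" problem concrete, here is a minimal sketch of one way the 80/20 split could be laid out across a batch of envs. It assumes the split is done per env rather than per move, and the names (`assign_agents`, `old_fraction`), the env count, and the number of old snapshots are all hypothetical, not taken from the actual league code.

```python
import random
from collections import Counter

def assign_agents(n_envs, n_old_agents, old_fraction=0.2, seed=0):
    """Map each env to an agent id.

    Agent 0 stands for the latest agent; ids 1..n_old_agents stand for
    old snapshots (hypothetical numbering, for illustration only).
    """
    rng = random.Random(seed)
    n_old_envs = round(n_envs * old_fraction)
    # The latest agent takes the first block of envs...
    assignment = [0] * (n_envs - n_old_envs)
    # ...and the remaining envs are spread uniformly at random over old agents.
    assignment += [rng.randint(1, n_old_agents) for _ in range(n_old_envs)]
    return assignment

assignment = assign_agents(n_envs=1024, n_old_agents=32)
counts = Counter(assignment)
n_latest = counts[0]
n_old = sum(c for agent, c in counts.items() if agent != 0)
```

With these made-up numbers the latest agent gets one large batch, while each old agent gets only a few envs but still costs a full forward pass.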
Possible fixes:
Use cuBLAS's gemmBatched routine to do batched matmuls. It takes an array of pointers to the weight matrices, so I should be able to avoid making as many copies of the weights as there are environments.
One limitation of this is that it'll never work for convnets, since there's no equivalent for convolutions. Well, now that I think about it, maybe the 'images' I work with are small enough that I could get away with phrasing convs as a big matmul?
Another is that you'll need to write some amp-style voodoo to replace the linear layers in your net with these new gemmbatched layers.
Explore mixing games: rather than running every agent every step, I could run a single, randomly-chosen agent each step? In one sense this'd be really good exploration since it's a mix of strategies; in another sense it'd be really poor because no one strategy is ever played out. But it would be a simple fix, which makes it tempting.
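For the batched-matmul fix above, the semantics I'm after can be sketched in NumPy. This is only an illustration of the idea, not the cuBLAS call itself: the shapes, the `agent_of_env` mapping and both function names are assumptions. The first version gathers one weight copy per env, which is the thing to avoid. The second keeps a list of views into the shared weight array, which plays the role of the pointer array.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_envs, d_in, d_out = 3, 8, 4, 5

# One weight matrix per agent: (n_agents, d_out, d_in).
weights = rng.standard_normal((n_agents, d_out, d_in))
# One input vector per env, plus a hypothetical env -> agent mapping.
inputs = rng.standard_normal((n_envs, d_in))
agent_of_env = np.array([0, 0, 0, 0, 0, 0, 1, 2])

def gathered_linear(weights, inputs, agent_of_env):
    # Fancy indexing materialises a fresh (n_envs, d_out, d_in) array,
    # i.e. one copy of the weights per env.
    per_env = weights[agent_of_env]
    return np.einsum('eoi,ei->eo', per_env, inputs)

def pointer_linear(weights, inputs, agent_of_env):
    # Integer indexing returns views, so this list refers to the shared
    # weights without copying them (the analogue of a pointer array).
    ptrs = [weights[int(a)] for a in agent_of_env]
    # The Python loop stands in for what a batched kernel would do in one launch.
    return np.stack([w @ x for w, x in zip(ptrs, inputs)]), ptrs

out_gathered = gathered_linear(weights, inputs, agent_of_env)
out_pointer, ptrs = pointer_linear(weights, inputs, agent_of_env)
```

Both versions compute the same per-env outputs. The difference is only in how much weight memory gets touched, which is what the pointer-array interface is meant to save.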
Either way, I realise I need a test suite for the league before I go any further with any of these. What does a probe env for leagues look like?
(This hinges on whether a league is even necessary)