Effective Number of Pokemon as a Convergence Metric

We ideally need a way to start understanding how well our genetic algorithm has converged. While this is straightforward for 1v1 battles, this is much less straightforward for team battles (e.g. 2v2, 6v6).

Part of the problem is that the fitness metrics are not comparable round to round, and do not have an unlimited size that enable us to claim convergence. Bradley-Terry scores by construction are relative to the teams in the model. Even if we do not standardise win ratios, a team that wins 100% of its matches in iteration 2 is not necessarily fitter than a team that wins 80% of its matches in iteration 100 (and indeed is arguably less likely to be).

Instead of focusing on single-team outcomes, it might be useful to focus on a notion of 'genetic diversity'. Since we know which pokemon are S-tier, we should expect the algorithm to converge towards teams that are mostly or entirely made up of S-tier pokemon.

A way of measureing this is the Effective Number of Pokemon.

In political science, the Effective Number of Parties (https://en.wikipedia.org/wiki/Effective_number_of_parties) captures the idea that parties differ in their relative importance in a political system (e.g. the UK parliament had 10 parties following the 2019 GE not counting the speaker, but is widely considered a 2 or 2.5 party system. Its ENP is usually around 2.3-2.5 iirc).

We could use the same metric to understand the effective number of pokemon in each iteration of the genetic algorithm, by counting the % of times a particular pokemon appears, then using these to caclulate the metric.

In general, my best guess is we'd expect the number to slowly converge towards a number slightly larger than the number of S-tier pokemon.

To measure this, we need to return the full list of teams (or more accurately a count of each pokemon in each generation) at each iteration/generation of the algorithm.

alan-turing-institute / p2lab-pokemon

Effective Number of Pokemon as a Convergence Metric #63