declanoller closed this issue 5 years ago.
A single solve time or average score isn't very informative on its own, because results vary from run to run due to randomness. For benchmarking, it would be better to run an ensemble of agents and look at the resulting distribution. Even 10 runs would give us a sense of the spread.
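Here is a minimal sketch of what such an ensemble benchmark could look like, assuming Python. `run_agent` and `run_ensemble` are hypothetical names. `run_agent` only simulates a noisy solve time with a seeded RNG, so it would need to be replaced with the project's actual training call. Each run gets its own seed, so the ensemble is reproducible while the runs stay independent.

```python
import random
import statistics


def run_agent(seed: int) -> float:
    """Hypothetical stand-in for one training run.

    Returns the number of episodes needed to solve the environment. Here it is
    simulated with seeded Gaussian noise; in practice this would train a fresh agent.
    """
    rng = random.Random(seed)
    return 200 + rng.gauss(0, 40)


def run_ensemble(n_runs: int = 10) -> dict:
    """Run n_runs independent agents (one seed each) and summarize the spread."""
    solve_times = [run_agent(seed) for seed in range(n_runs)]
    return {
        "n": n_runs,
        "mean": statistics.mean(solve_times),
        "stdev": statistics.stdev(solve_times),  # sample std dev, needs n_runs >= 2
        "min": min(solve_times),
        "max": max(solve_times),
        "runs": solve_times,
    }


if __name__ == "__main__":
    summary = run_ensemble(10)
    print(f"solve time over {summary['n']} runs: "
          f"{summary['mean']:.1f} +/- {summary['stdev']:.1f} "
          f"(min {summary['min']:.1f}, max {summary['max']:.1f})")
```

Reporting the mean together with the standard deviation and min/max makes it easier to tell whether a change actually helps or just landed on a lucky seed.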
Solved with PR #5