Open cgolubi1 opened 6 months ago
Another thought to integrate into the above list: it would be great to have less manual work (ideally none) needed to turn `random_ai`-generated tests into responder tests.
A silly bug: when a game runs to 200 rounds (e.g. Echo vs IIconfused) and is cancelled, RandomAI falls over.
I think what's happening is simply that the Python process gets OOM-killed while trying to pull the entire game action log into memory and write the final game state.
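If that's the cause, one mitigation is for the replay client to pull the action log in bounded chunks and stream it to disk rather than holding the whole thing in memory. A minimal sketch, assuming a hypothetical `fetch_log_entries(game_id, offset=..., limit=...)` call (the real responder interface and parameter names may differ):

```python
def iter_action_log(fetch_log_entries, game_id, chunk_size=500):
    """Yield action-log entries in bounded chunks instead of loading the
    whole log at once (fetch_log_entries is a hypothetical API call; the
    real interface and parameter names may differ)."""
    offset = 0
    while True:
        chunk = fetch_log_entries(game_id, offset=offset, limit=chunk_size)
        if not chunk:
            return
        yield from chunk
        offset += len(chunk)


def write_final_state(out_path, entries):
    """Stream entries to disk one at a time so a 200-round game doesn't
    have to fit in memory before anything is written."""
    with open(out_path, "w") as handle:
        for entry in entries:
            handle.write(f"{entry}\n")
```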
The goal of replay testing is to throw compute time rather than human time at finding logic bugs / breaking changes in new code, even if the tester doesn't know what kind of bug they're looking for. That would be easier and more reliable if I sanded down some current rough edges in the replay test rig. Specific things I have in mind:
* Some of the values `replay_loop` always uses on the replay site are baked into the script. Those should be CLI flags.
* `replay_loop` should have help/usage text describing all the CLI flags. (See the first sketch after this list.)
* There should be a helper that can generate `/buttonmen/test/src/api/responder99Test.php` from whatever's in the output directory right now, and one that can execute phpunit based on that. No reason to copy-paste those lists of commands while iterating on a replay test. (See the second sketch after this list.)
* `replay_loop` should exercise optional behaviors even when they aren't requested --- that would catch problems like introduction of unmodelled randomization that are rare, but important. Currently, those behaviors are tested only if a particular CLI flag is selected --- instead they should always be tested a small percentage of the time. (See the third sketch after this list.)
* `CustomBM` should be used a fraction of the time.
* What `replay_loop` tests by default should be documented for quick reference, so we can easily know what's been tested for a particular branch.
* One piece of `replay_loop`/`random_ai` behavior that should work better: turning a game played by `random_ai` into a responder test that can be committed to the codebase.
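For the first two items, here's a rough sketch of what the `replay_loop` option handling could look like with `argparse`; the flag names and defaults are illustrative, not the script's real interface:

```python
import argparse


def parse_args(argv=None):
    """Illustrative CLI for replay_loop: values that are currently fixed in
    the script become flags, and --help documents all of them."""
    parser = argparse.ArgumentParser(
        prog="replay_loop",
        description="Replay recorded games against a test site and record responder tests.",
    )
    # Hypothetical flags -- the real option names may differ.
    parser.add_argument("--site-url", default="http://localhost/api/",
                        help="responder endpoint of the replay site")
    parser.add_argument("--output-dir", default="output",
                        help="directory where generated test files are written")
    parser.add_argument("--games", type=int, default=10,
                        help="number of games to replay per run")
    return parser.parse_args(argv)


if __name__ == "__main__":
    print(parse_args())
```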
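For the responder99Test.php item, the helper could be as small as the sketch below; it assumes the replay run already leaves a single generated PHP test in the output directory, and the paths and phpunit invocation are assumptions about the local checkout:

```python
import shutil
import subprocess
from pathlib import Path

# Assumed locations -- adjust for the real checkout layout.
OUTPUT_DIR = Path("output")
TEST_PATH = Path("/buttonmen/test/src/api/responder99Test.php")


def install_and_run(generated_name="responder99Test.php"):
    """Copy the freshly generated replay test into the test tree and run
    phpunit on just that file, instead of copy-pasting the commands."""
    shutil.copy(OUTPUT_DIR / generated_name, TEST_PATH)
    subprocess.run(["phpunit", str(TEST_PATH)], check=True)


if __name__ == "__main__":
    install_and_run()
```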
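For the "small percentage of the time" items, one way to keep the existing CLI flags while still exercising the optional behaviors on unflagged runs; the probability and the `CustomBM` example are illustrative:

```python
import random


def should_enable(flag_value, default_probability=0.05, rng=random):
    """Enable an optional behavior when its CLI flag is set; otherwise roll
    a small-probability die so the behavior (e.g. picking CustomBM) still
    gets exercised on a fraction of ordinary runs."""
    if flag_value:
        return True
    return rng.random() < default_probability


# Example: decide once per game whether to exercise the optional behavior.
use_custom_recipe = should_enable(flag_value=False)
```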