When running a full multi-machine test, each machine runner currently runs its own full planning cycle, including checking every single item in the input corpus to extract statistics. Work like this only needs to be done once at the start of the test; the machines could then look the results up during their plan cycles instead of recalculating them.
The statistics extraction part of the setup is surprisingly slow, especially on large corpora (a 1386-entry corpus takes several minutes)!
This will need a bit of rearchitecting: the plan stage repeats this work because it was always designed to be self-contained and stateless, as one stage of a pipeline. I'm imagining that we'd break 'look up corpus with statistics' out into an interface, and fulfil it with a static table in the tester and a full scan in the single-machine version, roughly as sketched below.
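As a rough illustration of that split (a minimal sketch in Go, with hypothetical names like CorpusStats, ScanningStats, and PrecomputedStats, since the actual types and extraction routine aren't pinned down here):

```go
package corpus

// ItemStats stands in for whatever per-item statistics the planner extracts.
type ItemStats struct {
	SizeBytes int
	// ... other extracted fields
}

// CorpusStats is what the plan stage would consume instead of scanning the
// corpus itself.
type CorpusStats interface {
	StatsFor(itemID string) (ItemStats, bool)
}

// ScanningStats runs the full extraction on every lookup -- the existing
// self-contained behaviour, kept for the single-machine path.
type ScanningStats struct {
	Extract func(itemID string) (ItemStats, bool) // existing extraction routine
}

func (s ScanningStats) StatsFor(itemID string) (ItemStats, bool) {
	return s.Extract(itemID)
}

// PrecomputedStats is a static table built once at the start of a
// multi-machine test and shared read-only with every machine runner.
type PrecomputedStats struct {
	table map[string]ItemStats
}

// Precompute walks the corpus once and caches the extracted statistics.
func Precompute(itemIDs []string, extract func(string) (ItemStats, bool)) PrecomputedStats {
	table := make(map[string]ItemStats, len(itemIDs))
	for _, id := range itemIDs {
		if st, ok := extract(id); ok {
			table[id] = st
		}
	}
	return PrecomputedStats{table: table}
}

func (p PrecomputedStats) StatsFor(itemID string) (ItemStats, bool) {
	st, ok := p.table[itemID]
	return st, ok
}
```

The tester would call something like Precompute once before launching the machine runners and hand the same table to each of them, while the single-machine version keeps the scanning implementation and stays stateless.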
There may also be other parts of the planning setup that could be extracted into a once-per-machine step at the start of the run, but I haven't investigated.