JanMatas opened this issue 7 years ago:

Hi,
I was wondering if it is possible to publish the approximate sizes of the test inputs, or at least some statistics about the test set (median and quantiles), so we know how much work our algorithms will be dealing with.
Thanks!

Given that the goal is to look for good scaling across the spectrum of problem sizes, there is no fixed set of test inputs; I don't know in advance how good the best solutions are going to be.
Instead, the approach taken is to give each group a time budget per puzzle. Puzzle instances of increasing size are then executed until the total per-puzzle time budget is exceeded, so there is no upper bound on the scale that will be tested. The spacing of the puzzle scales is chosen heuristically for each puzzle to give a reasonable spread and resolution of points, while still making it possible for fast implementations to reach large scales (the heuristic sometimes gets adjusted as things proceed and solutions get faster).
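For concreteness, here is a minimal sketch of that budget-driven loop. The `./solve` binary, the geometric spacing heuristic, and the helper names are hypothetical stand-ins, not the actual harness:

```python
# Minimal sketch of the budget-driven test loop described above.
# run_solver() and scales_for() are hypothetical stand-ins.
import subprocess
import time

TIME_BUDGET_SECONDS = 30.0  # per-puzzle budget used in the auto-tests


def scales_for(puzzle):
    """Heuristic spacing of instance sizes: geometric growth gives a
    reasonable spread of points while letting fast solutions reach
    large scales quickly."""
    scale = 16
    while True:
        yield scale
        scale *= 2


def run_solver(puzzle, scale):
    # Stand-in for executing a group's solution on one puzzle instance.
    subprocess.run(["./solve", puzzle, str(scale)], check=True)


def evaluate(puzzle):
    results = []
    spent = 0.0
    for scale in scales_for(puzzle):
        start = time.perf_counter()
        run_solver(puzzle, scale)
        elapsed = time.perf_counter() - start
        results.append((scale, elapsed))
        spent += elapsed
        if spent > TIME_BUDGET_SECONDS:  # no fixed upper bound on scale:
            break                        # faster solutions simply get further
    return results
```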
The time budget in the auto-tests is also lower than in the final tests, purely for cost reasons. There are roughly 360 puzzles to evaluate across all groups, so allocating 30 seconds per puzzle takes about 10,800 CPU seconds, or about 3 hours, which is probably about right for intermediate runs. For the final run it will be higher, as turnaround time matters much less: maybe 5-10 minutes per puzzle, or 1.25-2.5 CPU days.
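For reference, the back-of-the-envelope arithmetic behind those totals (using the ~360 puzzle count from above):

```python
# Sanity check of the CPU-time figures quoted above.
PUZZLES = 360  # approximate puzzle count across all groups

auto_test = PUZZLES * 30       # 30 s each  -> 10,800 CPU s (~3 h)
final_lo = PUZZLES * 5 * 60    # 5 min each -> 108,000 CPU s (1.25 days)
final_hi = PUZZLES * 10 * 60   # 10 min each -> 216,000 CPU s (2.5 days)

for label, secs in [("auto", auto_test), ("final low", final_lo),
                    ("final high", final_hi)]:
    print(f"{label}: {secs} CPU s = {secs / 86400:.2f} CPU days")
```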
So trying to optimise for execution times longer than around 5 minutes per puzzle instance is probably not worth it, as evaluating every group at that scale in a controlled environment becomes too time-consuming. Different groups will probably still reach quite different scales within that sort of budget, in some cases differing by a couple of orders of magnitude.