Kattis / problemtools

Tools to manage problem packages using the Kattis problem package format.
MIT License
101 stars 70 forks source link

Strange TLE verdicts for PAC submissions - unreported time margin used #223

Closed Tagl closed 1 year ago

Tagl commented 1 year ago

I was getting some really confusing output from verifyproblem, such as:

Slowest AC runtime: 2.244, setting timelim to 4 secs, safety margin to 9 secs
INFO : Test file result: TLE [test case: test case secret/GB/GB-01, CPU: 2.47s @ test case secret/GB/GB-01]

Why is there a separate safety margin for partially accepted solutions? https://github.com/Kattis/problemtools/blob/08495bbf5e351a7111377079145bf1fb916779fa/problemtools/verifyproblem.py#L1306-L1311

The calculation for it also seems risky in general as the time limit can easily exceed the lower safety margin. In this case, the lower margin was set to 1, resulting in TLE for a submissions which ran almost 2 seconds under the time limit of 4. https://github.com/Kattis/problemtools/blob/08495bbf5e351a7111377079145bf1fb916779fa/problemtools/verifyproblem.py#L1401 I think the quickest fix would be

                    timelim_margin_lo = max(timelim, min(int(0.5 + exact_timelim / safety_margin), timelim - 1))

but I also don't really understand the whole point of the lower margin.

Finally, this value is never reported in the output, even with log level set to debug which was the root cause of my confusion. https://github.com/Kattis/problemtools/blob/08495bbf5e351a7111377079145bf1fb916779fa/problemtools/verifyproblem.py#L1411

SuprDewd commented 1 year ago

CC @simonlindholm

simonlindholm commented 1 year ago

For Accepted solutions, verifyproblem ensures through limits.time_multiplier that there is some margin between the worst judge solution and the time limit, so that if a contestant codes a solution similar to the judge solution but with slightly worse constant it will still pass.

For TLE solutions, it similarly ensures there is margin in the other direction through limits.safety_margin: if a TLE solution would pass if optimized by the safety_margin factor, that's worth a warning.

For partially accepted solutions, it tries to do both of these things, by checking that the score/verdict you get from a decreased time limit is the same as that of an increased time limit. Otherwise, it should emit a warning about sensitivity to time limits -- in your case, indicating that a PAC solution running is 2.47s is unsafe to use because if made slower by a factor 2 it would get TLE.

If TLE is given as the actual verdict of the submission that's a bug, this machinery is only meant to serve as a warning.

Tagl commented 1 year ago

@pehrsoderman You may close this since the PR is merged