Hardcoded floating-point answers should be verified more exactly

mpsijm commented 2 weeks ago

For the problem Fractal Area of the BAPC preliminaries 2024, the version used on the 21st of September showed sample answers that were slightly inexact: they differed from the correct values in the last two digits. The cause was that the input precision had been reduced from 9 to 6 digits, while the sample answer hardcoded in generators.yaml was not updated. However, all jury submissions still passed, because the error was smaller than $10^{-7}$.

To prevent this, BAPCtools could perform a more exact match for the hardcoded answers. Specifically, the hardcoded answer should only be allowed to differ in the last digit, and only by one. For Fractal Area, the samples show 9 digits after the 0., so basically, the maximum allowed absolute error for the canonical jury solution on these hardcoded samples should be $10^{-9}$ rather than $10^{-6}$.

To avoid re-inventing the wheel (the wheel here being default_output_validator.cpp), perhaps we could modify the arguments passed to this validator when running the canonical jury solution on a hardcoded .ans file (or on any .ans file, to simplify things). The question then becomes how to detect the value of the last significant digit in this answer (in the example above, this "value" is trivially $10^{-9}$ by counting digits, but we should also handle scientific notation correctly).
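One possible way to detect that value, sketched in Python using only the standard `decimal` module. The helper name `last_digit_value` is hypothetical and not part of BAPCtools; the idea is that `Decimal` keeps the digits exactly as written, so the exponent of the parsed token is the position of its last digit, including for scientific notation.

```python
from decimal import Decimal


def last_digit_value(token: str) -> Decimal:
    """Return the place value of the last digit written in a decimal token.

    Hypothetical helper, not an existing BAPCtools function.
    """
    d = Decimal(token)
    if not d.is_finite():
        raise ValueError(f"not a finite decimal number: {token!r}")
    # Decimal stores the number as digits * 10**exponent without normalising,
    # so trailing zeros and scientific notation are both preserved.
    return Decimal(1).scaleb(d.as_tuple().exponent)


print(last_digit_value("0.666666666"))  # place value of the 9th decimal
print(last_digit_value("1.5e-3"))       # 0.0015, so the last digit is at 10^-4
```

The result could then serve as the absolute tolerance when comparing the canonical solution's output against the hardcoded answer.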

mzuenni commented 2 weeks ago

I don't understand the first paragraph. Why did decreasing the precision make the hardcoded value wrong, or require it to be updated? Or was it just wrong from the beginning, and did this go unnoticed because of the change in precision?

And my second question is: why was it not generated in the first place? Or, to be more precise: for problems with fixed output (i.e. those using default_output_validator.cpp), is there ever a need to use hardcoded .ans files? (If not, we could just encourage people not to use hardcoded .ans files.)

RagnarGrootKoerkamp commented 2 weeks ago

So, before, the input was, say, 0.333_333_333 (underscores for clarity), and the answer was hardcoded to 0.666_666_666 (because we like showing exactly that number, rather than with more or fewer decimals). Then, the input was changed to 0.333_333, but the answer was left untouched. Submissions still pass because the difference between 0.666_666 and 0.666_666_666 is small enough, but it's weird that the given answer is not 'fully' correct in all its digits.
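To make the arithmetic concrete, here is a small Python sketch comparing the two illustrative numbers above. It assumes an acceptance rule of "absolute or relative error within the tolerance", which is a simplification and not a copy of default_output_validator.cpp; the function name `within_tolerance` is hypothetical.

```python
from decimal import Decimal


def within_tolerance(expected: str, actual: str, tol: Decimal) -> bool:
    """Accept if the absolute OR relative error is at most tol (assumed rule)."""
    e, a = Decimal(expected), Decimal(actual)
    diff = abs(e - a)
    return diff <= tol or diff <= tol * abs(e)


# The difference is 0.000000666, i.e. 6.66 * 10^-7.
print(within_tolerance("0.666666666", "0.666666", Decimal("1e-6")))
print(within_tolerance("0.666666666", "0.666666", Decimal("1e-9")))
```

With a tolerance of $10^{-6}$ the stale answer is accepted; with $10^{-9}$ it is not.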

So indeed the fix we did is to write a generator that specifically prints the number of digits we want to show in the samples. We could decide that from now on that's what you're supposed to do, and we warn when ans: is given for default-output-validator problems.

Or, we could first 'ignore' given ans: files, run the solution to generate a .ans, and then check that the hardcoded ans: is sufficiently precise: it should have at least the required precision, and all its digits should be correct (with rounding or truncating allowed for the last one).
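A minimal sketch of such a check in Python, under stated assumptions: the generated answer is available as a string with more digits than the hardcoded one, the required precision is an absolute $10^{-6}$, and "rounding" means round-half-up. The function `hardcoded_matches` and its parameters are hypothetical, not existing BAPCtools code.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def hardcoded_matches(hardcoded: str, generated: str,
                      required: Decimal = Decimal("1e-6")) -> bool:
    """Check a hardcoded answer against a freshly generated one.

    Passes only if the hardcoded value shows at least the required precision
    and equals the generated value rounded or truncated to the same digits.
    """
    h, g = Decimal(hardcoded), Decimal(generated)
    # Place value of the last digit written in the hardcoded answer.
    unit = Decimal(1).scaleb(h.as_tuple().exponent)
    if unit > required:
        return False  # fewer digits than the required precision
    rounded = g.quantize(unit, rounding=ROUND_HALF_UP)
    truncated = g.quantize(unit, rounding=ROUND_DOWN)
    return h in (rounded, truncated)


print(hardcoded_matches("0.666666666", "0.66666666666667"))  # truncated form
print(hardcoded_matches("0.666666667", "0.66666666666667"))  # rounded form
print(hardcoded_matches("0.666666000", "0.66666666666667"))  # stale digits
```

Accepting both the rounded and the truncated form keeps the check independent of how the author chose to shorten the number.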

mzuenni commented 2 weeks ago

Aahhh the input precision was changed not the output precision... now I understand the problem.

RagnarGrootKoerkamp / BAPCtools

Hardcoded floating-point answers should be verified more exactly #403