RagnarGrootKoerkamp / BAPCtools

Tools for developing ICPC-style programming contest problems.
GNU General Public License v3.0

Answer validation: allow `.in`-less invalid testcases #335

Closed thorehusfeldt closed 5 months ago

thorehusfeldt commented 5 months ago

I want to be able to have .in-less invalid .ans testcases, like this (for a problem whose output is in [0..100]):

invalid_answers:
  data:
    range_hi:
      ans: 101 
    range_lo:
      ans: -1

Instead of

invalid_answers:
  data:
    range_hi:
      in: 1 1
      ans: 101
    range_lo:
      in: 1 1
      ans: -1   

The .ctd and .viva Answer Validators already do exactly this, but our definition of AnswerValidator insists on the following invocation:

answer_validator testcase.in [flags] < testcase.ans

even if the answer_validator doesn’t even open the testcase.in.

I think .in-free answer invalidation makes sense, is useful, and strictly increases problem quality because it allows me to state that “101 is wrong no matter what the input file is”. (I am not so concerned about saving a line of typing a redundant in: 1 1 in the generator. I’m concerned about the stronger semantics.)

To do this, we must give semantics to “what it means to run an AnswerValidator on a pseudo-testcase without .in”.

Solution 1

Add --input_oblivious to the specification of AnswerValidator. Validators based on .ctd and .viva are always input-oblivious anyway; handwritten AnswerValidators can receive this flag (which means they promise to not open(args[1])). When bt validate iterates over its validators, it can look for this flag in the source code, much like --constraints.
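
A minimal sketch of the source-scanning idea in Solution 1, assuming the flag is detected by a plain substring search. The names `FLAG`, `is_input_oblivious` and `input_oblivious_validators` are hypothetical and not part of BAPCtools; the real detection logic may well differ.

```python
from pathlib import Path

# Hypothetical flag name, taken from the proposal above.
FLAG = "input_oblivious"


def is_input_oblivious(source: str) -> bool:
    """Return True if the validator source mentions the flag anywhere."""
    return FLAG in source


def input_oblivious_validators(paths):
    """Keep only the validator source files whose text mentions the flag."""
    return [
        p for p in paths
        if is_input_oblivious(Path(p).read_text(errors="replace"))
    ]
```

A substring search is crude (a comment mentioning the flag would also match), so this only illustrates the "promise by declaration" semantics, not a robust check.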

Solution 2

Change the invocation of AnswerValidators, in a non-backwards-compatible way, to always be

answer_validator [--input testcase.in] [flags] < testcase.ans

Then the semantics of validating a pseudotestcase is clear: If there is both .in and .ans, both are sent to the validator, else only one is sent.
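
To illustrate Solution 2, here is a sketch of how a handwritten validator could parse the proposed invocation with the standard argparse module. The helper `parse_args` is hypothetical, and `--input` is only the name proposed above, not an existing option.

```python
import argparse


def parse_args(argv):
    """Split argv into the optional input path and the remaining flags."""
    parser = argparse.ArgumentParser(prog="answer_validator")
    # Proposed optional argument; None means "no .in file was provided".
    parser.add_argument("--input", default=None, help="path to testcase.in")
    # Everything argparse does not recognise is returned untouched as flags.
    args, flags = parser.parse_known_args(argv)
    return args.input, flags
```

With this shape, a validator can treat `input_path is None` as the signal that it is validating an .in-less pseudotestcase and skip every check that needs the input.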

There are probably other ways of doing this. One of the difficulties is that the Testcase class is very much tied to in_file.

RagnarGrootKoerkamp commented 5 months ago

Haven't thought this through, but what are the drawbacks of simply calling the answer validator with a path to an empty .in file when it's not provided? (This assumes the user only does this when indeed the answer validator does not read the .in file.)

Another question: suppose there are 2 answer validators, of which only one reads the .in. (How) can we distinguish these?

thorehusfeldt commented 5 months ago

There could be two answer validators: one reads the .in input file (maybe it’s the custom output validator), the other doesn’t (it just checks output format, much like what a .ctd-based validator would do). That’s exactly the issue.

bt validate needs to understand which validators to call for the pseudotestcase bad_format.ans. In particular, if it calls the custom output validator, that will now crash (because it tries to open a non-existing file). Of course, we now do distinguish exit code 1 from 43, so this would actually work, but I dislike putting semantics on crashing programmes for a normal use-case. So validate.py needs to be able to see from the outside which answer validators it can safely call.
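
The exit-code distinction mentioned above can be sketched as follows. Exit code 43 (reject) comes from this thread; treating 42 as "accept" is an assumption based on the usual validator convention, and `classify` is a hypothetical helper, not BAPCtools code.

```python
ACCEPT = 42  # assumed convention for "accepted"
REJECT = 43  # "rejected", as mentioned in the thread


def classify(returncode: int) -> str:
    """Map a validator exit code to a verdict."""
    if returncode == ACCEPT:
        return "accepted"
    if returncode == REJECT:
        return "rejected"
    # Anything else (for example 1 from an uncaught exception) is a crash.
    return "crashed"
```

Under this mapping, a validator that dies while opening a missing .in file is reported as "crashed" rather than "rejected", which is exactly the semantics the comment above is reluctant to rely on.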

thorehusfeldt commented 5 months ago

After even more thinking: Maybe @RagnarGrootKoerkamp is (almost) right, and we can just pass a non-existing file as the first argument. (Because we need to distinguish the behaviour of a rejecting validator that rejects because the input is empty from a validator that rejects because the answer file is wrong no matter what the input is. Both return 43.)

The semantics is as follows: pseudotestcases like

not_an_int:
  ans: zero
negative:
  ans: "-1"
out_of_bounds:
  ans: "101"

go through answer validation using /path/to/unopenable as the input. In particular, the invocations are

ans_validator /path/to/unopenable < testcase.ans

and (provided the above all passed) even

output_validator /path/to/unopenable testcase.ans < testcase.ans

But typically, an answer validator (such as a lowly .ctd-validator) has already rejected before the output validator got fired up.
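
A rough sketch of the first invocation above, assuming the validator is started as a subprocess with the .ans contents on standard input. `run_answer_validator` is a hypothetical helper, and the "unopenable" path is simply a file name inside a fresh temporary directory that is never created.

```python
import subprocess
import tempfile
from pathlib import Path


def run_answer_validator(cmd, ans_text, in_path=None):
    """Run cmd with the input path appended, feeding ans_text on stdin.

    If in_path is None, a path that does not exist is passed instead.
    Returns the validator's exit code.
    """
    with tempfile.TemporaryDirectory() as tmp:
        if in_path is None:
            in_path = Path(tmp) / "unopenable.in"  # deliberately never created
        result = subprocess.run(
            [*cmd, str(in_path)],
            input=ans_text,
            text=True,
            capture_output=True,
        )
    return result.returncode
```

A validator that checks standard input first exits with 43 on a malformed answer before it ever touches the missing file; one that opens the input first would instead crash with a different exit code.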

A problem author who wants to support the above invalid testcases should write an answer validator that checks standard input as much as possible before opening the input file. Like this (for a problem with input N and whose output consists of N many integers):

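import re
import sys

def fail(msg):  # exit code 43 means "rejected"
    print(msg, file=sys.stderr)
    sys.exit(43)

line = sys.stdin.readline()  # this is the .ans
if not re.fullmatch(r"\d+( \d+)*\n", line):  # syntax check
    fail("integers expected")
if sys.stdin.readline() != "":
    fail("extra output")
for token in line.split():
    if not 0 <= int(token) <= 100:  # range check
        fail("integer out of range")
# Only now should we open the input
with open(sys.argv[1]) as f:
    N = int(f.readline())
if len(line.split()) != N:
    fail(f"{N} tokens expected")
sys.exit(42)  # accepted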

An author who doesn’t want this constraint imposed on their answer validator wouldn’t be able to invalidate .in-free invalid testcases; basically the situation we are in today. (Except that .ctd does exactly this, but for a weird reason.)

I think this works.

thorehusfeldt commented 5 months ago

This seems like a misguided idea to me now. Closing this until I have better things to say.