RagnarGrootKoerkamp / BAPCtools

Tools for developing ICPC-style programming contest problems.
GNU General Public License v3.0

Separate universally and problem-specifically invalid inputs #352

Open thorehusfeldt opened 4 months ago

thorehusfeldt commented 4 months ago

Some input files are invalid universally (for all problems, or at least for all problems in a contest), while others are invalid only for a specific problem. I suggest separating these two concerns.

For instance, a default contest.yaml could contain

invalid_inputs:
  latin-1: Naïve coöperation
  unicode: ¯\\_(ツ)_/¯
  empty: ""
  random: YVRtr&*teTsRjs8ZC2%kN*T63V@jJq!d
  newline: "\n"
  not-printable_1: "\x7f"
  not_printable_2: "\xe2\x82\xac"
  bismillah: ﷽

and always generate the corresponding testcases. (It could do so explicitly, so that data/invalid_inputs/latin-1.in is indeed created by bt generate.)
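For concreteness, the layout generated from such a contest.yaml could look like this (a sketch, assuming each key under invalid_inputs maps to one generated file, as in the latin-1 example above):

data/invalid_inputs/
  latin-1.in
  unicode.in
  empty.in
  random.in
  newline.in
  not-printable_1.in
  not_printable_2.in
  bismillah.in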

An Icelandic contest would not forbid latin-1 because it wants input files like jökulsá.

Invalid pseudo-testcases that are particular to a problem belong in generators.yaml instead. An informative and easy-to-repurpose skeleton could look like this:

invalid_inputs:
  data:
    prose: { in: 2 two }
    leading-zero: { in: 02 2 }
    floating-point: { in: 2.0 2 }
    too-many-tokens: { in: 2 2 2 }
    too-few-tokens: { in: "2" }
    extra-line: { in: "2 2\n\n" }
    leading-whitespace: { in: " 2 2" }
    trailing-whitespace: { in: "2 2 " }
invalid_outputs:
  data:
    orwell:
      in: 2 2 # valid input
      ans: 4  # well-formed default answer
      out: 5  # wrong output
invalid_answers:
  data:
    trailing_whitespace:
      in: 2 2
      ans: "4 "

Putting the above list into the skeleton would incentivise problem authors to provide those invalid inputs.

RagnarGrootKoerkamp commented 4 months ago

I think stress testing the input and output validators makes sense, and this is a reasonable way to do it.

One thought: in NWERC we always use some library (header file), and it feels like all the invalid inputs here are rather tests of the library being used. Once the library is well tested, whitespace and encoding errors should be handled by it and are not really errors specific to the problem.

On the other hand, when every problem has its own non-library-based validator, having extra tests makes a lot of sense.
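For the non-library case, a minimal sketch of what such a bespoke, strict input validator might look like (a hypothetical Python example for the "2 2" format from the skeleton above, assuming the usual convention that the validator reads the candidate input file from stdin and exits with code 42 if it is valid, 43 otherwise):

#!/usr/bin/env python3
# Hypothetical strict input validator for a "two integers on one line" format.
# Reads the candidate input file from stdin; exits 42 if valid, 43 otherwise.
import re
import sys

def main():
    raw = sys.stdin.buffer.read()
    try:
        text = raw.decode("ascii")  # rejects the latin-1/unicode/bismillah cases
    except UnicodeDecodeError:
        sys.exit(43)
    # Exactly "<int> <int>\n": no leading zeros, no extra whitespace, one trailing newline.
    if not re.fullmatch(r"(0|[1-9][0-9]*) (0|[1-9][0-9]*)\n", text):
        sys.exit(43)
    a, b = map(int, text.split())
    if not (1 <= a <= 10**9 and 1 <= b <= 10**9):  # made-up bounds for the sketch
        sys.exit(43)
    sys.exit(42)

if __name__ == "__main__":
    main()

With a validator like this, the universally invalid inputs from the contest.yaml list above are all caught by the decoding and format checks, without any problem-specific logic; a library-based validator would simply bake those checks into the library.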

Another thing: I could imagine things like this are particularly interesting for problems with string input. But then the non-ASCII strings should appear in the right place, and not in the first line when that first line contains an integer (the number of strings). But maybe then one should just write problem-specific invalid input.

I think my preference is to not copy them to data/ but instead treat them separately, but I'm flexible on this.

thorehusfeldt commented 4 months ago

Your point is a good one. Since the skeleton is contest-specific, and contests typically come with their own validation traditions, the default skeleton should be optimised for how a BAPC jury wants to see their problems. Contests run by Thore (with bespoke validators) can then simply overwrite skel. I’ll think of another skeleton.

RagnarGrootKoerkamp commented 4 months ago

Right yes, updating the contest skel dir before adding problems is a decent workaround for sure.

thorehusfeldt commented 4 months ago

How about this skeleton:

sample:
  data:
    - '':
        in: "15"
        ans: 3 5
secret:
  data:
    - 'med-random-0': random_composite --lo 1000 --hi 3000 {seed:0}
    - 'med-random-1': random_composite --lo 1000 --hi 3000 {seed:1}
invalid_inputs:
  data:
    prime:
      in: "7"
    out-of-bounds-lo:
      in: "2"
    out-of-bounds-hi:
      in: "65538"
invalid_outputs:
  data:
    wrong_product:
      in: "15"
      ans: 3 5
      out: 2 7
    not_prime:
      in: "30"
      ans: 2 3 5
      out: 5 6
invalid_answers:
  data:
    decreasing:
      in: "15"
      ans: 5 3

I think this is clear enough to deduce a lot of information from, and makes the difference between _answer and _output clear.

Not sure if you want lo and hi or whether you (@RagnarGrootKoerkamp) think the library should handle this.
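For reference, a rough sketch of what the random_composite generator invoked above could look like (hypothetical; it assumes BAPCtools substitutes {seed:...} as the final command-line argument and that generators write the testcase to stdout):

#!/usr/bin/env python3
# Hypothetical generator: prints one random composite number in [lo, hi] to stdout.
# Invoked as: random_composite --lo 1000 --hi 3000 <seed>
import argparse
import random

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--lo", type=int, required=True)
    parser.add_argument("--hi", type=int, required=True)
    parser.add_argument("seed", type=int)
    args = parser.parse_args()

    rng = random.Random(args.seed)
    while True:
        n = rng.randint(args.lo, args.hi)
        if n >= 4 and not is_prime(n):
            print(n)
            return

if __name__ == "__main__":
    main()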

RagnarGrootKoerkamp commented 4 months ago

This is pretty nice! I see the current skel is already quite verbose re. the generators anyway, so having a bit more for the invalid cases is fine. And an explicit example like you did here is preferable to the commented explanation we have currently.

Specific comments:

thorehusfeldt commented 4 months ago
  1. In the hypothetical problem, a correct submission may output the prime factors of the input in any order, so a custom output validator has to allow this (see the sketch after this list). The .ans files, on the other hand, shall be ordered.
  2. I think invalid_outputs is much more understandable and important (and should be part of the spec). invalid_answers is a very narrow, BAPCtools-ish luxury. That's why I put it later. But I agree it looks weird.
  3. The out-of-bounds-hi value 65538 is the smallest composite larger than 2**16 (65537 is prime). But I'll change it to 1000001 or something.
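For the first point, a rough sketch of such an order-insensitive output validator (hypothetical; it assumes the usual calling convention output_validator input_file answer_file feedback_dir < team_output, with exit code 42 for accept and 43 for reject):

#!/usr/bin/env python3
# Hypothetical output validator: accept any ordering of the correct prime factorisation.
# Called as: output_validator input_file answer_file feedback_dir < team_output
import sys

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def reject():
    sys.exit(43)

def main():
    with open(sys.argv[1]) as f:
        n = int(f.read())

    tokens = sys.stdin.read().split()
    if not tokens:
        reject()
    try:
        factors = [int(t) for t in tokens]
    except ValueError:
        reject()

    product = 1
    for p in factors:
        if not is_prime(p):
            reject()  # rejects "5 6" from the not_prime case
        product *= p
    if product != n:
        reject()  # rejects "2 7" from the wrong_product case
    sys.exit(42)  # accept: any ordering of the prime factors of n

if __name__ == "__main__":
    main()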