RagnarGrootKoerkamp / BAPCtools

Tools for developing ICPC-style programming contest problems.
GNU General Public License v3.0
49 stars 22 forks source link

data/bad: renaming and generation #303

Closed thorehusfeldt closed 7 months ago

thorehusfeldt commented 1 year ago

TODOs for invalid inputs:

  1. Make BAPCtools recognise data/invalid_inputs, continue supporting data/bad but with a warning
  2. Move the files currently created by bt new_problem in data/bad to the generator. This requires changing the standard skeleton, and is not trivial:
    'empty':
    in: '' # BAPCtools forbids this (“Hardcoded .in data may not be empty!”)
    'newline':
    in: "\n" # this is fine with those quotes
    'random_printable': 
    in: YVRtr&*teTsRjs8ZC2%kN*T63V@jJq!d # also fine
    'unicode':
    in: "¯\\_(ツ)_/¯" # fine with the extra backslash and those quotes
    'not_printable':
    in: ... # now what? I’m stumped

    What is worrisome is that parts of the toolchain (in particular, python’s YAML parser) may choke on the non-printable unicode shenanigans before it ever gets to write the .in file.

mzuenni commented 1 year ago

It is probably easier to add a generator which produces these

#!/usr/bin/env python3

import sys

strings = {
    'empty': b'',
    'newline' : b'\n',
    'random_printable' : b'YVRtr&*teTsRjs8ZC2%kN*T63V@jJq!d\n',
    'unicode' : bytes('¯\\_(ツ)_/¯\n', 'utf-8'),
    'not_printable1' : b'\x7f\n',
    'not_printable2' : b'\xe2\x82\xac\n'
}

sys.stdout.buffer.write(strings[sys.argv[1]])
thorehusfeldt commented 1 year ago

That would work, but then we’re back at a mandatory stdout.py-like generator that practically always has to be there, and a lot of redundancy. This seems like a step backwards.

I’d prefer to either have some good, explicit invalid_inputs in the skeleton-provided generator or move this to the behind-the-curtains sanity check.

I think something like the following works for me:

data:
  bad:
    data:
      'newline':
        in: "\n"
      'random_printable': 
        in: YVRtr&*teTsRjs8ZC2%kN*T63V@jJq!d
      'unicode':
        in: "¯\\_(ツ)_/¯"
      'not_printable1': 
        in: "\x7f"
      'not_printable2':
        in: "\xe2\x82\xac"
      'bismillah':
        in: "﷽"

But the empty testcase "" needs special attention from BAPCtool here:

https://github.com/RagnarGrootKoerkamp/BAPCtools/blob/9cc3a8511e74984328fdf0f6474bb34f4e212590/bin/generate.py#L768

1 The error message should say must may.

  1. The check should not run in invalid_input/*.

One problem with this (and which @mzuenni ’s suggestion avoids) is that problem authors can’t use the skeleton-provided generators.yaml if their editor explodes when confronted with extra-asciical characters like (ツ). I don’t know what to think about this. Even my VIM can show these things by now, so maybe we shouldn’t worry about that.

RagnarGrootKoerkamp commented 1 year ago

Personally, I find it the cleanest to do all this in the background without bloating the skeleton generators.yaml or data/invalid_inputs. One way to do that would be to merge your suggested invalid inputs into the user-provided generators.yaml file right after reading it (or right before checking the invalid_inputs).

(Then we could also disable that with some --no-default-invalid-inputs flag or so that can be set in the problem configuration.)

RagnarGrootKoerkamp commented 1 year ago

The commit above fixes the empty hardcoded data error, but we should also validate these differently during the generation process. Currently the output is simply that input validation failed and hence the testcase isn't generated.

RagnarGrootKoerkamp commented 1 year ago

Added support for data/invalid_inputs and data/invalid_outputs in the commit above. Also renamed the existing data/bad files in the skel and test/problems/identity.

Regarding auto-generation of invalid cases, I'm not really a fan of bloating the generators.yaml with it. I don't really like hardcoding them in data/invalid_inputs that much either, but at least that's more hidden.

I still think that ideally we simply check this in BAPCtools directly, but I don't really have a strong opinion and either way don't feel like working on this now. (So if one of you implements something decent I'll take it ;)

thorehusfeldt commented 1 year ago

I’m getting around to everybody else’s position here. The invalid_inputs in a generator should be those that are interesting and particular for that problem, like 1 0 in a problem that requires two space-separated nonzero integers. In contrast newline, random_printable, unicode, not_printable1, not_printable2 and (maybe) bismallah are just bloat and don’t belong in generators.yaml.

On the other hand, I construct plenty of Danish problems with æøåÆØÅ, and it should be easy for an author to change the default. To me, this sounds like a problem- or maybe contest-level config or dotfile.

thorehusfeldt commented 7 months ago

For reference, the sanity_checks are currently performed in validate.py:

https://github.com/RagnarGrootKoerkamp/BAPCtools/blob/5cc831a080a6b605c8cadca2862c0a224aee5fcd/bin/validate.py#L329

It’s mostly white-space checking. Is this where we should add bismilla etc?

mpsijm commented 7 months ago

Separate comment, something that struck my eye while reviewing #336: it looks like the invalid_inputs in generators.yaml also end up in data/invalid_inputs/ after running bt generate, is that intentional? I don't need the invalid test data for anything else than testing the validators, so if it's up to me, I'd keep them hidden in the temporary directory. What are others' opinions about this?

mzuenni commented 7 months ago

I think i prefer the way it is. If i see it in the generator.yaml i want to be able to see it in the directory?

thorehusfeldt commented 7 months ago

The existence of data/invalid_inputs and their semantics in fact part of the draft specification: https://www.kattis.com/problem-package-format/spec/2023-07-draft.html#invalid-input-files . (Whereas invalid_outputs and in particular invalid _answers is BAPC-specific.)

mzuenni commented 7 months ago

added with #336