Closed thorehusfeldt closed 7 months ago
It is probably easier to add a generator which produces these
#!/usr/bin/env python3
import sys
strings = {
'empty': b'',
'newline' : b'\n',
'random_printable' : b'YVRtr&*teTsRjs8ZC2%kN*T63V@jJq!d\n',
'unicode' : bytes('¯\\_(ツ)_/¯\n', 'utf-8'),
'not_printable1' : b'\x7f\n',
'not_printable2' : b'\xe2\x82\xac\n'
}
sys.stdout.buffer.write(strings[sys.argv[1]])
That would work, but then we’re back at a mandatory stdout.py
-like generator that practically always has to be there, and a lot of redundancy. This seems like a step backwards.
I’d prefer to either have some good, explicit invalid_inputs
in the skeleton-provided generator or move this to the behind-the-curtains sanity check.
I think something like the following works for me:
data:
bad:
data:
'newline':
in: "\n"
'random_printable':
in: YVRtr&*teTsRjs8ZC2%kN*T63V@jJq!d
'unicode':
in: "¯\\_(ツ)_/¯"
'not_printable1':
in: "\x7f"
'not_printable2':
in: "\xe2\x82\xac"
'bismillah':
in: "﷽"
But the empty testcase ""
needs special attention from BAPCtool here:
1 The error message should say must may.
invalid_input/*
.—
One problem with this (and which @mzuenni ’s suggestion avoids) is that problem authors can’t use the skeleton-provided generators.yaml
if their editor explodes when confronted with extra-asciical characters like (ツ). I don’t know what to think about this. Even my VIM can show these things by now, so maybe we shouldn’t worry about that.
Personally, I find it the cleanest to do all this in the background without bloating the skeleton generators.yaml
or data/invalid_inputs
. One way to do that would be to merge your suggested invalid inputs into the user-provided generators.yaml
file right after reading it (or right before checking the invalid_inputs
).
(Then we could also disable that with some --no-default-invalid-inputs
flag or so that can be set in the problem configuration.)
The commit above fixes the empty hardcoded data error, but we should also validate these differently during the generation process. Currently the output is simply that input validation failed and hence the testcase isn't generated.
Added support for data/invalid_inputs
and data/invalid_outputs
in the commit above.
Also renamed the existing data/bad
files in the skel
and test/problems/identity
.
Regarding auto-generation of invalid cases, I'm not really a fan of bloating the generators.yaml
with it. I don't really like hardcoding them in data/invalid_inputs
that much either, but at least that's more hidden.
I still think that ideally we simply check this in BAPCtools directly, but I don't really have a strong opinion and either way don't feel like working on this now. (So if one of you implements something decent I'll take it ;)
I’m getting around to everybody else’s position here. The invalid_inputs
in a generator should be those that are interesting and particular for that problem, like 1 0
in a problem that requires two space-separated nonzero integers. In contrast newline
, random_printable
, unicode
, not_printable1
, not_printable2
and (maybe) bismallah
are just bloat and don’t belong in generators.yaml
.
On the other hand, I construct plenty of Danish problems with æøåÆØÅ
, and it should be easy for an author to change the default. To me, this sounds like a problem- or maybe contest-level config
or dotfile
.
For reference, the sanity_checks
are currently performed in validate.py
:
It’s mostly white-space checking. Is this where we should add bismilla
etc?
Separate comment, something that struck my eye while reviewing #336: it looks like the invalid_input
s in generators.yaml
also end up in data/invalid_inputs/
after running bt generate
, is that intentional? I don't need the invalid test data for anything else than testing the validators, so if it's up to me, I'd keep them hidden in the temporary directory. What are others' opinions about this?
I think i prefer the way it is. If i see it in the generator.yaml
i want to be able to see it in the directory?
The existence of data/invalid_inputs
and their semantics in fact part of the draft specification: https://www.kattis.com/problem-package-format/spec/2023-07-draft.html#invalid-input-files . (Whereas invalid_outputs
and in particular invalid _answers
is BAPC-specific.)
added with #336
TODOs for invalid inputs:
data/invalid_inputs
, continue supportingdata/bad
but with a warningbt new_problem
indata/bad
to the generator. This requires changing the standard skeleton, and is not trivial:What is worrisome is that parts of the toolchain (in particular, python’s YAML parser) may choke on the non-printable unicode shenanigans before it ever gets to write the
.in
file.