Kattis / problem-package-format

Kattis problem package format specification
https://www.kattis.com/problem-package-format

Use YAML, not space-separated string of keywords, to specify problem type #202

Closed thorehusfeldt closed 2 days ago

thorehusfeldt commented 3 months ago

The new specification of type avoids the use of YAML as a specification language, which leads to a convoluted definition, unnecessarily lenient values, arbitrary implementations, and difficulty in providing editor support.

(For instance, 2023-07-draft-2024-05-05 uses inconsistent symmetry in its specification of “is incompatible with”, and would allow type: "scoring scoring scoring". Is "interactive multi-pass" meant to be allowed?) While we’re at it, we manage to use pass with two different meanings (“the testcase passes” and “the problem uses two passes”) in the same value string: pass-fail multi-pass (!). That can’t be good.

Consider using YAML for this, since we’re using YAML for many other parts of the configuration files already. Here are some examples:

judgement: score
---
# empty, defaults to a single-pass, non-scoring problem
---
# same as above
judgement: verdict
submission: executable
passes: single
---
submission: answer # submit-answer
---
passes: interactive
---
judgement: score
submission: answer
--- 
judgement: score
passes: multiple
---
# FORBIDDEN
submission: answer
passes: multiple

The above can be specified neatly in CUE.

judgement: *"verdict" | "score"
submission: *"executable" | "answer"
if submission == "executable" {
    passes: *"single" | "multiple" | "interactive"
}

or less neatly in JSON (I can’t be bothered right now). Thanks to the magic of schemastore.org, the JSON spec would then automatically make every JSON-schema-aware editor in the world (such as VS Code, Emacs, or Vim) check that people’s configuration files are correct (without typos and without contradictions). I cannot stress enough how useful that would be, in particular to new authors.
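For readers who do not use CUE, the same constraint can be sketched in plain Python. This is only an illustration of the proposal, not a reference implementation: the function name `resolve_problem_type` is made up, the field names and values are copied from the examples above, and the sketch assumes that `passes` may not be set at all when `submission` is `answer` (one reading of the FORBIDDEN example).

```python
# Field names and allowed values copied from the examples in this comment.
# The first entry of each tuple is the default.
ALLOWED = {
    "judgement": ("verdict", "score"),
    "submission": ("executable", "answer"),
    "passes": ("single", "multiple", "interactive"),
}


def resolve_problem_type(config):
    """Return the config with defaults filled in, or raise ValueError."""
    unknown = set(config) - set(ALLOWED)
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    for key, value in config.items():
        if value not in ALLOWED[key]:
            raise ValueError(f"{key}: {value!r} is not one of {ALLOWED[key]}")

    resolved = {
        "judgement": config.get("judgement", ALLOWED["judgement"][0]),
        "submission": config.get("submission", ALLOWED["submission"][0]),
    }
    if resolved["submission"] == "executable":
        # `passes` only exists for executable submissions.
        resolved["passes"] = config.get("passes", ALLOWED["passes"][0])
    elif "passes" in config:
        # Assumption: submit-answer problems may not set `passes` at all.
        raise ValueError("passes is only allowed when submission is 'executable'")
    return resolved
```

Calling `resolve_problem_type({})` fills in all three defaults, while `resolve_problem_type({"submission": "answer", "passes": "multiple"})` raises `ValueError`, matching the FORBIDDEN example.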

Caveat

The actual names for fields and values are unimportant. For instance, we can do

scoring: *false | true
executable: *true | false
if executable {
    passes: *"single" | "multiple" | "interactive"
}

Please don’t let your opinions about what the fields should be called get in your way of evaluating this proposal. It’s about using YAML instead of putting all of this into a string with some extra rules.

Also, I can see a case for rounds instead of passes, mainly to really make sure we avoid the clash with pass-fail.

Say,

rounds: *1 | "multiple" | "interactive"

I consider these details secondary; they only become worth talking about after moving away from "pass-fail multi-pass pass-fail".

Benefits

  1. Avoids various name clashes, if key/value names are picked carefully
  2. Supports problem author via schema-aware editor, avoiding ill-formed problem specifications
  3. Trivial to parse for the tool (which already parses yaml)
  4. Shorter and clearer to write down
  5. Automates (using cue vet or JSON), so we can automatically verify problem specifications.

Tagl commented 3 months ago

It’s about using YAML instead of putting all of this into a string with some extra rules.

Was it not changed from string to a YAML sequence / list recently? I definitely agree it should not be a string in some arbitrary format.

EDIT: Found it here #131

mzuenni commented 3 months ago

multi-pass is orthogonal to interactive, i.e. interactive multi-pass should be valid

niemela commented 3 months ago

Use YAML, not space-separated string of keywords, to specify problem type

This is already the case. Clearly this needs to be made more clear :).

I guess it's the "String or" part of "String or sequence of strings" that's confusing? The intent here is that it must be a sequence of strings from among the allowed values (i.e. pass-fail, scoring, multi-pass, interactive, submit-answer), but that if it's a sequence of length 1, we also allow it to not be a sequence at all. I.e. these are valid values for type:

type: pass-fail
---
type: 
  - pass-fail
---
type: [pass-fail]
---
type:
  - pass-fail
  - interactive
---
type: [pass-fail, interactive]

The first three mean exactly the same thing, as do the last two.
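A minimal sketch of how a tool might normalize these forms after loading the YAML, assuming the parsed value arrives as either a Python `str` or a `list`. The name `normalize_type` is hypothetical and not taken from any existing tool; the allowed values are the five listed above.

```python
ALLOWED_TYPES = ("pass-fail", "scoring", "multi-pass", "interactive", "submit-answer")


def normalize_type(value):
    """Turn a bare string or a sequence of strings into a list of type names."""
    # A bare string is shorthand for a sequence of length 1.
    items = [value] if isinstance(value, str) else value
    if not isinstance(items, list):
        raise ValueError(f"type must be a string or a sequence, got {value!r}")
    for item in items:
        if item not in ALLOWED_TYPES:
            raise ValueError(f"{item!r} is not an allowed type value")
    return list(items)
```

With this, `normalize_type("pass-fail")` and `normalize_type(["pass-fail"])` give the same list, so the rest of the tool only ever sees the sequence form.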

niemela commented 3 months ago

For instance, 2023-07-draft-2024-05-05 uses inconsistent symmetry in its specification of “is incompatible with”,

Does it? I've been staring at it for a bit (because this is something I was explicitly worried about getting wrong), and it looks perfectly symmetrical to me? X is incompatible with Y exactly when Y is incompatible with X.

and would allow type: "scoring scoring scoring".

(You mean type: [scoring, scoring, scoring]).

Sure, I guess it doesn't explicitly say that you can't provide the same value more than once. It's not a huge issue, because if it was allowed it should clearly (?) mean exactly the same as type: scoring or type: [scoring], but I think it would make sense to disallow.

Is "interactive multi-pass" meant to be allowed?)

Yes, definitely.

niemela commented 3 months ago

Also, I can see a case for rounds instead of passes, mainly to really make sure we avoid the clash with pass-fail.

Say,

rounds: *1 | "multiple" | "interactive"

The interactive that we already have does not make sense as a value for rounds or passes. Are you intending some other meaning of "interactive" here? I could imagine a difference between a constant number of passes as opposed to a variable number of passes, and maybe the latter could be called "interactive"? I don't think that distinction is important, and if we want it I don't think "interactive" is a good name for that concept.

I consider these details secondary; they only become worth talking about after moving away from "pass-fail multi-pass pass-fail".

What does this mean? Type is not (and was never) a space separated string, and you were never intended to provide multiple copies of the same value, so I think we have moved away from this (or we were never there)?

thorehusfeldt commented 3 months ago

(You mean type: [scoring, scoring, scoring]).

Well, apparently, I mean ["scoring", "scoring", "scoring"] and "scoring scoring scoring" was never an intended value in the first place. Thank you for this clarification; what an unfortunate misunderstanding.

So what is meant in the current specification 2023-07-draft-early-may is that there are some fixed strings, and the value of type is one of those values or a list of them. So the syntax is

#type_indicator: "pass-fail" | "scoring" | "multi-pass" | "interactive" | "submit-answer"
type?: #type_indicator | [...#type_indicator]

Or maybe even (with a default)

#type_indicator: "pass-fail" | "scoring" | "multi-pass" | "interactive" | "submit-answer"
type: *"pass-fail" | #type_indicator | [...#type_indicator]

This allows:

type: pass-fail
---
type: ["pass-fail", "interactive"]
---
type:
 - scoring
 - multi-pass

Moreover, when type is a list

  1. there shall be no repetitions, and
  2. there are some list values that may not both appear.

(I can write these down precisely later; pressed for time right now. But I maintain the position that we should instead specify these constraints using YAML, instead of providing them as “a list with some extra rules”, which we demonstrably fail to communicate clearly to each other.)
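Until the constraints are written down precisely, the two list rules can at least be sketched in Python. Everything named here is an assumption: `check_type_list` is a made-up helper, and `EXAMPLE_PAIRS` is only inferred from the proposal earlier in this thread (judgement is either verdict or score; submission: answer with passes: multiple is marked FORBIDDEN). The draft's actual list of incompatible values may differ.

```python
# Assumed pairs, inferred from this thread; NOT copied from the specification.
EXAMPLE_PAIRS = [
    ("pass-fail", "scoring"),
    ("submit-answer", "multi-pass"),
]


def check_type_list(types, incompatible_pairs):
    """Raise ValueError on a repeated value or on two incompatible values."""
    seen = set()
    for value in types:
        if value in seen:
            raise ValueError(f"repeated value: {value!r}")
        seen.add(value)
    # Each pair is stored once and checked regardless of order,
    # so "X is incompatible with Y" is symmetric by construction.
    for first, second in incompatible_pairs:
        if first in seen and second in seen:
            raise ValueError(f"{first!r} is incompatible with {second!r}")
```

Under these assumed pairs, `["interactive", "multi-pass"]` passes the check, while `["scoring", "scoring", "scoring"]` and `["scoring", "pass-fail"]` are both rejected.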