Kattis / problemtools

Tools to manage problem packages using the Kattis problem package format.
MIT License

Disentangle score, score, and score.txt for scoring problem with scoring and nonscoring validators #239

Open thorehusfeldt opened 9 months ago

thorehusfeldt commented 9 months ago

“Currently” (i.e., as of 2 Oct 2023), the 2023-07 draft at https://www.kattis.com/problem-package-format/spec/2023-07-draft.html says:

Grading score The score assigned to an accepted input file in the group. If a scoring output validator is used, this score is multiplied by the score from the validator.

(The current version of the specification does not define what a “scoring output validator” is, but the term refers to the output validator’s behaviour of producing score.txt, sketched under “Reporting Additional Feedback”.)

I think it should say:

Scoring score_multiplier The score assigned to an accepted test case in this group is the product of score_multiplier and the value written in score.txt by the output validator.

The changes are as follows:

  1. Test cases, not input files, are scored.
  2. This is about scoring, not grading. “Grading” shouldn’t exist.
  3. My proposed key name score_multiplier differs from the old key name score. This is because the concepts (score-in-testdata.yaml) and (score.txt-written-by-output-validator) are different, yet both are called score, and neither is actually the score. (To be precise, let a be testdata.yaml.score and let b be the value in the output validator’s score.txt. Then the “score” of this test case is neither a nor b, but their product a*b. That is downright misleading.) An even better solution would be to have the output validator write to score_fraction.txt or something like that. I’m happy to propose more dramatic changes, like testdata.yaml.weight and completion.txt, or being explicit about the two different modes of scoring output validators. But at least we shouldn’t have two different concepts, both called score, whose product is also called score.
  4. Most importantly, in my proposal, the value of testdata.yaml.score_multiplier has unique semantics: it is always multiplied by the value in score.txt; as a useful default, a missing score.txt is treated as containing exactly the number 1.0.

I would be even happier with restricting score.txt to the range [0, 1] (as well as renaming it), but maybe that’s too restrictive for some problem development traditions.
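If the range restriction were adopted, the check on the validator's output could be as small as the following sketch. The function name `parse_score_fraction` is hypothetical, and rejecting out-of-range values with an exception is only one possible policy, not something the specification prescribes.

```python
def parse_score_fraction(text: str) -> float:
    """Parse the validator's output and require it to lie in [0, 1]."""
    value = float(text.strip())
    # The chained comparison is also False for NaN, so NaN is rejected.
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"score fraction {value!r} is outside [0, 1]")
    return value
```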