Kattis / problemtools

Tools to manage problem packages using the Kattis problem package format.
MIT License

Don't propagate JE on empty test groups #176

Open tyilo opened 4 years ago

tyilo commented 4 years ago

Assuming we have a problem with two test groups, group1 and group2, running verifyproblem foo -d group1/ always results in a JE verdict, because the pattern doesn't match any test cases in group2.

This fixes it, by returning the ? verdict for an empty group.

I don't know if this is the best solution.

thorehusfeldt commented 3 years ago

I agree that the current behaviour is suboptimal and would like to see it changed. verifyproblem’s ability to selectively remove test groups is very useful during problem development, and the JE verdict is highly misleading.

However, I’d rather see this changed at the level of the default grader, here:

https://github.com/Kattis/problemtools/blob/307bcbf2c90e75db6513ee6e4daa41590d988906/support/default_grader/default_grader#L64

The specification at https://www.problemarchive.org/wiki/index.php/Problem_Format#Graders is silent about how to handle this; it boils down to whether “no errors found because no tests were run” should mean AC. It’s an aesthetic choice as much as a moral one, to quote Bill Haydon, akin to whether the empty product is the multiplicative unity.
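To make the analogy concrete, here is a small Python sketch of how the standard library's own aggregates treat an empty input. It is only an illustration of the convention question, not code from problemtools; the helper name empty_aggregates is made up for this example.

```python
import math


def empty_aggregates():
    """Collect what Python's built-in aggregates return for an empty input.

    Hypothetical helper for illustration only; not part of problemtools.
    """
    results = {
        "all": all([]),          # vacuous truth: no element is falsy
        "prod": math.prod([]),   # empty product is the multiplicative identity
        "sum": sum([]),          # empty sum is the additive identity
    }
    try:
        min([])                  # min has no natural identity element
    except ValueError:
        results["min"] = "ValueError"
    # An explicit default turns the empty case into a deliberate choice.
    results["min_default"] = min([], default=1.0)
    return results


print(empty_aggregates())
```

Note that min is the odd one out: unlike all, sum and the product, it has no built-in answer for the empty case and needs the caller to supply one, which is essentially the decision being discussed here.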

Suggestion

The cleanest solution would be to introduce the verdict ETG for empty test group. Currently, there is already a cornucopia of Verdict/Grade/Judgement in problemtools. At the grader level, these currently include:

https://github.com/Kattis/problemtools/blob/307bcbf2c90e75db6513ee6e4daa41590d988906/support/default_grader/default_grader#L7

but at other levels in the infrastructure, other verdicts are used. For instance, I don’t think PE and OLE ever make it through verifyproblem.

It would then be up to the default grader to decide how ETG propagates (and this can be described in the documentation). In particular, making ETG into an explicit verdict allows authors to change the grader if they have use cases with different preferences.

My own preference would be that “passing the empty test” gives AC, yet that verifyproblem emits a friendly warning (such as “no tests run” or “there were empty test groups”). Alternatively, I can also see the new AC-ish verdict ANT (accepted with no tests), mimicking APE from https://clics.ecs.baylor.edu/index.php/Contest_API#Judgement_Types , but it seems heavy-handed to burden the top-level family of verdicts with that.

But my experience with this is limited, and there are many other problem construction traditions that I haven’t thought through at all.

thorehusfeldt commented 3 years ago

Concrete suggestion:

The Problem format does not specify the result of aggregating with mode min an empty set of scores. (Such empty sets of scores typically arise during problem development, when empty test groups arise from filtering.)

verifyproblem returns the code AC and the score specified by accept_score for this group, yet will emit the warning “Empty test groups:” followed by a list of the affected test groups.
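A minimal sketch of what this suggestion could look like, assuming a group's results arrive as a list of (verdict, score) pairs. The function aggregate_min and its parameters are hypothetical and do not mirror the actual default_grader code. The handling of non-empty groups is deliberately simplified (the first non-AC result wins); only the empty case reflects the proposal.

```python
import warnings


def aggregate_min(group_name, results, accept_score=1.0):
    """Aggregate one test group's results with mode min.

    Hypothetical helper, not the real default_grader.
    results: list of (verdict, score) pairs for the group.
    """
    if not results:
        # Proposed behaviour: an empty group is accepted, but loudly.
        warnings.warn(f"Empty test groups: {group_name}")
        return ("AC", accept_score)

    # Simplifying assumption: the first non-AC result decides the group.
    for verdict, score in results:
        if verdict != "AC":
            return (verdict, score)

    # All test cases accepted: the group score is the minimum score.
    return ("AC", min(score for _, score in results))


# A group emptied by filtering, as in `verifyproblem foo -d group1/`.
print(aggregate_min("group2", []))
```

With this shape, filtering out a group no longer produces JE, and the warning keeps an accidentally empty group from going unnoticed.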