Kattis / problem-package-format

Kattis problem package format specification
https://www.kattis.com/problem-package-format

Allowing run_time_error solutions to either WA or TLE #17

jsannemo closed this issue 9 months ago

jsannemo commented 2 years ago

I thought this had been the case at some point. It's something I was bitten by recently in Coding Cup, having a solution that was incorrect (in a crashy way), but which sometimes manifested as a WA and sometimes as a TLE.

TLE also allows WA, since we only want to verify that the solution times out on /some/ test case, but the above case doesn't seem to map cleanly onto the spec right now?

@niemela @simonlindholm

niemela commented 2 years ago

If we allow submissions that claim to be RTE to either WA or TLE, then what are we really testing? Would having a "rejected" directory, with the requirement being that it is not accepted (i.e. it can RTE, WA, TLE, ...), solve this need?
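
For concreteness, a sketch of how that could sit next to the existing directories (only rejected is new; the exact wording of its requirement would be up to the spec):

submissions/
  accepted/
  wrong_answer/
  time_limit_exceeded/
  run_time_error/
  rejected/    (proposed: some test case gets any verdict except accepted, i.e. WA, TLE, RTE, ...)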

jsannemo commented 2 years ago

Yes, that's fine too.

I guess the reason why I'm fine with letting RTE get TLE and WA is that:

  1. In C++, an RTE can often result in either outcome.
  2. I can't ever remember writing a solution where I want to test for RTE, unlike WA and TLE - solutions only get placed there because they happen to crash on a case (except they sometimes don't /always/ do...)

In the end I don't care very much, either solves my problem, so pick whatever you prefer :)

jsannemo commented 2 years ago

@RagnarGrootKoerkamp what does BAPCtools call this?

RagnarGrootKoerkamp commented 2 years ago

I'm using the DOMjudge way of handling this, using @EXPECTED_RESULTS@; see here.
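
For reference, that's a magic comment inside the submission source itself, roughly like this (from memory, and assuming DOMjudge's spelling of the verdicts):

#!/usr/bin/env python3
# @EXPECTED_RESULTS@: RUN-ERROR,TIMELIMIT
# The judge reads this comment from the source and marks the submission as
# "as expected" if the final verdict is any of the listed results.

def solve():
    ...  # submission body elided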

jsannemo commented 2 years ago

Are they placed in an arbitrary folder matching one of the expected verdicts?

+1 for rejected anyway

RagnarGrootKoerkamp commented 2 years ago

Yes indeed, any of the matching folders works.

I think I have an assert somewhere that the folder it is in must be one of the listed verdicts.

submissions/rejected sounds convenient indeed

thorehusfeldt commented 1 year ago

In case anybody is still here, I have a draft implementation that allows you to specify this in `/submissions/expected_grades.yaml` like this:

time_limit_exceeded/recursive.py:
  verdict:
    - RTE
    - TLE

or, if you like it terse:

time_limit_exceeded/recursive.py: ["RTE", "TLE"]

This can be specified down to individual testgroups, so you could also do, hypothetically:

time_limit_exceeded/recursive.py:
  subgroups:
    sample: AC
    secret:
      group1: AC
      group2: AC
      group3: ["TLE", "RTE"]  

I have a BAPCtools fork that does this here: https://github.com/thorehusfeldt/BAPCtools

In green, the expected verdicts are shown. Testgroup data/secret/alcohols got an unexpected WA, so it's red.

This is very preliminary, but it parses a YAML file with arbitrarily rich specifications per submission and per testgroup, and compares them with the output of the default grader (which I added to BAPCtools) for each internal node of the testdata tree.

I think this is the right way of doing it (or close enough), in particular for test groups. It is far superior to my own @EXPECTED_GRADES@ approach.
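
To give an idea of what the grading side does with such a file, here is a stripped-down sketch (this is not the actual BAPCtools code; the YAML loading uses PyYAML, and the verdict dictionary format is made up for the example):

#!/usr/bin/env python3
# Sketch: check aggregated verdicts against an expectations file like the one above.
import yaml  # PyYAML

def allowed(spec):
    # Normalise an entry ("AC", ["TLE", "RTE"] or {"verdict": ...}) to a set of verdicts.
    if isinstance(spec, str):
        return {spec}
    if isinstance(spec, list):
        return set(spec)
    if isinstance(spec, dict):
        return allowed(spec.get("verdict", []))
    return set()

def check(expectations_file, verdicts):
    # verdicts: submission path -> aggregated verdict,
    # e.g. {"time_limit_exceeded/recursive.py": "TLE"}
    with open(expectations_file) as f:
        expectations = yaml.safe_load(f)
    for submission, spec in expectations.items():
        want = allowed(spec)
        got = verdicts.get(submission)
        if want and got not in want:
            print(f"UNEXPECTED: {submission}: got {got}, expected one of {sorted(want)}")

Per-testgroup expectations under subgroups are handled the same way, recursing over the testdata tree and comparing against the verdict of each internal node.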

eldering commented 1 year ago

I prefer the @EXPECTED_RESULTS@ approach of encoding this information inside the source code itself. This is meta-data that I think is intrinsically related to the submission (of course in the context of the problem), and by encoding it in the source, we ensure the meta-data is not lost, e.g. when uploading the submission into a CCS or when forwarding it from one CCS to another (e.g. when shadowing at the ICPC WFs).

thorehusfeldt commented 1 year ago

I’ve played around with various ideas now.

For editing and curation, I find it much more pleasant to have a single-file overview.

(Use-case: add another testgroup, or merge two existing testgroups. I can do this very quickly in a single file, with no errors. I can also check at a single glance that all submissions get AC on sample, etc.) Also, the YAML could be syntax-checked.

On the other hand, when writing a new submission, or communicating the intent of a submission to others, the source-embedded approach makes more sense.

The semantics of “my” expected-grades proposal are orthogonal to this. An expectation could be defined (along with many others) in a common expectations.yaml file

...
mixed/th.py: ["TLE", "RTE"]
time_limit_exceeded/recursive.py:
  verdict: AC
  score: 100
  subgroups:
    sample: AC
    secret:
      group1: AC
      group2: AC
      group3: ["TLE", "RTE"]  
...

but it could just as well reside in the source code of time_limit_exceeded/recursive.py:

#! /usr/bin/env python3
"""
@EXPECTATIONS_BEGIN@
  verdict: AC
  score: 100
  subgroups:
    sample: AC
    secret:
      group1: AC
      group2: AC
      group3: ["TLE", "RTE"]  
@EXPECTATIONS_END@
"""

def solve(instance):
    ...

(I have no opinion about the convention for source code embedding syntax.)

Contests or traditions could allow either or both; a tool could warn if a submission supplies both (just as it currently warns about inconsistency with the expectations implied by the placement of the source file).
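
As a hypothetical sketch (not existing tooling), that placement check could be as simple as:

from pathlib import Path

# Verdict(s) implied by the directory a submission lives in. Directory names
# follow the spec; "rejected" is the proposed catch-all. The allowed sets are
# simplified here (e.g. the spec also lets a TLE submission get WA).
DIRECTORY_VERDICTS = {
    "accepted": {"AC"},
    "wrong_answer": {"WA"},
    "time_limit_exceeded": {"TLE"},
    "run_time_error": {"RTE"},
    "rejected": {"WA", "TLE", "RTE"},
}

def warn_if_inconsistent(submission, declared):
    # declared: verdicts taken from the expectations YAML or from an embedded
    # @EXPECTATIONS_BEGIN@ block in the source.
    implied = DIRECTORY_VERDICTS.get(Path(submission).parent.name, set())
    if declared and implied and not set(declared) <= implied:
        print(f"warning: {submission} declares {sorted(declared)} "
              f"but its directory implies {sorted(implied)}")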

niemela commented 9 months ago

Closing this issue because the actual issue is covered both by adding rejected and by the expectation framework. The former we have agreed to multiple times; I (somewhat superfluously) created a ticket for just that (#139) so we don't forget to actually do it. The latter is WIP, currently discussed in #137.

The discussions in the thread are still interesting, but the actual issue is now closed.