Closed jsannemo closed 9 months ago
If we allow submissions that claim to be RTE to either WA or TLE, then what are we really testing? Would having a "rejected" directory, with the requirement being that it is not accepted (i.e. it can RTE, WA, TLE, ...) solve this need?
Yes, that's fine too.
I guess the reason why I'm fine with letting RTE get TLE and WA is that:
In the end I don't care very much, either solves my problem, so pick whatever you prefer :)
@RagnarGrootKoerkamp what does BAPC tools call this?
I'm using the DOMjudge way to handle this, using `@EXPECTED_RESULTS@`; see here.
Are they placed in an arbitrary folder matching one of the expected verdicts?
+1 for `rejected` anyway
Yes indeed, any of the matching folders works.
I think I have an assert somewhere that the folder it is in must be one of the listed verdicts.
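That assert could look something like this minimal sketch (function and mapping names are invented for illustration; the `rejected` rule follows the convention discussed above):

```python
from pathlib import Path

# Hypothetical mapping from submission folder name to the verdict it claims.
DIR_TO_VERDICT = {
    "accepted": "AC",
    "wrong_answer": "WA",
    "time_limit_exceeded": "TLE",
    "run_time_error": "RTE",
}

def check_placement(submission_path: str, expected_verdicts: list[str]) -> None:
    """Assert that the folder a submission sits in is one of its listed verdicts."""
    folder = Path(submission_path).parent.name
    if folder == "rejected":
        return  # "rejected" only requires that the submission is not accepted
    verdict = DIR_TO_VERDICT.get(folder)
    assert verdict in expected_verdicts, (
        f"{submission_path}: folder {folder!r} not among expected {expected_verdicts}"
    )

check_placement("submissions/time_limit_exceeded/recursive.py", ["RTE", "TLE"])
```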
`submissions/rejected` sounds convenient indeed
In case anybody is still here, I have a draft implementation that allows you to specify this in a YAML file:
```yaml
time_limit_exceeded/recursive.py:
  verdict:
    - RTE
    - TLE
```

or, if you like it terse:

```yaml
time_limit_exceeded/recursive.py: ["RTE", "TLE"]
```
This can be specified down to individual testgroups, so you could also do, hypothetically:
```yaml
time_limit_exceeded/recursive.py:
  subgroups:
    sample: AC
    secret:
      group1: AC
      group2: AC
      group3: ["TLE", "RTE"]
```
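A simplified sketch of how such a per-testgroup spec could be checked against observed verdicts (function names are invented, and the `subgroups` wrapper from the YAML above is dropped for brevity; an expectation is either a single verdict or a list of allowed ones):

```python
def allowed(expectation):
    """Normalize a single verdict or a list of verdicts to a set."""
    return {expectation} if isinstance(expectation, str) else set(expectation)

def check_groups(spec: dict, observed: dict, prefix: str = "") -> list[str]:
    """Return violations; an empty list means every expectation holds."""
    errors = []
    for group, expectation in spec.items():
        name = f"{prefix}{group}"
        if isinstance(expectation, dict):  # nested testgroups, e.g. "secret"
            errors += check_groups(expectation, observed.get(group, {}), name + "/")
        elif observed.get(group) not in allowed(expectation):
            errors.append(f"{name}: got {observed.get(group)}, expected {expectation}")
    return errors

spec = {"sample": "AC",
        "secret": {"group1": "AC", "group2": "AC", "group3": ["TLE", "RTE"]}}
observed = {"sample": "AC",
            "secret": {"group1": "AC", "group2": "AC", "group3": "TLE"}}
assert check_groups(spec, observed) == []
```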
I have a BAPCtools fork that does this here: https://github.com/thorehusfeldt/BAPCtools
In green, the expected verdicts are shown. Testgroup `data/secret/alcohols` got an unexpected WA, so it's red.
This is very preliminary, but it parses a yaml file with arbitrarily rich specifications per submission and per testgroup, and compares with the default grader (which I added to BAPCtools) for each internal node of the testdata tree.
I think this is the right way of doing it (or close enough), in particular for test groups. It is far superior to my own @EXPECTED_GRADES@ approach.
I prefer the `@EXPECTED_RESULTS@` approach of encoding this information inside the source code itself. This is metadata that I think is intrinsically related to the submission (in the context of the problem, of course), and by encoding it in the source, we ensure the metadata is not lost, e.g. when uploading the submission into a CCS or even when forwarding it from one CCS to another (e.g. when shadowing at the ICPC WFs).
I’ve played around with various ideas now.
For editing and curation, I find it much more pleasant to have a single-file overview.
(Use-case: add another testgroup, or merge two existing testgroups. I can do this very quickly in a single file, with no errors. I can also check at a single glance that all submissions get AC on `sample`, etc.) Also, the YAML could be syntax-checked.
On the other hand, when writing a new submission, or communicating the intent of a submission to others, the source-embedded approach makes more sense.
The semantics of "my" expected-grades proposal are orthogonal to this. An expectation could be defined (along with many others) in a common `expectations.yaml` file
```yaml
...
mixed/th.py: ["TLE", "RTE"]
time_limit_exceeded/recursive.py:
  verdict: AC
  score: 100
  subgroups:
    sample: AC
    secret:
      group1: AC
      group2: AC
      group3: ["TLE", "RTE"]
...
```
but it could just as well reside in the source code of `time_limit_exceeded/recursive.py`:
```python
#! /usr/bin/env python3
"""
@EXPECTATIONS_BEGIN@
verdict: AC
score: 100
subgroups:
  sample: AC
  secret:
    group1: AC
    group2: AC
    group3: ["TLE", "RTE"]
@EXPECTATIONS_END@
"""

def solve(instance):
    ...
```
(I have no opinion about the convention for source code embedding syntax.)
Contests or traditions could allow both or either; a tool could warn if a submission supplies both (just as it currently warns about inconsistency with the expectations implied by the placement of the source file).
Closing this issue because the actual issue is covered both by adding `rejected` and by the expectation framework. The former we have agreed to multiple times; I (somewhat superfluously) created a ticket for just that (#139) so we don't forget to actually do it. The latter is WIP, currently discussed in #137.
The discussions in the thread are still interesting, but the actual issue is now closed.
I thought this had been the case at some point. It's something I was bitten by recently in Coding Cup, having a solution that was incorrect (in a crashy way), but which sometimes manifested as a WA and sometimes as a TLE.
TLE also allows WA, since we only want to verify that *some* test case times it out, but the above case doesn't seem to be clearly mappable in the spec right now?
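One way to read the rule being discussed (this table is illustrative of the thread, not the spec): each folder allows a set of per-testcase verdicts, and the folder's defining verdict must occur on at least one test case, while `rejected` only requires a non-AC result somewhere.

```python
# Hypothetical rule table: folder -> (allowed per-case verdicts, required verdict).
RULES = {
    "wrong_answer":        ({"AC", "WA"}, "WA"),
    "time_limit_exceeded": ({"AC", "WA", "TLE"}, "TLE"),
}

def verify(folder: str, verdicts: list[str]) -> bool:
    """Check whether the observed per-testcase verdicts satisfy the folder's claim."""
    if folder == "accepted":
        return all(v == "AC" for v in verdicts)
    if folder == "rejected":
        return any(v != "AC" for v in verdicts)
    allowed, required = RULES[folder]
    return all(v in allowed for v in verdicts) and required in verdicts

assert verify("time_limit_exceeded", ["AC", "WA", "TLE"])   # some case times out
assert not verify("time_limit_exceeded", ["AC", "WA"])      # nothing timed out
assert verify("rejected", ["AC", "RTE"])                    # any non-AC suffices
```

The crashy-but-sometimes-WA solution from the comment above would then simply go in `rejected`, where any mix of WA, TLE, and RTE is fine as long as it is not accepted.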
@niemela @simonlindholm