Matistjati opened this issue 4 weeks ago.
Would it be reasonable that a problem being interactive means you should only be allowed to have .interaction samples (no .in)? If not, should you ever be allowed to mix .in and .interaction? Has this ever occurred?
@austrin @jsannemo @simonlindholm @ghamerly Thoughts?
To clarify, the question is whether there can be a problem where some samples have a .interaction and others do not and only have a .in. Not whether a single sample can have both a .interaction and a .in, which is clearly true.
This is in fact specified in the "Interactive Problems" section, but it took us way too long to find and figure that out. The relevant sections could clearly use some clarification. (Or we might just be too tired?)
Some more comments. (See also #265)
data/sample/1.in
data/sample/1.ans
problem_statement/sample/1.in
problem_statement/sample/1.ans (or call this out?)
problem_statement/sample/1.interaction
For brevity I'll refer to these as data/1.* and statement/1.*. statement/* files may or may not correspond directly to files with the same basename in data/*.
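For example, under these names a hypothetical problem could ship two samples like this (made-up layout, just to illustrate the shorthand and the fact that the two directories need not mirror each other):

```
data/sample/1.in                  data/1.*       (shown as-is by default, also judged)
data/sample/1.ans
data/sample/2.in                  data/2.*
data/sample/2.ans
problem_statement/sample/2.ans    statement/2.ans (overrides data/sample/2.ans in the statement)
```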
Restrictions (restated as a code sketch below):

- data/1.{in,ans} must come in pairs.
- If data/1.{in,ans} does not exist, statement/1.{in,ans} must come in pairs.
- statement/1.interaction and statement/1.in are only allowed for interactive problems.
- statement/1.ans is always allowed.
- Interactive problems may have a statement/1.{in,ans} pair, and may additionally have a statement/1.interaction.
- It is not allowed to have data/*.in testcases that do not correspond to a statement/* testcase.

For each sample test case (defined by either a data/1.in, statement/1.in, or statement/1.interaction):

- If there is a statement/1.interaction, show that.
- Otherwise, show the .in/.ans pair, where the default in data/sample/1.{in,ans} can optionally be overwritten by statement/*.{in,ans}.

We should probably restrict this to require consistency, so that each sample has either:

- a statement/1.in and statement/1.ans pair, or
- a statement/1.interaction.
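To make the intended rules concrete, here is a minimal sketch (not part of the spec or of any existing tooling; check_samples and its interactive flag are made-up names) that simply restates the restriction bullets above as checks:

```python
from pathlib import Path


def check_samples(problem_root: str, interactive: bool) -> list[str]:
    """Restate the proposed sample restrictions as checks; return a list of violations."""
    data = Path(problem_root) / "data" / "sample"
    stmt = Path(problem_root) / "problem_statement" / "sample"

    def cases(d: Path) -> set[str]:
        if not d.is_dir():
            return set()
        return {p.stem for p in d.iterdir() if p.suffix in {".in", ".ans", ".interaction"}}

    errors = []
    for case in sorted(cases(data) | cases(stmt)):
        d_in = (data / f"{case}.in").exists()
        d_ans = (data / f"{case}.ans").exists()
        s_in = (stmt / f"{case}.in").exists()
        s_ans = (stmt / f"{case}.ans").exists()
        s_int = (stmt / f"{case}.interaction").exists()

        # data/1.{in,ans} must come in pairs.
        if d_in != d_ans:
            errors.append(f"{case}: data/sample .in and .ans must come in pairs")
        # If data/1.{in,ans} does not exist, statement/1.{in,ans} must come in pairs.
        if not (d_in and d_ans) and s_in != s_ans:
            errors.append(f"{case}: statement .in and .ans must come in pairs")
        # statement/1.interaction and statement/1.in are only allowed for interactive problems.
        if not interactive and (s_int or s_in):
            errors.append(f"{case}: .interaction / statement .in only allowed for interactive problems")
        # No data/*.in testcases that do not correspond to a statement/* testcase.
        if d_in and not (s_in or s_ans or s_int):
            errors.append(f"{case}: data/sample testcase has no corresponding statement/ file")
    return errors
```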
Custom output validation problems may override the default data/1.ans, but do not have to. The data/1.ans is used as input to the output validator and may or may not be a valid answer in itself. If a statement/1.ans is provided, tooling can verify that it is indeed a valid answer. This can be used to e.g. have a high-precision data/1.ans, but a lower-precision statement/1.ans to show to teams.
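For example (file contents made up), a floating-point problem could ship:

```
# data/sample/1.ans — passed to the output validator
3.141592653589793

# problem_statement/sample/1.ans — shown to teams in the statement
3.14
```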
Interactive problems are allowed (but not required) to have a statement/1.interaction for every test case. Instead of that, they are also allowed to have a statement/1.{in,ans} pair that can be shown instead, for 'fake interactive' problems, in particular problems where (random) input data is generated on the fly and is passed to the team submission as if it were a classic input-output problem.
For downloads:

- Default & custom validation: give data/1.in, and statement/1.ans if present, otherwise data/1.ans.
- Interactive: give statement/1.{in,ans}.
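A sketch of that selection in code (download_files is a made-up helper; it only mirrors the two bullets above):

```python
from pathlib import Path


def download_files(case: str, validation: str, data: Path, stmt: Path) -> list[Path]:
    """Pick the files offered for download for one sample case."""
    if validation == "interactive":
        return [stmt / f"{case}.in", stmt / f"{case}.ans"]
    # Default & custom validation: data .in, plus statement .ans if present, else data .ans.
    ans = stmt / f"{case}.ans"
    if not ans.exists():
        ans = data / f"{case}.ans"
    return [data / f"{case}.in", ans]
```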
If statement/1.interaction is not present, statement/1.{in,ans} are the generated input and corresponding answer. If statement/1.interaction files are also present, these statement/1.{in,ans} are the input files to the testing tool. TODO: for interactive problems with statement/1.interaction files, do we require or allow statement/1.{in,ans} files?
If we require them, that means every shown interaction must actually have a corresponding testcase for download, which I think is good. If we allow them, then two things could happen:
data/1.{in,ans}
Just to repeat: it's possible to have problems with on-the-fly generated input, by specifying them as an interactive-type problem but then not providing statement/1.interaction files. Instead, statement/1.{in,ans} can be provided for the generated .in and corresponding .ans, while data/1.{in,ans} are the instructions to the interactor, which takes the role of both an input generator and output validator.
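As a very rough sketch of what such an interactor could look like (a made-up "print the maximum of n generated numbers" problem; it assumes the usual output-validator convention of input file, answer file and feedback directory on argv, exit codes 42/43, and stdin/stdout connected to the submission — treat it as an illustration of the generator-plus-validator role, not a reference implementation):

```python
import random
import sys

ACCEPT, REJECT = 42, 43  # output-validator exit codes in the problem package format


def main() -> None:
    # For a generated-input problem, data/1.in holds instructions for the interactor
    # (here: a seed and a count), not the input the team sees.
    # judge_ans and feedback_dir are unused in this sketch.
    judge_in, judge_ans, feedback_dir = sys.argv[1:4]
    with open(judge_in) as f:
        seed, n = map(int, f.read().split())

    # Input generator role: produce (random) input and send it to the submission.
    rng = random.Random(seed)
    numbers = [rng.randint(1, 10**9) for _ in range(n)]
    print(n)
    print(*numbers)
    sys.stdout.flush()

    # Output validator role: read the submission's answer and judge it.
    team_answer = int(sys.stdin.readline())
    sys.exit(ACCEPT if team_answer == max(numbers) else REJECT)


if __name__ == "__main__":
    main()
```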
run_samples: false?
We added run_samples: false to avoid running on samples, e.g. in cases where they do not follow the spec because they use n=10 instead of n=1000 (which would be guaranteed for secret data).
Instead of providing data/sample/*.{in,ans} files, this could now be implemented by leaving data/sample empty and only providing these files in problem_statement/sample/*.{in,ans}. That may be preferred, and then we drop run_samples?
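Concretely, that alternative would look something like this (case names made up):

```
data/sample/                       (left empty: nothing is judged as a sample)
problem_statement/sample/1.in      (shown in the statement and available for download)
problem_statement/sample/1.ans
```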
- data/sample/* are the testcases that are judged as samples.
- problem_statement/sample/* control/override what is shown in the statement and available for download.

Question: I think it makes sense to require that data/sample/* and problem_statement/sample/* either contain the same set of cases, or else one of them must be empty, just to ensure consistency.
Fake interactive problems / generated input problems
Do we have examples of this actually being used in practice? There's never a case where you actually strictly need this, right? I can say I've never felt the need, and it feels like it just risks providing worse UX (e.g. having judges show confusing UI and not being able to show failing test case input/output in the same way as for other non-interactive problems) and adding a lot of bug potential around stdin/stdout fd closures, EOF checking and other termination behavior.
To clarify, the question is whether there can be a problem where some samples have .interaction and other do not and only have .in.
FWIW, https://github.com/zehnsechs/egoi-2024-testdata/tree/main/day1/gardendecorations/data/sample had this, because we wanted to have a sample test case that would be run by Kattis while also splitting the .interaction file into three because it was a multi-run problem. I'm not sure how we're envisioning samples to work for multi-run problems.
I'm not sure how we're envisioning samples to work for multi-run problems.
Multi-pass will use .interaction files (https://www.kattis.com/problem-package-format/spec/2023-07-draft.html#multi-pass-validation).
(Was that the answer you were looking for?)
Ah, thanks, that makes sense.
Do we have examples of this actually being used in practice?
Yes, we had multiple such 'generated input' problems for BAPC, in particular where we guarantee that the input is random, and hence regenerated on each re-submission.
Reopening, since there are still some unresolved discussions in #291.
I think one thing that also isn't really specified is whether, for custom output validation and interactive problems, we require that for each test case in data/sample and statement the same set of files is present, rather than an independent set of valid files.
Yes, we had multiple such 'generated input' problems for BAPC, in particular where we guarantee that the input is random, and hence regenerated on each re-submission.
Interesting. Is the problem package available for any of them? I do feel like that kind of setup is inadvisable, and it's better to keep the test data static while still giving a guarantee that it was generated at random.
See problem L here: https://2022.bapc.eu/bapc/problems.pdf
(You can download the sources via the homepage.)
With static random testdata, there is always the probability that some specific solution hits an annoying edge case, which is avoided by regenerating it each time.