Mixing of .in and .interaction samples

Matistjati commented 4 weeks ago

Would it be reasonable that a problem being interactive means you should only be allowed to have .interaction samples (no .in)?

If not, should you ever be allowed to mix .in and .interaction? has this ever occured?

niemela commented 4 weeks ago

@austrin @jsannemo @simonlindholm @ghamerly Thoughts?

niemela commented 4 weeks ago

If not, should you ever be allowed to mix .in and .interaction? has this ever occured?

To clarify, the question is whether there can be a problem where some samples have .interaction and other do not and only have .in. Not if a single sample can have a .interaction and a .in, which is clearly true.

This is in fact specified in the "Interactive Problems" section. But it took way longer to find and figure this out. The relevant sections could clearly take some clarifications.

(Or, we might be too tired?)

RagnarGrootKoerkamp commented 3 weeks ago

Some more comments. (See also #265)

Which files exist:

data/sample/1.in
data/sample/1.ans
problem_statement/sample/1.in
problem_statement/sample/1.ans (or call this out?)
problem_statement/sample/1.interaction

For brevity I'll refer to these as data/1.* and statement/1.*.

statement/* files may or may not correspond directly to files with the same basename in data/*.

Restriction:

data/1.{in,ans} must come in pairs.
If data/1.{in,ans} does not exist, statement/1.{in,ans} must come in pairs.
statement/1.interaction and statement/1.in are only allowed for interactive problems.
statement/1.ans is always allowed.
Interactive problems are required to provide a statement/1.{in,ans} pair, and may additionally have a statement/1.interaction. It is not allowed to have data/*.in testcases that do not correspond to a statement/* testcase.

What is shown in the statement

For each sample test case (defined by either a data/1.in, statement/1.in, or statement/1.interaction)

If there is a statement/1.interaction, show that.
Otherwise, show a .in/.ans pair, where the default in data/sample/1.{in,ans} can optionally be overwritten by statement/*.{in,ans}.

We should probably restrict this to require consistency, so that either:

either all or no testcases have a statement/1.in
either all or no testcases have a statement/1.ans
either all or no testcases have a statement/1.interaction

Custom output validation problems may override the default data/1.ans, but do not have to. The data/1.ans is used as input to the output validator and may or may not be a valid answer in itself. If a sample/1.ans is provided, tooling can verify that it is a valid answer indeed. This can be used to e.g. have a high-precision data/1.ans, but a lower precision statement/1.ans to show to teams.

Interactive problems are allowed (but not required) to have a statement/1.interaction for every test case. Instead of that, they are also allowed to have a statement/1.{in,ans} pair that can be shown instead for 'fake interactive' problems, in particular for problems where (random) input data is generated on the fly and is passed to the team submission as-if it were a classic input-output problem.

What is available to contestants as download

default & custom validation: give data/1.in and statement/1.ans if present, otherwise data/1.ans interactive: give statement/1.{in,ans}.

When statement/1.interaction is not present, statement/1.{in,ans} are the generated input and corresponding answer.
When statement/1.interaction files are also present, these statement/1.{in,ans} are the input files to the testing tool.

TODO: for interactive problems with statement/1.interaction files, do we require or allow statement/1.{in,ans} files? If we require them, that means every shown interaction must actually have a corresponding testcase for download, which I think is good. If we allow them, then two things could happen:

the corresponding data/1.{in,ans}

Fake interactive problems / generated input problems

Just to repeat: It's possible to have problems with on-the-fly generated input, by specifying them as an interactive type problem, but then not providing statement/1.interaction files. Instead, statement/1.{in,ans} can be provided for the generated .in and corresponding .ans, while data/1.{in,ans} are the instructions to the interactor, which takes the role of both an input generator and output validator.

interaction with `run_samples: false`?

We added run_samples: false to avoid running on samples, e.g. in cases where they do not follow the spec because they use n=10 instead of n=1000 (which would be guaranteed for secret data).

Instead of providing data/sample/*.{in,ans} files, this could now be implemented by leaving data/sample empty and only providing these files in problem_statement/sample/*.{in,ans}. That may be preferred, and then we drop run_samples?

Summary

Files in data/sample/* are the testcases that are judged as samples.
Files in problem_statement/sample/* control/override what is shown in the statement and available for download.

Question: I think it makes sense to require that data/sample/* and problem_statement/sample/* either contain the same set of cases, or else one of them must be empty, just to ensure consistency.

simonlindholm commented 3 weeks ago

Fake interactive problems / generated input problems

Do we have examples of this actually being used in practice? There's never a case where you actually strictly need this, right? I can say I've never felt the need, and it feels like it just risks providing worse UX (e.g. having judges show confusing UI and not being able to show failing test case input/output in the same way as for other non-interactive problems) and adding a lot of bug potential around stdin/stdout fd closures, EOF checking and other termination behavior.

To clarify, the question is whether there can be a problem where some samples have .interaction and other do not and only have .in.

FWIW, https://github.com/zehnsechs/egoi-2024-testdata/tree/main/day1/gardendecorations/data/sample had this, because we wanted to have a sample test case that would be run by Kattis while also splitting the .interaction file into three because it was a multi-run problem. I'm not sure how we're envisioning samples to work for multi-run problems.

niemela commented 3 weeks ago

I'm not sure how we're envisioning samples to work for multi-run problems.

multi-pass will use .interaction files (https://www.kattis.com/problem-package-format/spec/2023-07-draft.html#multi-pass-validation).

(Was that the answer you were looking for?)

simonlindholm commented 3 weeks ago

Ah, thanks, that makes sense.

RagnarGrootKoerkamp commented 3 weeks ago

Do we have examples of this actually being used in practice?

Yes, we had multiple such 'generated input' problems for BAPC, in particular where we guarantee that the input is random, and hence regenerated on each re-submission.

RagnarGrootKoerkamp commented 3 weeks ago

Reopening, since there are still some unresolved discussions in #291.

I think one thing that also isn't really specified is whether for custom output validation and interactive problems, we require that for each test case in data/sample/statement, the same set of files is present, rather than an independent set of valid files.

simonlindholm commented 3 weeks ago

Yes, we had multiple such 'generated input' problems for BAPC, in particular where we guarantee that the input is random, and hence regenerated on each re-submission.

Interesting. Is the problem package available for any of them? I do feel like that kind of setup is inadvisable, and it's better to keep the test data static while still giving a guarantee that it was generated at random.

RagnarGrootKoerkamp commented 3 weeks ago

See problem L here: https://2022.bapc.eu/bapc/problems.pdf (You can download the sources via the homepage.)

With static random testdata, there is always the probability that some specific solution hits an annoying edge case, which is avoided by regenerating it each time.

Kattis / problem-package-format