Kattis / problem-package-format

Kattis problem package format specification
https://www.kattis.com/problem-package-format

Support for static validator(s) #130

Closed by niemela 2 weeks ago

niemela commented 11 months ago

A static validator is a program that validates a submission independently of the test data. I.e. it would be a program that gets the submitted source code and nothing else, and either accepts or rejects the submission.

This could be used in a teaching context, either to require a code standard, or for (approximating) something like "Solve this using two for loops". It could also be used for things like "You're not allowed to use BigInteger".
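For illustration, a "no BigInteger" check could be as small as the following sketch. Everything about the interface here is an assumption: the format does not define how a static validator is invoked; this sketch takes the submitted source files as arguments and borrows the 42/43 accept/reject exit codes used by output validators.

```python
#!/usr/bin/env python3
# Hypothetical static validator: reject submissions that mention BigInteger.
# The invocation and exit codes are assumptions for illustration only; the
# format does not define them. Here the submitted source files are passed as
# arguments, and 42/43 mean accept/reject (as for output validators).
import sys

def main():
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            if "BigInteger" in f.read():
                print("submission uses BigInteger", file=sys.stderr)
                sys.exit(43)  # reject
    sys.exit(42)  # accept

if __name__ == "__main__":
    main()
```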

I don't see how you could ever make this language-agnostic? I.e. it would always be used together with specific languages.

I don't think this would ever be useful in a contest?

(This could be used for #46 as well, but I think that would be a very bad way to do that)

@thorehusfeldt @ghamerly @Tagl Thoughts?

Tagl commented 11 months ago

Yes! I think this is a necessary addition to support formatting validators (linters), which are desired by some teachers. I have tried to do checks like this (no bin/hex allowed) for Python programs using include, and there's always some edge case that pops up, so I would very much like this.
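For example, here is a sketch of how a "no bin/hex" check could parse the submission instead of scanning its raw text, which sidesteps many of those edge cases. The command-line interface and exit codes are hypothetical; the format does not define them.

```python
#!/usr/bin/env python3
# Hypothetical static validator for "no bin/hex allowed" in Python submissions.
# It parses the submission with the ast module instead of searching the raw
# text, so occurrences inside comments or string literals do not trigger it,
# while direct references to the names bin or hex (including aliasing like
# "b = bin") do. Invocation and exit codes are assumptions.
import ast
import sys

FORBIDDEN = {"bin", "hex"}

def uses_forbidden_name(source):
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id in FORBIDDEN:
            return True
    return False

if __name__ == "__main__":
    source = open(sys.argv[1], encoding="utf-8").read()
    sys.exit(43 if uses_forbidden_name(source) else 42)
```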

You would need a static validator per supported language, agreed. We also need to think about how this interacts with scoring, since you may want to give different scores depending on the result of static validations. Does this introduce a new verdict? We would need a key in testdata.yaml like we have for other validators.

> I don't think this would ever be useful in a contest?

Depends on the contest, right? ICPC or IOI? Probably never. Something that is a mix between a hackathon and an algorithmic contest? Sure, why not? I can definitely foresee a contest where at least one problem has some sort of static validation.

ghamerly commented 11 months ago

I love this idea for teaching. However, there are many things that could be statically checked, and not all make sense to be associated with the problem. For example, using (or not using) a particular algorithmic approach feels appropriate. However, linting is something I would love to support in a teaching context, but does not feel appropriate for associating with a problem to me.

niemela commented 11 months ago

> We also need to think about how this interacts with scoring, since you may want to give different scores depending on the result of static validations.

My thought was that static validation would be binary, and failing it would terminate the submission. It's basically an additional "compilation" step, which also brings up the question of whether it should happen before or after compilation. To me it would make the most sense for it to happen after compilation.
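To make the ordering concrete, here is a rough sketch (not from the spec; the three callables are hypothetical stand-ins for the judge's existing machinery) of where such a binary step would sit:

```python
# Rough sketch (not from the spec) of where a binary static validation step
# would sit: after compilation, before any test data is run. The three
# callables are hypothetical stand-ins for the judge's existing machinery.
def judge(submission, compile_sub, static_validate, run_tests):
    binary = compile_sub(submission)
    if binary is None:
        return "COMPILE_ERROR"
    if not static_validate(submission):
        # Failing static validation terminates judging, effectively score 0.
        return "REJECTED_BY_STATIC_VALIDATION"
    return run_tests(binary)

# Example wiring with trivial stand-ins:
print(judge("main.py", lambda s: s, lambda s: False, lambda b: "ACCEPTED"))
```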

If we want to allow static validation to fail and still continue with judgement, and/or allow static validation to return some kind of score that gets baked into the scoring, then we need something more. I guess some kind of node in the test data graph infrastructure would make sense?

> Does this introduce a new verdict?

From the point of view of the problem format, not really. Compare to how there is no mention of compile error in the format.

I think it would make sense for most systems that implement this to introduce a new verdict for this though.

> Depends on the contest, right? ICPC or IOI? Probably never. Something that is a mix between a hackathon and an algorithmic contest? Sure, why not? I can definitely foresee a contest where at least one problem has some sort of static validation.

Sure, I agree.

> For example, using (or not using) a particular algorithmic approach feels appropriate. However, linting is something I would love to support in a teaching context, but does not feel appropriate for associating with a problem to me.

Agreed, typical linting is probably better done as a property of the language. (Cf. how Kattis now has a "C++ with memory check" language. That is not a static check, but it's kind of the same principle.)

Tagl commented 11 months ago

> However, linting is something I would love to support in a teaching context, but does not feel appropriate for associating with a problem to me.

I disagree. In educational tasks I may want to enforce a specific code style later on, but I would not want to enforce that on beginners, where I am more concerned with them being able to put their thoughts into code. However, if we do allow static validation in general, then the author can do this by adding the static validator that they want to their problem. If a judge system supports a version of some language that also does this, then that seems like a check completely independent of the problem, but you could have both. I much prefer the problem requesting the linting over the judge system applying it to all code in, say, Python, but that may be because I want to apply scoring to it too. Also note that linting is mostly synonymous with static analysis; it's not only for style guides.

pehrsoderman commented 4 months ago

Linters are typically not part of a judge environment. Do you intend the linter to be shipped with the problem? That's a lot of additional building and complexity in the problem package... Or is the judge system responsible for providing some (so far) unspecified set of linters?

Tagl commented 4 months ago

I should not have used that word as it is most commonly used for checking the formatting of code or proper use of typing.

It could be checking the formatting of code, or it could be any other kind of static analysis; it really depends on the problem and what checks are desired. For checking the general formatting of code, I agree that in most cases just having it be part of the language on the judge is more sensible.

Yes, I generally intend to ship the static validation program in the problem package. The usual case here would be checking something specific for a specific language, like "is there recursion" or "is the submission calling a function it's not supposed to call". These static validators will be quite short in code. There is, however, nothing stopping you from having a larger program there for an individual problem, just like with output validators. We have done this in the past, where an output validator included a linear algebra library because it made development a lot easier.
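As a sketch of how short such a check can be, here is a hypothetical "is there recursion" validator for Python submissions. The interface is assumed, and it only catches functions that call their own name directly, so mutual recursion or aliased calls would need more work.

```python
#!/usr/bin/env python3
# Hypothetical "is there recursion" static validator for Python submissions.
# It rejects if any function body contains a call to the function's own name;
# mutual recursion or calls through aliases are not detected, so this is a
# heuristic sketch rather than a complete check. Invocation is assumed.
import ast
import sys

def has_direct_recursion(source):
    for func in ast.walk(ast.parse(source)):
        if isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for node in ast.walk(func):
                if (isinstance(node, ast.Call)
                        and isinstance(node.func, ast.Name)
                        and node.func.id == func.name):
                    return True
    return False

if __name__ == "__main__":
    source = open(sys.argv[1], encoding="utf-8").read()
    sys.exit(43 if has_direct_recursion(source) else 42)
```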

Yes, it would work to rely on the judge you use having a specific program available, but it's not nice that it's non-portable. Perhaps we should specify some minimal set of programs that judges which want to fully support static validation must provide. I'm unsure about which programs should make the cut in that case.

Tagl commented 4 months ago

Note that for output validators we already restricted the required set of tools to basically "a C++ compiler" and "a Python 3 interpreter" in order to simplify the requirements. I think static validators should have the same requirements, so we do not expect other tools to be installed; they have to be shipped with the problem.

niemela commented 4 months ago

@Tagl Let's see if we can move forward with this?

Based on the discussions so far, there are 3 main options for static validation:

  1. One additional pass-fail test directly after compilation. If this test fails, the submission fails. It can be used for all types of problems (and specifically for both pass-fail and scoring), but it can only affect the score in the sense that it can reject the submission completely (meaning effectively a score of 0).
  2. One additional test directly after compilation that can produce some kind of score. If this test fails, the submission fails, but if it passes it can also produce some score. Similar to how output validators produce score.txt only if they pass. This score is then somehow combined with the score given from the existing test data (groups), probably by adding (?), but I guess it could also be some other aggregation method such as average or max.
  3. Allowing static validation "nodes" to be inserted at arbitrary places in the test data group graph. Each such node can produce a verdict (and optionally a score) just like any normal test data node, and the score is aggregated in the normal way.

Pro of 1 is that it's simple. Con is that you can't do any kind of static-validation-based scoring.

Pro of 2 is that it's still pretty simple, and you can do most kinds of scoring. Cons are that you can't have separate scores for static validation that depend on which test groups pass. E.g. if we have a problem with 3 main groups, we might want to have separate static validation per group where the static validation score is only awarded if the group is solved. Also, the presentation is a bit limited. Even in cases where you can do per-group static scores (e.g. if you test different things per group but you want to award the static validation score regardless of whether those groups are accepted), presenting these scores in the UI in a clear way is difficult (impossible?), since you can pretty much only do a single "blob" of static score.

Pro of 3 is that it supports "everything" that we would reasonably want to do. Con is that it's the most complicated solution. In fact, we have not been able to figure out what that solution is (yet). I'm sure it could be done, but it's certainly not trivial or obvious.

@Tagl I think you have mainly been arguing for us moving towards the "higher" (i.e. more general) options. How far do you think we "need" to go, or alternatively, how far do you think it's worth going?

As I see it, these are the open questions (after we choose an option):

Tagl commented 2 months ago

I think 2 is the bare minimum, but I would prefer 3. Your point about providing data for the UI is another good reason to prefer something like option 3. I do believe that in the majority of cases static validation would be a single node in the test data group graph at the top level.

Tagl commented 2 weeks ago

We're aiming for option 3 after discussion with @niemela @evouga @eldering @JoelNiemela.

In each group something like:

static_validation:
    score: 20
    flags: --strict --no-warnings --no-recursion

That means the max score from static validation is 20. The static validator can output score.txt to change this value, like an output validator. It adds a test as a node within the group. A score should be given only for scoring problems. In conjunction with require-pass this allows for a lot of flexibility with static validation.

In most cases this would be used with pass-fail or min aggregation at top level.
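As a sketch of what a static validator under this proposal might look like, the following assumes an invocation modelled on output validators (submission directory, feedback directory, then the configured flags); none of this interface is fixed by the proposal above, and run_checks is a hypothetical placeholder.

```python
#!/usr/bin/env python3
# Sketch of a static validator under the proposal above. The invocation is an
# assumption modelled on output validators: submission directory, feedback
# directory, then the flags from testdata.yaml. Writing score.txt into the
# feedback directory would override the configured score (20 in the example).
import os
import sys

def run_checks(source_files, flags):
    # Placeholder: a real validator would inspect the sources here and
    # return a list of violations.
    return []

def main():
    submission_dir, feedback_dir = sys.argv[1], sys.argv[2]
    flags = sys.argv[3:]  # e.g. --strict --no-warnings --no-recursion

    sources = [os.path.join(submission_dir, name)
               for name in os.listdir(submission_dir)]
    violations = run_checks(sources, flags)

    if violations and "--strict" in flags:
        sys.exit(43)  # reject: this node of the group fails outright

    # Partial credit: scale the configured static validation score down.
    with open(os.path.join(feedback_dir, "score.txt"), "w") as f:
        f.write(str(max(0, 20 - 5 * len(violations))))
    sys.exit(42)  # accept

if __name__ == "__main__":
    main()
```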

RagnarGrootKoerkamp commented 2 weeks ago

Does this mean there is a single /static_validator program at the root? Just like there is a single /output_validator? And if needed it can be reused with different flags.

(I guess multiple static validators could be a thing? But then the name of the program should be passed under static_validation as well.)

Tagl commented 2 weeks ago

Yes, just one static_validator