Kattis / problemtools

Tools to manage problem packages using the Kattis problem package format.
MIT License
101 stars 70 forks source link

verifyproblem.py: check for invalid filename characters #144

Closed ghamerly closed 4 years ago

ghamerly commented 4 years ago

Fixes #143, though we may want to debate what characters should be whitelisted.

ghamerly commented 4 years ago

LGTM! Kattis's regex when uploading files looks like this, btw:

This check here is for test file names, not for source files, so I'm not sure what you're pointing out?

simonlindholm commented 4 years ago

Oops, I skimmed through this too quickly. Anyhow, I think it's reassuring that it uses basically the same whitelist.

ghamerly commented 4 years ago

Shouldn't we also check that test data group names match that regex?

Before I go too much further, let's discuss (perhaps not here?) what files we should be checking. The standard says "All file names for files included in the package must match the following regexp [a-zA-Z0-9][a-zA-Z0-9_.-]*[a-zA-Z0-9]" (https://clics.ecs.baylor.edu/index.php/Problem_format#General_Requirements)

I can easily add a check for all filenames in the problem package, but that's likely to be annoying. I.e. it will trigger on all files which have invalid characters (like ~ backup files, files that begin with a period (such as .timelimit) and such).

I've pushed such a solution (just as a starting point for discussion). It's simple. But like I said it's annoying.

austrin commented 4 years ago

Before I go too much further, let's discuss (perhaps not here?) what files we should be checking.

Right, sorry for the scope creep (I meant for this PR to stay within the test data but should have been more clear).

I agree with you that we should not check every single file appearing in the package. I think it would make sense to only check files which are assigned meaning by the standard. That's a somewhat complicated set though and we can add a few such checks to cover most things. We don't need to make an effort to find all such places for this PR but can for now happily settle for the data files (and, if you want, the closely related test case groups).

niemela commented 4 years ago

I agree with you that we should not check every single file appearing in the package.

Right. This line in the spec can be read either as a requirement (i.e. all files inside the problem package root directory must follow this naming) or as a definition (i.e. files that do not follow the naming are not included in the package, meaning they can be ignored). We definitely want to allow users to store whatever that does not conflict with thing that are assigned meaning in the package.

ghamerly commented 4 years ago

Okay, now we're checking all data files, according to a regex that is compiled once (in ProblemAspect) and can be used by any subclass of ProblemAspect. But it's currently only used on *.in/*.ans files within TestCase.

austrin commented 4 years ago

Okay, now we're checking all data files, according to a regex that is compiled once (in ProblemAspect) and can be used by any subclass. But it's currently only used on .in/.ans files.

OK that's fine by me if you for some reason don't want to apply it to group names.

Can you clean up the commit history of the PR and then we can merge?

ghamerly commented 4 years ago

OK that's fine by me if you for some reason don't want to apply it to group names.

Can you clarify what you mean? Do you mean directory names? I'm fine with that if that's what you mean.

austrin commented 4 years ago

Yes group names = directory names under data/

ghamerly commented 4 years ago

I added test data groups, rebased, and squashed.