UnicodeDecodeError problems due to non-parseable _JUNK_CASES

incaseoftrouble commented 3 years ago

When validating problems with the current develop branch, I get the following issue:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 137: invalid start byte, thrown in verifyproblem.py, line 1141, in __get_feedback: all_feedback.append(feedback.read(128*1024))

The problem is that the validator (based on validate.h) reports

TC 1: Judge says possible but author says !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~��������������������������������������������������������������������������������������������������������������������������������!

(output of cat judgemessage.txt).

This in turn is caused by the following entry in _JUNK_CASES: ('a binary file with byte values 0 up to 256', bytearray(x for x in range(256))). This creates an answer containing bytes that are not valid UTF-8 (0x80 and above in this sequence). The validator echoes that answer into judgemessage.txt, and reading the feedback then fails because verifyproblem.py expects the feedback files to be UTF-8 text.
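The failure can be reproduced outside problemtools with a few lines of Python. This is only a sketch: `first_invalid_utf8_offset` is a hypothetical helper written for illustration, not a function from verifyproblem.py.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that breaks strict UTF-8
    decoding, or None if the data decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# Same payload as the junk case: every byte value from 0 to 255.
junk = bytes(bytearray(x for x in range(256)))
offset = first_invalid_utf8_offset(junk)
print(offset, hex(junk[offset]))
```

Bytes 0 to 127 are plain ASCII, so decoding fails at the first byte after them, 0x80. In the traceback above the position is larger because the validator's message text precedes the echoed answer.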

Commenting out this entry fixes the problem. I suggest either restricting the junk case to printable characters or random valid Unicode characters, or making the parsing of judgemessage more robust (for example, go through the input and replace anything that cannot be decoded with a question mark or similar).
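For the second option, a minimal sketch of tolerant feedback reading could look like the following. It assumes the feedback file is opened in binary mode and decoded afterwards. `decode_feedback` and `read_feedback` are hypothetical names, not the actual code in verifyproblem.py, and the 128 KiB limit simply mirrors the `feedback.read(128*1024)` call from the traceback.

```python
def decode_feedback(raw: bytes) -> str:
    """Decode feedback bytes as UTF-8, substituting U+FFFD for any
    byte sequence that is not valid UTF-8 instead of raising."""
    return raw.decode("utf-8", errors="replace")


def read_feedback(path, limit=128 * 1024):
    """Read at most `limit` bytes from a feedback file and decode
    them tolerantly. Hypothetical helper for illustration."""
    with open(path, "rb") as f:
        return decode_feedback(f.read(limit))


print(decode_feedback(b"author says \x80!"))
```

Using `errors="replace"` avoids a hand-written character-by-character loop. Passing `errors="replace"` to `open()` in text mode would have the same effect on decoding.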

incaseoftrouble commented 10 months ago

Fixed by the referenced commit; the issue somehow just didn't get closed.

simonlindholm commented 10 months ago

I never got around to PR'ing that branch, so this isn't fixed yet AFAIK. (I think I was a little unsure about the non-ASCII warning.) Feel free to take the code and PR it if you want; I'm not sure I will find time to.

incaseoftrouble commented 10 months ago

Oh right, reading comprehension is a difficult skill, the PR is still open.

I might do that in the future, but time is a scarce resource for me, too :)

Kattis / problemtools

UnicodeDecodeError problems due to non-parseable _JUNK_CASES #201