Ryan6578 / Codewords

A Lua script for Codewords on Tabletop Simulator for Steam.
https://ryan6578.com
GNU General Public License v3.0

Adding Unicode-friendly clue parsing #92

Open Canonelis opened 3 years ago

Canonelis commented 3 years ago

Changed the clue-parsing algorithm to handle Unicode characters. It now mimics the following regular expression, where "\a" stands for any legal clue letter:

^\s*\a+(-\a+)?(\s*-?\s*\d+|\s*(-|\s)\s*inf)\s*$

When parsing a clue, I had to avoid any string library functions that match using %d-style pattern syntax.
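
A minimal sketch of a regex-free validator for this grammar, written in Python for illustration rather than the project's Lua, and not the code submitted in this PR. The `is_letter` predicate is a hypothetical stand-in for "\a" (it defaults to `str.isalpha`), and the parser assumes letters never overlap with whitespace, "-", or digits, so consuming letters greedily is safe. The only place backtracking matters is a clue like "word-inf", where the hyphen is first tried as part of a compound word and then as the separator before "inf".

```python
from typing import Callable


def _number_part_ok(rest: str) -> bool:
    r"""Check rest against (\s*-?\s*\d+ | \s*(-|\s)\s*inf) with nothing after it.

    `rest` has already had trailing whitespace stripped, which covers the final \s*$.
    """
    # Alternative 2: a non-empty separator (\s+ or \s*-\s*) followed by the literal "inf".
    if rest.endswith("inf"):
        sep = rest[: -len("inf")]
        # Removing at most one dash must leave only whitespace.
        if sep and all(c.isspace() for c in sep.replace("-", "", 1)):
            return True
    # Alternative 1: optional whitespace, optional '-', optional whitespace, then digits.
    body = rest.lstrip()
    if body.startswith("-"):
        body = body[1:].lstrip()
    # str.isdecimal accepts any Unicode decimal digit (category Nd), like \d.
    return body != "" and all(c.isdecimal() for c in body)


def is_valid_clue(clue: str, is_letter: Callable[[str], bool] = str.isalpha) -> bool:
    r"""Validate `clue` against ^\s*\a+(-\a+)?(\s*-?\s*\d+|\s*(-|\s)\s*inf)\s*$ without regex.

    Hypothetical: `is_letter` stands in for the \a class of legal clue letters.
    """
    s = clue.strip()  # handles the leading ^\s* and the trailing \s*$
    i = 0
    while i < len(s) and is_letter(s[i]):
        i += 1
    if i == 0:
        return False  # \a+ needs at least one letter
    # First try taking the optional (-\a+) group, as a greedy regex would.
    if i + 1 < len(s) and s[i] == "-" and is_letter(s[i + 1]):
        j = i + 1
        while j < len(s) and is_letter(s[j]):
            j += 1
        if _number_part_ok(s[j:]):
            return True
    # Then fall back to skipping the group, which is how "word-inf" gets accepted.
    return _number_part_ok(s[i:])
```

For example, `is_valid_clue("apple-pie 2")` returns True, while `is_valid_clue("appleinf")` returns False because "inf" needs a separator.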

Canonelis commented 3 years ago

So this accepts every letter matched by %a in Lua scripting, plus any Unicode character above 0x0370 except certain whitespace characters and dashes. It is very versatile and still accepts all the same clue formats as before.
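
The letter rule can be sketched as a predicate. This is a Python illustration under stated assumptions, not the PR's Lua: `str.isalpha` stands in for Lua's %a (the Lua runtime's %a may well be narrower), and since the comment only says "some" whitespace characters and dashes are excluded, the sketch assumes those are Unicode whitespace and the dash-punctuation category Pd. The name `is_clue_letter` is hypothetical.

```python
import unicodedata


def is_clue_letter(ch: str) -> bool:
    """Hypothetical predicate for the \\a class of legal clue letters."""
    # Stand-in for Lua's %a; using str.isalpha here is an assumption.
    if ch.isalpha():
        return True
    # Anything above U+0370 is accepted unless it is whitespace or a dash.
    # Assumed exclusion sets: Unicode whitespace plus category "Pd" (dash punctuation).
    if ord(ch) > 0x0370:
        return not (ch.isspace() or unicodedata.category(ch) == "Pd")
    return False
```

Under these assumptions, a symbol such as ∑ (U+2211) counts as a clue letter, while U+2014 EM DASH and U+3000 IDEOGRAPHIC SPACE do not.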

Canonelis commented 3 years ago

If you're busy, I could provide a fairly exhaustive list of test cases. Is there anything I can do to help you add this to the project?

Canonelis commented 3 years ago

I did some rigorous testing on it and found one flaw. I generated 2,000 clues that should be accepted, and they were; I also generated 5,000 clues that should be rejected, and they were. This is ready.

Canonelis commented 3 years ago

It would be good to add this soon, since you have so many foreign-language decks. Right now, the set of characters allowed in clues is fairly arbitrary: if a character's code modulo 256 falls in the range A-Z, a-z, or À-ÿ, it is accepted; otherwise, it is rejected.
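
For comparison, here is a Python sketch of the current rule as described above, assuming "code" means the character's Unicode code point. `old_rule_accepts` is a hypothetical name, not a function in the project.

```python
def old_rule_accepts(ch: str) -> bool:
    """Sketch of the current rule: accept if (code mod 256) is in A-Z, a-z, or À-ÿ.

    Assumes "code" means the Unicode code point of the character.
    """
    low = ord(ch) % 256
    return (
        0x41 <= low <= 0x5A      # 'A'..'Z'
        or 0x61 <= low <= 0x7A   # 'a'..'z'
        or 0xC0 <= low <= 0xFF   # 'À'..'ÿ'
    )
```

Under this rule, Cyrillic ы (U+044B) is accepted because 0x4B is the code for "K", while Cyrillic А (U+0410) is rejected; likewise Greek ω (U+03C9) is accepted but Ω (U+03A9) is not.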

I've played a few games with it now and I think it's done.

Canonelis commented 3 years ago

Here are two files you can copy and paste from: near legit clues(but not legit).txt and legit clues.txt. Both were randomly generated and filtered with the regular expression below, where "\a" stands for any legal clue letter:

^\s*\a+(-\a+)?(\s*-?\s*\d+|\s*(-|\s)\s*inf)\s*$

With the allowed character sets the clues look pretty weird, but for testing purposes this worked great. Many writing systems have their own forms of the digits 0-9, so I included those as well, which is why some clues don't contain an ordinary number. When a clue is displayed and logged, however, its number is shown as a normal digit.
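
The generate-and-filter approach can be sketched with Python's `re` module, whose `\s` and `\d` are Unicode-aware for str patterns, so digits such as ٤ (U+0664) match `\d`. This is an illustration, not the script that produced the attached files: `[^\W\d_]` is an assumed stand-in for "\a", and `display_clue` and `random_clues` are hypothetical helpers. `display_clue` also assumes the optional "-" before a number is a separator rather than a minus sign.

```python
import random
import re
from typing List, Optional, Tuple
```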

Canonelis commented 3 years ago

Here are the submitted changes to the code.