Investigate unicode handling

I thought this might be a problem in the tokenizing process (the original implementation was ASCII-only), but on a closer look at the code there don't appear to be any problem spots.

The only relevant code is:

        return (
            'a' <= char <= 'z' or
            'A' <= char <= 'Z' or
            '0' <= char <= '9'
        )

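which works on Unicode strings as well as ASCII: Python compares characters by code point, so the check never fails on non-ASCII input. Those characters simply fall outside all three ranges.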
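
To see the behaviour concretely, here is a rough sketch that wraps the check in a standalone function and compares it with Python's built-in `str.isalnum()`. The function name `is_ascii_alphanumeric`, the sample characters, and the `results` dict are made up for illustration and are not taken from the repository.

```python
def is_ascii_alphanumeric(char: str) -> bool:
    """Hypothetical wrapper around the range check quoted above."""
    return (
        'a' <= char <= 'z' or
        'A' <= char <= 'Z' or
        '0' <= char <= '9'
    )


# Illustrative sample characters: ASCII letters/digits, non-ASCII letters,
# and two characters that are not alphanumeric at all.
samples = ['a', 'Z', '7', 'é', 'ß', '中', '_', ' ']

# Map each character to (range check result, str.isalnum() result).
results = {ch: (is_ascii_alphanumeric(ch), ch.isalnum()) for ch in samples}

for ch, (ascii_only, unicode_aware) in results.items():
    print(repr(ch), ascii_only, unicode_aware)
```

The range check accepts only ASCII letters and digits and returns False for letters such as 'é' or '中' without raising, while `str.isalnum()` also accepts those. Whether non-ASCII letters should count as part of a token is a separate design decision for the tokenizer.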

Stefan4472 / simple-search-engine

Investigate unicode handling #16