Stefan4472 / simple-search-engine

Investigate unicode handling #16

Closed Stefan4472 closed 2 years ago

Stefan4472 commented 2 years ago

I thought this might be a problem in the tokenizing process (the original implementation was ASCII-only), but on a closer look at the code there don't appear to be any problem spots.

The only relevant code is:

return (
    'a' <= char <= 'z' or
    'A' <= char <= 'Z' or
    '0' <= char <= '9'
)

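which works on Unicode strings as well as ASCII: Python 3 compares characters by code point, so the check never raises on non-ASCII input. Note that it only accepts the ASCII letters and digits, so characters such as 'é' are treated as non-token characters rather than causing an error.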
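To make the behaviour concrete, here is a minimal sketch that wraps the range check quoted above in a function and runs it on non-ASCII input. The names `is_token_char` and `keep_token_chars` are hypothetical (the issue does not show the enclosing function), and the comparison with `str.isalnum()` is only there for illustration, not a suggestion that the project uses it.

```python
def is_token_char(char: str) -> bool:
    """Hypothetical wrapper around the range check quoted in this issue.

    Python 3 compares single-character strings by Unicode code point,
    so this never raises on non-ASCII input; it simply returns False
    for anything outside the three ASCII ranges.
    """
    return (
        'a' <= char <= 'z' or
        'A' <= char <= 'Z' or
        '0' <= char <= '9'
    )


def keep_token_chars(text: str) -> str:
    """Hypothetical helper: keep only the characters the check accepts."""
    return ''.join(c for c in text if is_token_char(c))


if __name__ == '__main__':
    for ch in ['a', 'Z', '7', 'é', 'ß', ' ']:
        # str.isalnum() is Unicode-aware and shown only for comparison.
        print(repr(ch), is_token_char(ch), ch.isalnum())
    print(keep_token_chars('naïve café 42'))
```

If accented letters should count as part of a token, a Unicode-aware predicate such as `str.isalnum()` would be one option; whether that is desirable depends on how the search engine is meant to treat such text.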