EDCD / EDMarketConnector

Downloads commodity market and other station data from the game Elite: Dangerous for use with all popular online and offline trading tools.
GNU General Public License v2.0

EDDN: Possible issue with `str.lower()` in some locales #1648

Closed Athanasius closed 2 years ago

Athanasius commented 2 years ago

Ref: https://github.com/EDDiscovery/EDDiscovery/issues/3304

In some locales, at least Turkish, lower-casing a plain ASCII capital `I` can produce the locale's own Unicode lower-case letter (the dotless `ı`, U+0131, in Turkish) rather than ASCII `i`.

In general, if we use `.lower()` on anything we send out to EDDN (and possibly other APIs), we might end up sending bad data.

But Python doesn't seem to have any way to specify an alternate locale to such functions, and switching via `locale.setlocale()` is documented as not thread-safe. Even if it were safe in the sense of not pulling the rug out from under another thread's "in the air" operation, there's still the possibility of it causing the wrong locale to be used in another thread.

I'm not sure this is solvable for us/in Python outside of our own custom "we're assuming the input is plain ASCII" function for such things.
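For illustration, a minimal sketch of what such an "assume plain ASCII" helper could look like. The name `ascii_lower` is hypothetical and is not an existing function in EDMarketConnector; the sketch only maps `A`-`Z` to `a`-`z` and leaves every other character untouched.

```python
import string

# Translation table covering only ASCII 'A'-'Z' -> 'a'-'z'.
# Hypothetical helper for illustration, not part of EDMarketConnector.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(text: str) -> str:
    """Lower-case ASCII letters only; non-ASCII characters pass through as-is."""
    return text.translate(_ASCII_LOWER)


print(ascii_lower("Int"))
print(ascii_lower("\u0130I"))  # U+0130 is left alone, ASCII 'I' becomes 'i'
```

Because the mapping is a fixed table, the result cannot depend on any process-wide locale setting.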

NB: Might not currently affect us in practice, but a quick grep shows some `.lower()` in plugins/eddn.py. Need to check the details to see whether any of it reaches outgoing data or is just "used locally for some comparisons". Even the latter might cause a bug for things like comparing a ship name to a mapping key.

klightspeed commented 2 years ago

It looks like Python uses its own Unicode case conversion logic (https://github.com/python/cpython/blob/7d8b69e1d1f125454d8cec81ff0dee72f2bef957/Objects/unicodectype.c#L209-L223), which has no dependency on any locale settings.
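A quick way to see the locale-independent mappings in action, as a small sketch that does not touch the locale at all. It relies only on `str.lower()` and `str.upper()` following the default Unicode case mappings.

```python
# ASCII 'I' always lower-cases to ASCII 'i'.
print("I".lower() == "i")

# U+0130 (capital I with dot above) lower-cases to 'i' plus a combining dot.
dotted_lower = "\u0130".lower()
print([hex(ord(c)) for c in dotted_lower])

# U+0131 (dotless lower-case i) upper-cases to ASCII 'I'.
print("\u0131".upper() == "I")
```

So the Turkish-specific `I` to `ı` mapping is never applied by `str.lower()`; only the untailored Unicode mapping is used.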

Athanasius commented 2 years ago

Yup, looks like we won't have any issues:

```python
import locale

s = "Int"

# Start from the user's default locale.
locale.setlocale(locale.LC_ALL, '')
locale_startup = locale.getlocale(locale.LC_CTYPE)
print(f'{locale_startup=}')
s_startup = s.lower()

# 'Turkish' is a Windows-style locale name.
locale.setlocale(locale.LC_ALL, 'Turkish')
locale_turkish = locale.getlocale(locale.LC_CTYPE)
print(f'{locale_turkish=}')
s_turkish = s.lower()

# Same language/region, but with a UTF-8 encoding.
locale.setlocale(locale.LC_ALL, (locale_turkish[0], 'UTF-8'))
locale_turkish_utf = locale.getlocale(locale.LC_CTYPE)
print(f'{locale_turkish_utf=}')
s_turkish_utf8 = s.lower()

print(s == s_startup)  # "Int" vs "int", so False is expected
print(s_turkish == s_startup)
print(s_turkish_utf8 == s_startup)
```

Output:

```
locale_startup=('English_United Kingdom', '1252')
locale_turkish=('Turkish_Turkey', '1254')
locale_turkish_utf=('Turkish_Turkey', 'utf8')
False
True
True
```
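
The locale name `'Turkish'` in the check above is Windows-specific. A rough sketch of a more portable variant follows, assuming that at least one of the candidate names might be installed; the function name `lower_under_locales` and the candidate list are made up for illustration. Locales that cannot be set are skipped, and the original `LC_CTYPE` setting is restored afterwards.

```python
import locale


def lower_under_locales(text, candidates=("tr_TR.UTF-8", "tr_TR.utf8", "tr_TR", "Turkish")):
    """Return {locale_name: text.lower()} for each candidate that could be set."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, changes nothing
    results = {}
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # this locale is not available on this system
            results[name] = text.lower()
    finally:
        # Put the process back the way we found it.
        locale.setlocale(locale.LC_CTYPE, saved)
    return results


results = lower_under_locales("Int")
print(results)
```

If no Turkish locale is installed the dictionary is simply empty; where one is available, every value should be plain ASCII `int`, matching the result above.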