This closes a hole in the name-validation regex recommended by PEP-508, which unintentionally accepts some non-ASCII names.
The regex (run with `re.IGNORECASE`) is:
`^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$`
- With `re.IGNORECASE`, Python's `re` appears to case-fold the input using Unicode rules before comparing, so `[A-Z]` also matches a few non-ASCII characters whose case mapping is an ASCII letter.
- This matters for, at least, İ ([0x130](https://everythingfonts.com/unicode/0x0130)) and K ([0x212a](https://everythingfonts.com/unicode/0x212A)).
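To see which non-ASCII code points slip through, a brute-force scan over every code point is enough. This is only a sketch: the names `LETTER_RE` and `non_ascii_matches` are made up for illustration. The `re` documentation lists four such characters for `[A-Z]` with `IGNORECASE` (U+0130, U+0131, U+017F and U+212A), so the scan should report at least the two mentioned above.

```python
import re
import sys

# Hypothetical helper pattern: just the letter class from the PEP-508 regex,
# compiled the same way (case-insensitive, default Unicode matching).
LETTER_RE = re.compile(r"[A-Z]", re.IGNORECASE)

# Collect every non-ASCII code point that the letter class accepts.
non_ascii_matches = [
    f"U+{cp:04X}"
    for cp in range(0x80, sys.maxunicode + 1)
    if LETTER_RE.fullmatch(chr(cp))
]

print(non_ascii_matches)
```

The scan covers roughly 1.1 million code points, so it takes a moment to run.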
```pycon
>>> import re
>>> PEP508_NAME_RE = re.compile(r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE)
>>>
>>> bool(PEP508_NAME_RE.match("helloworld"))
True
>>> bool(PEP508_NAME_RE.match("İstanbul"))
True
>>> bool(PEP508_NAME_RE.match("Kelvin"))
True
>>> bool(PEP508_NAME_RE.match("Æolia"))
False
>>> bool(PEP508_NAME_RE.match("jalapeño"))
False
>>> bool(PEP508_NAME_RE.match("Beyoncé"))
False
>>> bool(PEP508_NAME_RE.match("naïve"))
False
```
PEP-508 says PyPI uses this regex, and indeed it appears that it does.
However, `packaging.requirements.Requirement` doesn't use `re.IGNORECASE`, so even if you get such a package onto PyPI, it seems likely that pip or another tool will reject it at some point.
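One possible way to keep the regex case-insensitive while restricting it to ASCII is to add the `re.ASCII` flag, which the `re` documentation says limits `[A-Z]` under `IGNORECASE` to the 52 ASCII letters. The sketch below is an illustration under that assumption, not necessarily the exact change made here; `ASCII_NAME_RE` and `is_valid_name` are hypothetical names.

```python
import re

# Hypothetical name: same pattern as above, with re.ASCII added so that
# case-insensitive matching only considers ASCII letters.
ASCII_NAME_RE = re.compile(
    r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$",
    re.IGNORECASE | re.ASCII,
)


def is_valid_name(name: str) -> bool:
    """Return True if the name matches the ASCII-only variant of the regex."""
    return ASCII_NAME_RE.match(name) is not None


# "\u0130" is LATIN CAPITAL LETTER I WITH DOT ABOVE; "\u212a" is KELVIN SIGN.
for candidate in ("helloworld", "\u0130stanbul", "\u212aelvin"):
    print(ascii(candidate), is_valid_name(candidate))
```

Writing the character classes as `[A-Za-z0-9]` and dropping `re.IGNORECASE` altogether would be an alternative with the same intent.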
Changes
`^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$`