Zac-HD / hypothesmith

Hypothesis strategies for generating Python programs, something like CSmith
https://pypi.org/project/hypothesmith/
Mozilla Public License 2.0

Generate names which collide when NFKC-normalized #17

Open Zac-HD opened 2 years ago

Zac-HD commented 2 years ago

See this comment on Reddit and this blog post:

Be warned that Python always applies NFKC normalization to characters. Therefore, two distinct characters may actually produce the same variable name. For example:

>>> ª = 1 # FEMININE ORDINAL INDICATOR
>>> a # LATIN SMALL LETTER A (i.e., ASCII lowercase 'a')
1

Hypothesmith should deliberately generate identifiers that are distinct as strings but collide after NFKC normalization, to expose tools that compare identifiers as raw strings without normalizing them first.
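As a rough illustration of what "colliding" names look like, here is a minimal sketch using only the standard library. The helper `nfkc_variants` is hypothetical (it is not part of Hypothesmith or Hypothesis); it brute-forces every code point to find single-character identifiers that differ from a target name as strings but normalize to it under NFKC.

```python
import sys
import unicodedata


def nfkc_variants(name: str) -> list[str]:
    """Return single-character identifiers that differ from `name` as strings
    but NFKC-normalize to it. Hypothetical helper, for illustration only."""
    variants = []
    for codepoint in range(sys.maxunicode + 1):
        char = chr(codepoint)
        if (
            char != name
            and char.isidentifier()
            and unicodedata.normalize("NFKC", char) == name
        ):
            variants.append(char)
    return variants


variants = nfkc_variants("a")
assert "\u00aa" in variants  # FEMININE ORDINAL INDICATOR, as in the example above

# Python normalizes identifiers while parsing, so assigning to U+00AA binds "a".
namespace = {}
exec("\u00aa = 1\nresult = a", namespace)
assert namespace["result"] == 1
assert "a" in namespace and "\u00aa" not in namespace

# A tool that compares the raw source spellings would see two different names.
assert "\u00aa" != "a"
```

A generator could plausibly swap some characters of an already-generated identifier for variants like these, and then check the result with `str.isidentifier()`. How that would plug into Hypothesmith's existing strategies is left open here.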