alvinlindstam / grapheme

A python package for grapheme aware string handling
MIT License

Support grapheme aware substring testing #3

Closed alvinlindstam closed 7 years ago

alvinlindstam commented 7 years ago

The default x in y syntax works for strings in Python, but it only checks that the sequence of Unicode code points in x occurs somewhere in y. It does not take grapheme boundaries into account.

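This can produce false positives. In the example below, the code points of the Swedish flag (S, E) occur in es_ee only by straddling the boundary between the Spanish flag (E, S) and the Estonian flag (E, E), yet in still reports a match: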

>>> se = "\U0001F1F8\U0001F1EA"
>>> es = "\U0001F1EA\U0001F1F8"
>>> ee = "\U0001F1EA\U0001F1EA"
>>> es_ee = es + ee
>>> print(se)
🇸🇪
>>> print(es_ee)
🇪🇸🇪🇪
>>> se in es_ee
True

We should provide something like grapheme.contains(string, substring), which should return True only if the sequence of graphemes in substring occurs as a contiguous run in the sequence of graphemes in string.
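As a rough illustration of the idea, here is a minimal, self-contained sketch. It is not the package's implementation: the helper names (simple_clusters, contains_clusters) are hypothetical, and the segmentation is deliberately simplified to pair up regional indicator symbols only, treating every other code point as its own cluster. Real grapheme segmentation follows the full Unicode rules.

```python
def is_regional_indicator(ch):
    # Regional indicator symbols occupy U+1F1E6..U+1F1FF.
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def simple_clusters(text):
    """Split text into clusters (simplified, hypothetical helper).

    Assumption: only flag sequences are handled. Regional indicators are
    grouped in pairs from left to right; any other code point is treated
    as a cluster of its own.
    """
    clusters = []
    i = 0
    while i < len(text):
        if (
            is_regional_indicator(text[i])
            and i + 1 < len(text)
            and is_regional_indicator(text[i + 1])
        ):
            clusters.append(text[i:i + 2])
            i += 2
        else:
            clusters.append(text[i])
            i += 1
    return clusters


def contains_clusters(string, substring):
    """Return True only if substring's clusters occur contiguously in string's."""
    haystack = simple_clusters(string)
    needle = simple_clusters(substring)
    if not needle:
        # Mirror the behaviour of "" in s for the empty substring.
        return True
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


se = "\U0001F1F8\U0001F1EA"
es = "\U0001F1EA\U0001F1F8"
ee = "\U0001F1EA\U0001F1EA"
es_ee = es + ee

print(se in es_ee)                   # code point match: True
print(contains_clusters(es_ee, se))  # cluster match: False
print(contains_clusters(es_ee, ee))  # cluster match: True
```

Comparing lists of clusters rather than raw strings is what prevents a match from starting or ending in the middle of a flag.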

alvinlindstam commented 7 years ago

Done in https://github.com/alvinlindstam/grapheme/commit/8c8a84a5ec31820f22a7c35a233edc4c65cf877f