carpedm20 / emoji

emoji terminal output for Python

How to deal with emoji like 👨🏽‍👩🏽‍👧🏽 #256

Closed: lsmith77 closed this 1 year ago

lsmith77 commented 1 year ago

Python's Unicode support in str means that, in principle, any multibyte character is a single character. But such an emoji appears to break this rule, which complicates handling these emoji.

In my case, I want to detect such an emoji and propose alternatives with other gender/skin-tone compositions.

for k, ch in enumerate("👨🏽‍👩🏽‍👧🏽"):
    print(f"{k}: {ch}")

This results in:

0: 👨
1: 🏽
2: ‍
3: 👩
4: 🏽
5: ‍
6: 👧
7: 🏽
cvzi commented 1 year ago

These kinds of emoji are displayed together because of \u200d in Unicode, called the Zero Width Joiner (ZWJ). At positions 2 and 5 there is an invisible \u200d.
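To see the joiners and skin-tone modifiers directly, the standard-library `unicodedata` module can print the official name of each code point. A minimal sketch; the escapes spell out the same emoji as in the question so the invisible ZWJ characters are visible in the source.

```python
import unicodedata

# The family emoji from the question, written with explicit escapes:
# MAN + skin tone, ZWJ, WOMAN + skin tone, ZWJ, GIRL + skin tone.
family = "\U0001F468\U0001F3FD\u200d\U0001F469\U0001F3FD\u200d\U0001F467\U0001F3FD"

# len() counts code points, not the single glyph a renderer may draw.
print(len(family))

# Print each code point with its index, hex value, and Unicode name.
for k, ch in enumerate(family):
    print(f"{k}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Positions 1, 4 and 7 are Fitzpatrick skin-tone modifiers, and positions 2 and 5 are the zero width joiners.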

If you want to find each emoji in a string as a single element, the third-party regex library might help with its \X (extended grapheme cluster) pattern:

import regex  # third-party package: pip install regex

for k, cluster in enumerate(regex.findall(r"\X", "test👨🏽‍👩🏽‍👧🏽string")):
    print(f"{k}: {cluster}")
0: t
1: e
2: s
3: t
4: 👨🏽‍👩🏽‍👧🏽
5: s
6: t
7: r
8: i
9: n
10: g
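For the "other gender compositions" part of the question, a cluster found with \X can then be taken apart with plain string operations. A minimal sketch, assuming the cluster is a ZWJ sequence whose members are each a base emoji optionally followed by one skin-tone modifier; `split_members` and `join_members` are hypothetical helper names, not part of the emoji or regex libraries. Other characters such as variation selectors (U+FE0F) simply stay attached to the base.

```python
from typing import Optional

ZWJ = "\u200d"
# The five Fitzpatrick skin-tone modifiers, U+1F3FB through U+1F3FF.
SKIN_TONES = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]


def split_members(cluster: str) -> list[tuple[str, Optional[str]]]:
    """Split a ZWJ sequence into (base, skin-tone modifier or None) pairs."""
    members = []
    for part in cluster.split(ZWJ):
        if part and part[-1] in SKIN_TONES:
            members.append((part[:-1], part[-1]))
        else:
            members.append((part, None))
    return members


def join_members(members: list[tuple[str, Optional[str]]]) -> str:
    """Rebuild a ZWJ sequence from (base, modifier) pairs."""
    return ZWJ.join(base + (tone or "") for base, tone in members)


family = "\U0001F468\U0001F3FD\u200d\U0001F469\U0001F3FD\u200d\U0001F467\U0001F3FD"
members = split_members(family)
# Swap GIRL (U+1F467) for BOY (U+1F466), keeping that member's skin tone.
members[2] = ("\U0001F466", members[2][1])
print(join_members(members))
```

Whether a recombined sequence is drawn as one glyph depends on the font and platform, and on whether it is a recommended (RGI) emoji sequence.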

Finding them with this library (emoji) is limited: the default skin color works, but other skin tones are not supported yet; see #204.
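Independently of the emoji library, skin-tone alternatives for a cluster can be generated by plain character replacement. A minimal sketch, not the library's API: `skin_tone_variants` is a hypothetical name, and it assumes every member that already carries a modifier should receive the same new tone. Adding modifiers to a default-colored emoji is left out.

```python
# Fitzpatrick skin-tone modifiers, U+1F3FB (type 1-2) through U+1F3FF (type 6).
SKIN_TONES = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]


def skin_tone_variants(cluster: str) -> list[str]:
    """Return one copy of `cluster` per skin tone, replacing every existing
    skin-tone modifier with that tone. Returns [] if the cluster has none."""
    if not any(ch in SKIN_TONES for ch in cluster):
        # Inserting modifiers where none exist is out of scope for this sketch.
        return []
    return ["".join(tone if ch in SKIN_TONES else ch for ch in cluster)
            for tone in SKIN_TONES]


family = "\U0001F468\U0001F3FD\u200d\U0001F469\U0001F3FD\u200d\U0001F467\U0001F3FD"
for variant in skin_tone_variants(family):
    print(variant)
```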