Closed lsmith77 closed 1 year ago
These kinds of emoji are displayed together because of \u200d, the Unicode zero-width joiner (ZWJ). At positions 2 and 5 (counting code points from 0) there is an invisible \u200d.
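To make the invisible joiners visible, here is a minimal sketch that lists the code points of the family emoji used in this thread. The variable names are only illustrative, and the emoji is written with escapes so that nothing is hidden in the source:

```python
import unicodedata

# Man, woman, girl, each followed by the medium skin-tone modifier (U+1F3FD),
# glued together with zero-width joiners (U+200D).
family = "\U0001F468\U0001F3FD\u200d\U0001F469\U0001F3FD\u200d\U0001F467\U0001F3FD"

# Print every code point with its index and Unicode name.
for position, char in enumerate(family):
    print(position, f"U+{ord(char):04X}", unicodedata.name(char, "<unnamed>"))

# Indices of the joiners within the str.
zwj_positions = [i for i, c in enumerate(family) if c == "\u200d"]
print(zwj_positions)  # [2, 5]
print(len(family))    # 8, because len() counts code points, not displayed glyphs
```

So what is rendered as one glyph is, as far as str is concerned, eight separate code points.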
If you want to find emoji in a string as a single element, the regex library might be helpful with its \X "grapheme" pattern:
import regex

# \X matches one extended grapheme cluster, so a ZWJ sequence stays in one piece
for k, grapheme in enumerate(regex.findall(r"\X", "test👨🏽‍👩🏽‍👧🏽string")):
    print(f"{k}: {grapheme}")
0: t
1: e
2: s
3: t
4: 👨🏽‍👩🏽‍👧🏽
5: s
6: t
7: r
8: i
9: n
10: g
Finding them with this library is limited: the default skin color works, but other skin tones are not supported yet; see #204.
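Independently of the library, skin tones can be handled directly on the string, because the skin-tone modifiers are the five code points U+1F3FB to U+1F3FF. The following is only a rough sketch: the helper names (has_skin_tone, strip_skin_tone, skin_tone_variants) are made up for illustration and are not part of any library, and whether each resulting sequence renders as a single glyph depends on the font and platform.

```python
# Fitzpatrick skin-tone modifiers occupy U+1F3FB..U+1F3FF (five code points).
SKIN_TONES = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]


def has_skin_tone(text):
    """Hypothetical helper: True if any skin-tone modifier occurs in text."""
    return any(ch in SKIN_TONES for ch in text)


def strip_skin_tone(text):
    """Hypothetical helper: drop all modifiers, leaving the default skin color."""
    return "".join(ch for ch in text if ch not in SKIN_TONES)


def skin_tone_variants(text):
    """Hypothetical helper: one copy of text per tone, with every existing
    modifier replaced by that tone. Other characters are left untouched."""
    return [
        "".join(tone if ch in SKIN_TONES else ch for ch in text)
        for tone in SKIN_TONES
    ]


# The family emoji from this thread, written with escapes.
family = "\U0001F468\U0001F3FD\u200d\U0001F469\U0001F3FD\u200d\U0001F467\U0001F3FD"

default_family = strip_skin_tone(family)
variants = skin_tone_variants(family)
print(has_skin_tone(family), len(variants))
```

Stripping the modifiers first gives a sequence with the default skin color, which is the case reported above as working.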
Python's Unicode support with str in principle means that any multibyte character is a single character. But it appears that such an emoji breaks this rule, which complicates dealing with it. In my case I want to detect such an emoji and propose alternatives with other gender/skin tone compositions.
results in