Closed marmistrz closed 1 year ago
Can you elaborate why you would expect it to be considered an emoji?
Emoji are generally graphical symbols. The emoji selector character on its own is invisible. -- Enviado desde mi dispositivo Android con K-9 Mail. Por favor, disculpa mi brevedad.
My intent is to detect strings which represent sequences of emojis only (so that I can use a different font size). This means that I want to return YES for strings like:
๐๏ธ
๐๏ธ๐๏ธ๐๏ธ
๐ค๐ฟ
but NO for
something๐๏ธ
๐๏ธdfaffd๐๏ธs๐๏ธ
Also, I'd expect a YES answer for the example from the OP, '\\U0001f600\\ufe0f'
.
The natural approach to do this is
return all(emoji.is_emoji(c) for c in text)
which will return False
for the example from the OP. The fact that this doesn't work if a variation selector is used is far from being obvious and I can expect a lot of code using my buggy approach in the wild.
Since emoji.is_emoji
only works on the character level, I'll be happy to contribute a helper function that would properly recognize whether a string contains only emojis.
There is the new function analyze() [added in the current version] , it can split a string into non-emojj characters and multi-character emoji.
I think your use-case can be achieved with this:
all(isinstance(m.value, emoji.EmojiMatch) for m in emoji.analyze(my_string, non_emoji=True))
To understand how it works look at the output of:
list(m.value for m in emoji.analyze(my_string, non_emoji=True))
-- Enviado desde mi dispositivo Android con K-9 Mail. Por favor, disculpa mi brevedad.
Maybe we should have a warning in the Readme that explains that it is generally a bad idea to look at individual characters when dealing with emoji ๐ค
-- Enviado desde mi dispositivo Android con K-9 Mail. Por favor, disculpa mi brevedad.
Given how error prone it is, how about exposing this functionality in emoji
itself?
I guess we could add it. Do you have a good name for a function in mind?
Some ideas:
is_emoji_sequence
is_only_emoji
represents_only_emoji
pure_emoji
/ pure_emoji_sequence
/ is_pure_emoji_sequence
/ is_purely_emoji
If you have decided upon what the name should be, let me know and I'll submit a PR.
Sorry, I forgot about it.
Not 'sequence' because 'sequence' has a different meaning in the context of Unicode.org.
The others I don't really have a preference. pure_emoji
is short and , so maybe that one.
I used purely_emoji
because pure_emoji
might mislead the user that there are pure and impure emojis.
Consider the following unicode sequence, representing a single emoji.
๐๏ธ
.I would expect this code to print
True
twice. However, the actual behavior is:Is this the expected behavior?