carpedm20 / emoji

emoji terminal output for Python

Replacing text emoji or not #278

Open cvzi opened 9 months ago

cvzi commented 9 months ago

Not really a bug, more an observation:

When emoji.replace_emoji(str, '') is used to strip emoji from a string, it also replaces all text-variant emoji. Users might not expect this behavior.

For example:

> emoji.replace_emoji("pure emoji 😁 text variant © emoji variant ©️", "?")
pure emoji ? text variant ? emoji variant ?

So the © gets removed even though it is in its text variant. Someone who is trying to remove emoji from a string might not want to remove symbols like © ® ↔.

However, several of these text emoji are rendered as text in one font and as emoji in another. For example, as I am writing this issue, the :right_arrow: \u27a1 ➡ is rendered as a text emoji in GitHub's text editor, but it will be displayed as an emoji when the issue appears online.

(It can still be forced to text with the text-variant selector: \u27a1\ufe0e ➡︎ )
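
To make the selector mechanics concrete, here is a minimal standard-library sketch. It only builds the two variation sequences for the arrow and lists their code points; how each sequence is drawn still depends on the font and renderer. The names ARROW, TEXT_VS, EMOJI_VS and describe are illustrative, not part of the emoji package.

```python
import unicodedata

ARROW = "\u27a1"     # the base symbol from the example above
TEXT_VS = "\ufe0e"   # text-variant selector
EMOJI_VS = "\ufe0f"  # emoji-variant selector


def describe(s):
    """List each code point in s together with its Unicode name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]


text_arrow = ARROW + TEXT_VS    # requests text presentation
emoji_arrow = ARROW + EMOJI_VS  # requests emoji presentation

print(describe(text_arrow))
print(describe(emoji_arrow))
```

Both strings are two code points long and share the same base character, which is why a plain string function cannot tell from the base character alone how it will be displayed.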

I don't see a solution to this, but the behaviour should be mentioned in the documentation.

One option for some users could be to replace emoji, but keep emoji with text-variant and force the text-variant by appending \uFE0E (text variant selector):

import emoji

def repl(e, d):
  if 'variant' in d and not e.endswith('\uFE0F'):
    # Emoji supports variants and emoji-variant (\uFE0F) is not selected
    if e.endswith('\uFE0E'):
      # Emoji is already in text-variant
      return e
    else:
      # Emoji is not in text-variant, add text-variant selector
      return e + '\uFE0E'
  else:
    # Emoji doesn't support variants, or emoji-variant is selected
    return ''

print(emoji.replace_emoji("smile 😁. copyright ©. Arrow-no-variant ➡.", repl))

Input is: "smile 😁. copyright ©. Arrow-no-variant ➡."
Output is: "smile . copyright ©︎. Arrow-no-variant ➡︎."

lovetox commented 4 months ago

I don't think you should open the door and base any decision on what a font might or might not do. I think that's a losing battle.

As I understand it, the function is a helper tool for string manipulation. Presentation is on a different layer and should not be the business of this function.

As such, I think it's important that the method has well-defined behavior.

Replacing a copyright sign, which was added in 1993 when I guess nobody knew the word emoji, without an option to turn that off is not what I would expect from an emoji function.

As I understand the problem, there were symbols (not emoji) first, and later, when emoji were invented, instead of giving the copyright sign a new emoji codepoint, they invented variant selectors.

My first idea would be to add a boolean argument like replace_text_variants:

Codepoint + Emoji Selector -> Replace always (not dependent on replace_text_variants)
Codepoint + Text Selector -> Only replace if replace_text_variants=True
Codepoint -> Only replace if replace_text_variants=True
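
The three rules above can be sketched as a callback factory. This is only an illustration of the proposal, not an existing option of the library: make_replacer and its replace_text_variants parameter are hypothetical names. The sketch assumes the callback receives the matched emoji string and a data dict with a 'variant' key when the emoji supports variants, as in the repl example earlier in this thread. Emoji without variant support are always replaced.

```python
TEXT_SELECTOR = "\uFE0E"   # text-variant selector
EMOJI_SELECTOR = "\uFE0F"  # emoji-variant selector


def make_replacer(replacement="", replace_text_variants=False):
    """Build a callback that applies the proposed rules (hypothetical helper)."""

    def repl(chars, data):
        # No variant support, or emoji presentation explicitly selected:
        # always replace, regardless of replace_text_variants.
        if "variant" not in data or chars.endswith(EMOJI_SELECTOR):
            return replacement
        # Bare codepoint or codepoint + text selector:
        # replace only if the caller asked for it.
        return replacement if replace_text_variants else chars

    return repl


keep_text = make_replacer("?", replace_text_variants=False)
strip_all = make_replacer("?", replace_text_variants=True)
```

Under the same assumption about the callback signature, such a function could be passed as the second argument to emoji.replace_emoji, for example emoji.replace_emoji(text, make_replacer("")). Unlike the earlier repl example, this variant leaves bare codepoints unchanged instead of appending \uFE0E.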

lovetox commented 3 months ago

I looked at the standard again, and it clearly marks every codepoint that is an Emoji. According to the Unicode data set, the copyright sign 00A9 is marked as Emoji.

So even if a text variant selector is added, it's still an emoji; only the presentation differs between fonts (using another glyph/image).

But we can assume that most users are not experts on the Unicode standard regarding emoji, so they probably expect text variants not to be replaced.

I would still go for a boolean argument that leaves the choice to the user.