gmac / gemoji-parser

The missing helper methods for GitHub's gemoji gem.
MIT License
72 stars 9 forks source link

gemoji-parser

The missing helper methods for GitHub's gemoji gem. This utility provides a parsing API for the Emoji corelib (provided by gemoji). Parser performs the transformation of emoji symbols between unicode characters (😃), token strings (:smile:), and emoticons (:-D); and may perform arbitrary replacement of emoji symbols into custom display formats (such as image tags). Internally, highly-optimized regular expressions are generated and cached for efficient parsing.

Installation

Add this line to your application's Gemfile:

gem 'gemoji-parser'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install gemoji-parser

To run tests:

$ bundle exec rake spec

Usage

Tokenizing

The tokenizer methods perform basic conversions of unicode symbols into token symbols, and vice versa.

EmojiParser.tokenize("Test 🙈 🙊 🙉")
# "Test :see_no_evil: :speak_no_evil: :hear_no_evil:"

EmojiParser.detokenize("Test :see_no_evil: :speak_no_evil: :hear_no_evil:")
# "Test 🙈 🙊 🙉"

Symbol Parsing

Use the symbol parser methods for custom transformations. All symbol parsers yield Emoji::Character instances into the parsing block for custom formatting.

Unicode symbols

EmojiParser.parse_unicode("Test 🐠") do |emoji|
  %Q(<img src="https://github.com/gmac/gemoji-parser/raw/master/#{emoji.image_filename}" alt=":#{emoji.name}:">).html_safe
end

# 'Test <img src="https://github.com/gmac/gemoji-parser/raw/master/unicode/1f420.png" alt=":tropical_fish:">'

Token symbols

EmojiParser.parse_tokens("Test :tropical_fish:") do |emoji|
  %Q(<img src="https://github.com/gmac/gemoji-parser/raw/master/#{emoji.image_filename}" alt=":#{emoji.name}:">).html_safe
end

# 'Test <img src="https://github.com/gmac/gemoji-parser/raw/master/unicode/1f420.png" alt=":tropical_fish:">'

Emoticon symbols

EmojiParser.parse_emoticons("Test ;-)") do |emoji|
  %Q(<img src="https://github.com/gmac/gemoji-parser/raw/master/#{emoji.image_filename}" alt=":#{emoji.name}:">).html_safe
end

# 'Test <img src="https://github.com/gmac/gemoji-parser/raw/master/unicode/1f609.png" alt=":wink:">'

See emoticons output for the default emoticon set.

All symbol types

Use the parse method to target all symbol types with a single parsing pass. Specific symbol types may be excluded using options:

EmojiParser.parse("Test 🐠 :scream: ;-)") { |emoji| "[#{emoji.name}]" }
# 'Test [tropical_fish] [scream] [wink]'

EmojiParser.parse("Test 🐠 :scream: ;-)", emoticons: false) do |emoji|
  "[#{emoji.name}]"
end
# 'Test [tropical_fish] [scream] ;-)'

While the parse method is heavier to run than the discrete parsing methods for each symbol type (parse_unicode, parse_tokens, etc...), it has the advantage of avoiding multiple parsing passes. This is handy if you want parsed symbols to output new symbols in a different format, such as generating image tags that include a symbol in their alt text:

EmojiParser.parse("Test 🐠 ;-)") do |emoji|
  %Q(<img src="https://github.com/gmac/gemoji-parser/raw/master/#{emoji.image_filename}" alt=":#{emoji.name}:">).html_safe
end

# 'Test <img src="https://github.com/gmac/gemoji-parser/raw/master/unicode/1f420.png" alt=":tropical_fish:"> <img src="https://github.com/gmac/gemoji-parser/raw/master/unicode/1f609.png" alt=":wink:">'

Lookups & File Paths

Use the find method to derive Emoji::Character instances from any symbol format (unicode, token, emoticon):

emoji = EmojiParser.find(🐠)
emoji = EmojiParser.find('see_no_evil')
emoji = EmojiParser.find(';-)')

Use the image_path helper to derive an image filepath from any symbol format (unicode, token, emoticon). You may optionally provide a custom path that overrides the gemoji default location (this is useful if you'd like to reference your images from a CDN):

EmojiParser.image_path('tropical_fish')
# "unicode/1f420.png"

EmojiParser.image_path('tropical_fish', '//cdn.fu/emoji/')
# "//cdn.fu/emoji/1f420.png"

Custom Symbols

Emoji

The parser plays nicely with custom emoji defined through the gemoji core. You just need to call rehash! once after adding new emoji symbols to regenerate the parser's regex cache:

Emoji.create('boxing_kangaroo') # << WHY IS THIS NOT STANDARD?!
EmojiParser.rehash!

Emoticons

Emoticon patterns are defined through the parser, and are simply mapped to an emoji name that exists within the gemoji core (this can be a standard emoji name, or a custom name that you have added). To add custom emoticon symbols:

# Alias a standard emoji name:
EmojiParser.emoticons[':@'] = :angry

# Create a custom emoji name, and alias it:
Emoji.create('bill_clinton')
EmojiParser.emoticons['=:o]'] = :bill_clinton

# IMPORTANT:
# Rehash once after adding any new symbols:
EmojiParser.rehash!

Emoticon Lookarounds

Emoticons are matched using lookaround patterns that separate the icon from its surrounding text. By default, an emoticon pattern must start a string, end it, and/or be surrounded by whitespace. To adjust these rules, you may modify the emoticon_lookaround settings. The following would allow emoticons to be wrapped in HTML tags:

# Allow HTML tags to surround emoticons, ex: "<p>:-)</p>"
EmojiParser.emoticon_lookaround[:behind] += '|>'
EmojiParser.emoticon_lookaround[:ahead] += '|<'
EmojiParser.rehash!

Shoutout

Thanks to the GitHub team for the gemoji gem, Matthew Rothenberg for emojitracker, and my esteemed colleague Michael Lovitt for the fantastic Rubular regex tool (it has been invaluable on this project).

🙈 🙊 🙉