forPelevin / gomoji

Helpful functions to work with emoji in Golang
MIT License
190 stars 28 forks source link

Added collect all function to find all emojis in a string without eliminating duplicates #10

Closed mastern2k3 closed 2 years ago

mastern2k3 commented 2 years ago

I really wanted to use gomoji but it's missing a way to collect all emojis in a string as-is, without ignoring repeating emojis.

I am building a service that analyzes emojis in strings and need a way to detect which emojis appear and how many times.

I also feel like this behavior is more suiting the name FindAll and the current behavior of FindAll should be relocated to a FindAllDistinct.

forPelevin commented 2 years ago

Hi mastern2k3!

Thank you for your contribution!

Yeah, I'll rename the FindAll method with the following significant changes. However, it seems overkill to break backward compatibility to rename a method name right now.

mastern2k3 commented 2 years ago

it seems overkill to break backward compatibility to rename a method name right now.

Strongly agree, will add tests soon.

mastern2k3 commented 2 years ago

When I added the tests I noticed I'm getting duplicate results because some emojis were getting registered from both

    gr := uniseg.NewGraphemes(s)
    for gr.Next() {
        if em, ok := emojiMap[gr.Str()]; ok {
            emojis[gr.Str()] = em
        }
    }

and

    for _, r := range s {
        if em, ok := emojiMap[string(r)]; ok {
            emojis[string(r)] = em
        }
    }

I didn't quite get the difference, it looks like uniseg.NewGraphemes(s) picks up all emojis but some just don't appear correctly from gr.Str() and don't hit the correct map keys.

forPelevin commented 2 years ago

I didn't quite get the difference, it looks like uniseg.NewGraphemes(s) picks up all emojis but some just don't appear correctly from gr.Str() and don't hit the correct map keys.

Yep, you're right. Some emojis are invisible for uniseg.NewGraphemes(s) and some for for range by runes. So that's why we use both approaches here.

forPelevin commented 2 years ago

hey @mastern2k3,

It's been a while since our last interaction. Is everything good? Does my suggestion work for you?

mastern2k3 commented 2 years ago

Hey @forPelevin, all is well.

Looks like your suggestion should work.

I'm investigating something different, namely - why doesn't emojiMap[gr.Str()] hit for all emojis? Looks like this happens only for a specific set of emojis.

They get picked up by the uniseg.NewGraphemes(s) iterator, they also appear as emojis when you print them as gr.Str(), but when put through emojiMap it doesn't hit.

forPelevin commented 2 years ago

Hey @mastern2k3.

I'm investigating something different, namely - why doesn't emojiMap[gr.Str()] hit for all emojis? Looks like this happens only for a specific set of emojis.

If you print them by bytes, you'll see hidden bytes. That's why they cannot be found in emojiMap. Unfortunately, there is no obvious way to pre-generate all the emojis, so we need to double-check everything in runtime.

forPelevin commented 2 years ago

Hi @mastern2k3

Could you make changes from previos message and I will merge the pull request, please?

mastern2k3 commented 2 years ago

@forPelevin I fixed it :+1:

forPelevin commented 2 years ago

Nice job! Thank you for the PR. I've merged it.