TwitchLib / TwitchLib.Client

Client component of TwitchLib.

EmoteExtractor breaks on messages with emoji #259

Closed: Hampo closed this issue 11 months ago.

Hampo commented 11 months ago

Sent the message "One 😂 Two Kappa Three" in Twitch chat.

Set a breakpoint on the first Emote in the EmoteSet.

You can see that emote.Name is "Kapp" instead of "Kappa". My best guess is that EmoteExtractor doesn't handle multi-byte characters such as emoji.
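The off-by-one can be reproduced without TwitchLib. The sketch below assumes Twitch reports Kappa's range as code points 10 to 14 and that the extractor passes those indexes straight to String.Substring; IndexMismatchDemo and CodePointIndexOf are hypothetical names used only for illustration. Because 😂 is one code point but two UTF-16 chars, the UTF-16 slice starts one char early and drops the final "a".

```csharp
using System;
using System.Linq;

static class IndexMismatchDemo
{
    // Hypothetical helper: position of `value` counted in code points (runes).
    // 😂 counts as one code point even though it is two UTF-16 chars (a surrogate pair).
    public static int CodePointIndexOf(string s, string value)
    {
        int utf16Index = s.IndexOf(value, StringComparison.Ordinal);
        return utf16Index < 0 ? -1 : s.Substring(0, utf16Index).EnumerateRunes().Count();
    }

    static void Main()
    {
        string message = "One 😂 Two Kappa Three";

        // String.IndexOf counts UTF-16 code units, the unit .NET strings are indexed by.
        int utf16Index = message.IndexOf("Kappa", StringComparison.Ordinal);
        int codePointIndex = CodePointIndexOf(message, "Kappa");

        Console.WriteLine(utf16Index);     // 11
        Console.WriteLine(codePointIndex); // 10

        // Assumption: the extractor uses the code-point range 10..14 as UTF-16 offsets,
        // which slices one char too early.
        Console.WriteLine($"[{message.Substring(codePointIndex, 5)}]"); // [ Kapp]
    }
}
```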

Hampo commented 11 months ago

Tested on version 4.0.0-preview-bb8afa27cfa09f9d3c221969879b0adb43809afa.

Hampo commented 11 months ago

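Looking into it some more, it appears the indexes Twitch provides count Unicode characters (code points), while the message, decoded from UTF-8, ends up in a .NET string that is indexed by UTF-16 code units. An emoji such as 😂 is one code point but two UTF-16 chars (a surrogate pair), so every index after it is off by one. You can use StringInfo's SubstringByTextElements to slice the string correctly. Example working code to remove emotes from a ChatMessage: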

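```csharp
using System.Globalization;
using System.Linq;
using System.Text;
using TwitchLib.Client.Models;

// Removes all emotes from a chat message. Twitch's emote indexes are applied to
// text elements (via StringInfo) rather than UTF-16 chars, so an emoji such as 😂
// doesn't shift the positions of the emotes after it.
static string RemoveEmotes(ChatMessage msg)
{
    StringBuilder parsed = new(msg.Message.Length);
    StringInfo rawInfo = new(msg.Message);

    int startIndex = 0;
    foreach (Emote emote in msg.EmoteSet.Emotes.OrderBy(x => x.StartIndex))
    {
        // Copy the text between the previous emote and this one.
        parsed.Append(rawInfo.SubstringByTextElements(startIndex, emote.StartIndex - startIndex));
        // Collapse the double space left where an emote was removed.
        parsed.Replace("  ", " ");

        // EndIndex is inclusive, so resume right after the emote.
        startIndex = emote.EndIndex + 1;
    }

    // Copy any text that follows the last emote.
    if (startIndex < rawInfo.LengthInTextElements)
    {
        parsed.Append(rawInfo.SubstringByTextElements(startIndex));
        parsed.Replace("  ", " ");
    }

    return parsed.ToString();
}
```
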
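SubstringByTextElements works here because 😂 is a single text element. Text elements are grapheme clusters, not code points, though: on .NET 5 and later, a sequence such as 👍🏽 (thumbs up plus a skin-tone modifier) is two code points but one text element, so indexes after it would drift the other way. Below is a minimal sketch of a code-point-based alternative, assuming Twitch's emote ranges are inclusive code-point indexes. CodePointOffsets and SliceByCodePoints are hypothetical helpers, not part of TwitchLib; they rely on System.Text.Rune (.NET Core 3.0+).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class EmoteIndex
{
    // Hypothetical helper: entry i is the UTF-16 offset where code point i starts.
    // A final sentinel entry holds s.Length so an inclusive end index can be mapped too.
    public static int[] CodePointOffsets(string s)
    {
        var offsets = new List<int>();
        int offset = 0;
        foreach (Rune rune in s.EnumerateRunes())
        {
            offsets.Add(offset);
            offset += rune.Utf16SequenceLength; // 2 for 😂, 1 for ASCII
        }
        offsets.Add(offset);
        return offsets.ToArray();
    }

    // Hypothetical helper: returns the text for an inclusive code-point range [start, end],
    // assuming that is how Twitch's emote ranges are expressed.
    public static string SliceByCodePoints(string s, int[] offsets, int start, int end)
    {
        int from = offsets[start];
        int to = offsets[end + 1];
        return s.Substring(from, to - from);
    }

    static void Main()
    {
        string message = "One 😂 Two Kappa Three";
        int[] offsets = CodePointOffsets(message);
        Console.WriteLine(SliceByCodePoints(message, offsets, 10, 14)); // Kappa
    }
}
```

Building the offset table once per message keeps each emote lookup O(1), which matters little for chat-sized strings but avoids re-scanning the message for every emote.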
Syzuna commented 11 months ago

Fixed in #260.