Potential fix for incorrectly parsed emotes when a message is offset due to unicode characters.

mquin2003 commented 4 months ago

This is related to issues #413 and #437. I also experienced this issue with emotes getting parsed incorrectly, leading to a live-chat that would fail to convert.

In my instance, the conversion from a live chat would fail with runtime error: slice bounds out of range and would hang with no further status updates in the queue.

The message that caused the error is as follows: "You Fooool" LUL LUL LUL which is \u201cYou Fooool\u201d LUL LUL LUL as the quotation marks are unicode characters. I noticed that in the emoteFragments splice in ganymede/internal/utils/tdl.go, two of the three LUL emotes had duplicate pos1 and pos2 values, despite there being three "LUL" emotes in three different positions.

In the original live chat file, each LUL had the following positions, respectively: 13-15, 17-19, 21-23.

However, due to the offset caused by the unicode characters, findSubstringPositions was called as per the statement if slicedEmote != liveCommentEmote.Name on line 201, which gave the first emote the start and end positions: [17, 20].

After this corrected emote is appended, then on the second loop iteration, we're on the second LUL in the message, where its own (uncorrected) start position happens to be 17, the same value of the corrected start position of the previous emote. Because of this, despite the offset, slicedEmote == liveCommentEmote.Name, so a second emote of the same length is appended to emoteFragments since it thinks there's no offset.

This is how emoteFragments ended up with two emotes with identical start and end positions. After that, on line 259, fragmentText := tdlComment.Message.Body[emoteFragments[i-1].Pos2:emoteFragment.Pos1] fails with the runtime error: slice bounds out of range error.

This is because the starting position of the emote increments past the stored start position of the second emote.

My proposed solution simply added a default 'indicator' for lack of a better term, that stores whether the current message is offset. If it is, message_is_offset is set to true, so it will repeatedly call findSubstringPositions for the remainder of the emotes in the message, which in my instance, stored the emote locations with the correct, non-duplicated values: [{LUL 17, 20} {LUL 21, 24} {LUL 25, 28}].

Lastly, I wanted to thank you for such a helpful project. I hope the above information and that this PR in general is at least somewhat helpful. This is one of my first ever PRs, so I apologize if there's anything wrong with it.

Thank you so much for your time!

Zibbp commented 4 months ago

Tested with chat files from #437 and it works! No more panicing and the chat is converted. Thanks @mquin2003!

mquin2003 commented 4 months ago

Thanks for testing that @Zibbp! I'm happy to hear that it works.

Zibbp / ganymede

Potential fix for incorrectly parsed emotes when a message is offset due to unicode characters. #467