CodedOre / NewCaw

Development on Cawbird 2.0
GNU General Public License v3.0

Bugs when displaying text from certain tweets #50

Closed: CodedOre closed this issue 2 years ago

CodedOre commented 2 years ago

When displaying certain tweets from Twitter, the text is not displayed properly.

This can happen in two ways. First, mentions, links, etc. are positioned wrongly, so some text is cut off. Second, the text is not displayed at all, because GTK finds invalid UTF-8 in it and therefore displays nothing.

I have yet to see this with Mastodon posts, and all posts that have this issue contain emojis, so I guess the issue is related to emojis as well.

The tweets from @TwitterDev are a good test case for this.

IBBoard commented 2 years ago

GTK/Pango is slightly unhelpful in that it tends to just not show anything in these situations.

From what I remember of dealing with this in Cawbird (which does it in C!), you're not doing something like indexing by byte instead of by Unicode character, are you? Or Twitter could be counting differently. Either would mangle non-ASCII characters like emoji. I've had to use g_utf8_next_char() at times (src).
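To illustrate the byte-versus-character distinction mentioned here, the following is a minimal sketch in plain C, without GLib. The function name `utf8_char_count` is hypothetical, and the code assumes its input is valid, NUL-terminated UTF-8. It counts code points by skipping continuation bytes, which is roughly what stepping through a string with g_utf8_next_char() achieves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count Unicode code points in a NUL-terminated
 * UTF-8 string. Every byte that is NOT a continuation byte (10xxxxxx)
 * starts a new code point. Assumes the input is valid UTF-8. */
size_t utf8_char_count(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For a string such as `"a\xE2\xAC\x87" "b"` (an "a", the arrow U+2B07, and a "b"), this returns 3 while `strlen()` reports 5, so an index that counts characters cannot be used directly as a byte offset.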

CodedOre commented 2 years ago

> From what I remember of dealing with this in Cawbird (which does it in C!), you're not doing something like indexing by byte instead of unicode character, are you?

Looking at my code, I probably do... Right now I just slice the string by the given indices. I guess that explains the first issue. It means I need to replace that with a method that uses g_utf8_next_char().

That leaves the second one. GTK does print the text to the terminal, so I can see that the emojis are the issue. For example, the ⬇️ icon from this tweet is reduced to a lone \xe2 byte, which is rejected as invalid UTF-8...

Edit: I just noticed that most of the time it's emojis at the end of the text, so I guess the explanation is that, due to our wrong counting, we're cutting off the text too early. That means it should resolve itself once the other issue is fixed.
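The lone \xe2 fits that explanation: U+2B07 is encoded in UTF-8 as the three bytes E2 AC 87, so a cut one byte into the sequence leaves only the lead byte behind. As a rough sketch in plain C (the name `utf8_is_char_boundary` is hypothetical, and valid UTF-8 input is assumed), one can check whether a byte offset is a safe place to cut:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if cutting `s` at byte offset `cut`
 * does not split a multi-byte UTF-8 sequence. A cut is unsafe when the
 * byte at that offset is a continuation byte (10xxxxxx), because the
 * text before the cut would then end in an incomplete character.
 * Offsets past the end of the string are reported as unsafe. */
bool utf8_is_char_boundary(const char *s, size_t cut)
{
    assert(s != NULL);
    size_t len = strlen(s);
    if (cut >= len)
        return cut == len;
    return ((unsigned char)s[cut] & 0xC0) != 0x80;
}
```

With the string `"a\xE2\xAC\x87"`, offsets 0, 1 and 4 are boundaries, while offsets 2 and 3 fall inside the arrow and would produce exactly the kind of truncated text that GTK refuses to render.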

CodedOre commented 2 years ago

Turns out the solution was relatively simple, as GLib conveniently provides string.index_of_nth_char to retrieve the index we need.
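For readers without Vala at hand, here is a sketch of the same idea in plain C: translate character indices into byte offsets first, then slice. This is not the code used in NewCaw. The names `utf8_offset_of_nth_char` and `utf8_slice` are hypothetical, and the code assumes valid UTF-8 input and indices that count Unicode code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: byte offset of the n-th code point (0-based).
 * If the string has fewer than n code points, the offset of the
 * terminating NUL is returned. Assumes valid UTF-8. */
size_t utf8_offset_of_nth_char(const char *s, size_t n)
{
    assert(s != NULL);
    size_t i = 0;
    while (s[i] != '\0' && n > 0) {
        i++;
        /* Skip the continuation bytes of the character just passed. */
        while (((unsigned char)s[i] & 0xC0) == 0x80)
            i++;
        n--;
    }
    return i;
}

/* Hypothetical helper: copy the characters in [start_char, end_char)
 * into a newly allocated string. The caller frees the result.
 * Returns NULL if allocation fails. */
char *utf8_slice(const char *s, size_t start_char, size_t end_char)
{
    assert(s != NULL && start_char <= end_char);
    size_t start = utf8_offset_of_nth_char(s, start_char);
    size_t end = utf8_offset_of_nth_char(s, end_char);
    char *out = malloc(end - start + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s + start, end - start);
    out[end - start] = '\0';
    return out;
}
```

For the text `"\xE2\xAC\x87 @dev hi"`, calling `utf8_slice(text, 2, 6)` yields `"@dev"`, whereas slicing bytes 2 to 6 directly would start in the middle of the arrow.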

Should be resolved now.