Raku / marketing

Marketing resources for Raku language

Unicode problems in IRC log and how this creates a bad image #1

Closed by AlexDaniel 6 years ago

AlexDaniel commented 6 years ago

This issue is not about marketing resources, but I think that it's related enough for this repo.

We love @moritz and irclog, but its unicode support is garbage. I think it's really weird for us to be pushing for unicode adoption yet we can't get our own shit right.

We can try switching to http://colabti.org/irclogger/irclogger_log/perl6 but unfortunately it does not track #perl6-dev and #moarvm.

A few weeks ago irclog was even more broken, and then we had people complaining every day. Now it's only broken for fancy unicode stuff and I've lost all hope that it'll get better.

What I'm thinking is that a user can come to the channel, and someone will use a fancy emoji or something (for example, it's a common thing during the squashathon). It won't render in their terminal and they'll go to the irclog in the hope that their browser will do a better job, only to realize that it's broken there as well. Somehow I think that this will make us look bad.

What do you think?

moritz commented 6 years ago

I beg to differ. The Unicode support is not "garbage", it's limited to the BMP.

Feel free to provide a better service, we can switch the irc.perl6.org redirect to it.

toolforger commented 6 years ago

On 01.10.2017 at 13:00, Moritz Lenz wrote:

I beg to differ. The Unicode support is not "garbage", it's limited to the BMP.

Ah, that's why emojis don't work: the Emoticons block starts at U+1F600, while the BMP ends at U+FFFF. I wouldn't say "garbage", which is both an overgeneralization and rude, but it's definitely leaving room for improvement.
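To make the BMP boundary concrete, here is a minimal Python sketch that checks whether a character falls inside the Basic Multilingual Plane (code points up to U+FFFF). The helper names `is_in_bmp` and `non_bmp_chars` and the sample string are made up for illustration; nothing here is taken from the irclog code.

```python
from typing import List

BMP_MAX = 0xFFFF  # last code point of the Basic Multilingual Plane


def is_in_bmp(ch: str) -> bool:
    """Return True if the single character `ch` lies inside the BMP."""
    return ord(ch) <= BMP_MAX


def non_bmp_chars(text: str) -> List[str]:
    """Collect the characters of `text` that lie outside the BMP."""
    return [ch for ch in text if not is_in_bmp(ch)]


# Hypothetical chat line: Latin-1 letter, snowman (U+2603), grinning face (U+1F600)
sample = "caf\u00e9 \u2603 \U0001F600"

for ch in non_bmp_chars(sample):
    # Only the emoji is reported; everything else fits in the BMP.
    print(f"U+{ord(ch):04X} is outside the BMP")
```

A logger limited to the BMP would handle the first two non-ASCII characters in `sample` but not the last one.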

toolforger commented 6 years ago

On 01.10.2017 at 00:19, Aleks-Daniel Jakimenko-Aleksejev wrote:

I think it's really weird for us to be pushing for unicode adoption yet we can't get our own shit right.

Is the IRC service even using Perl6? Because if it is not, we just happen to be limited to existing service, like just about any other language community out there. If that service mangles non-BMP characters, then there is little any backend service can do about it.

Oh, and "pushing for X" does not mean "we have X". It means "we do not have X everywhere where we want it". So it is quite the opposite of weird if we don't have full Unicode support, yet push for it. The pushing should apply to the IRC service (or wherever the non-BMP Unicode gets mangled).

Regards, Jo

zoffixznet commented 6 years ago

Is the IRC service even using Perl6?

No.

If that service mangles non-BMP characters, then there is little any backend service can do about it.

It worked just fine for years until an upgrade a couple of months ago.

Oh, and "pushing for X" does not mean "we have X".

We're not just pushing for X, we're the leaders of the pack in Unicode support among all languages. It's a bit paradoxical to own that claim while we can't sort out proper rendering of our very own chat log…

Because if it is not, we just happen to be limited to existing service,

…and no one cares what we're actually using under the hood.

toolforger commented 6 years ago

On 01.10.2017 at 14:11, Zoffix Znet wrote:

We're not just pushing for X, we're the leaders of the pack in Unicode support among all languages.

cough ICU cough

Strictly speaking, it's not the language (C/C++/Java) that's doing the support here but a library, but the support is there if you want it.

BTW Java is pretty far in Unicode support, too. It's just that nobody cares much about it in practice: 1) Java allows any Letter character in names, but most people want their code to be readable for an international audience, which means English and 7-bit ASCII is enough for that. 2) Java allows any Unicode character in String literals, and even that is not that important: If international messages are an issue, you want your code to be internationalizable, which essentially means that the actual message texts aren't in the code but in some text file. (Some projects code in a single national language, e.g. for government or similar purposes where all users speak the same language. Though nowadays, many governments deal with multiple official languages, so even that niche for non-ASCII characters in string literals is pretty narrow.)

It's a bit paradoxical to own that claim while we can't sort out proper rendering of our very own chat log…

Because if it is not, we just happen to be limited to existing service,

…and no one cares what we're actually using under the hood.

Those who criticize us for not (yet) fully supporting Unicode should care. And the previous answer to that is sufficient: We'll be happy to use any service that you can provide. Or an update to the server software that fixes what was broken. (Whatever software that is, actually I have no clue.)

zoffixznet commented 6 years ago

cough ICU cough [...]

I don't see how this wall of text is relevant to the discussion about fidelity of our IRC logs. This Issue is turning into another flatmap… and I see it has the same participants.

hits "unsubscribe"


toolforger commented 6 years ago

On 01.10.2017 at 16:26, Zoffix Znet wrote:

/cough/ ICU /cough/
[...]

I don't see how this wall of text is relevant to the discussion about fidelity of our IRC logs.

Just answering your claims.

This Issue is turning into another flatmap (https://github.com/perl6/doc/issues/1428)… and I see it has the same participants.

/hits "unsubscribe"/

Oh. Ad hominem.

zakame commented 6 years ago

Hi everyone, just a meta-comment about how we carry ourselves here.

I think the strong verbiage in the first post of this issue got everyone here off to a bad start. Perhaps, in line with our common goal in this repo to market Perl 6 in the best way possible, it would be better to express frustration in issues in a much more sensible way, moving forward.

Cool your heads for a while.

AlexDaniel commented 6 years ago

FWIW if anybody wants to play with creating new loggers, take a look at Matrix. For example, one of the problems with the current ilbot is that it misses a bunch of messages every time it leaves the channel. Just yesterday I was searching for a message and couldn't find it because ilbot had pinged out when it was sent.

samcv commented 6 years ago

Is there an issue for ilbot? I couldn't find any here: https://github.com/moritz/ilbot/issues

Also which part of the code/module is incompatible with higher value Unicode codepoints? Or is it the database or something?

AlexDaniel commented 6 years ago

@samcv maybe this can answer some of your questions: https://irclog.perlgeek.de/perl6/2017-10-01#i_15240797

samcv commented 6 years ago

@AlexDaniel so the database needs to be converted to utf8mb4 I guess.

This article goes over a way to convert it without any downtime: https://railsmachine.com/articles/2017/05/19/converting-a-rails-database-to-utf8mb4.html

I'm not super knowledgeable about SQL, but maybe someone else is and can do something with that?
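As a rough illustration of why utf8mb4 matters here: characters outside the BMP take four bytes in UTF-8, while MySQL's legacy `utf8` character set stores at most three bytes per character. The sketch below assumes the log database is MySQL using that legacy character set, which the suggested conversion implies but this thread does not confirm. The helper names `utf8_len` and `fits_legacy_utf8` are hypothetical.

```python
# Assumption: the column uses MySQL's legacy 3-byte `utf8` (utf8mb3) charset.
MAX_BYTES_LEGACY_UTF8 = 3


def utf8_len(ch: str) -> int:
    """Number of bytes the single character `ch` occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_legacy_utf8(text: str) -> bool:
    """True if every character of `text` needs at most three UTF-8 bytes."""
    return all(utf8_len(ch) <= MAX_BYTES_LEGACY_UTF8 for ch in text)


# ASCII letter, Latin-1 letter, BMP symbol (U+2603), non-BMP emoji (U+1F600)
for ch in ["a", "\u00e9", "\u2603", "\U0001F600"]:
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s)")
```

Under that assumption, `fits_legacy_utf8` returns False for any message containing an emoji, which matches the "limited to the BMP" behaviour described earlier in the thread.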

AlexDaniel commented 6 years ago

For anyone stumbling upon this ticket months later, it is closed because the issue was actually resolved, and unicode characters show up correctly now. There are still issues with IRC colors showing up as numbers, but that's a whole different problem, and this ticket is indeed resolved.