cinchrb / cinch

The IRC Bot Building Framework
http://www.rubydoc.info/gems/cinch
MIT License

Use ISO-8859-1 instead of CP1252 for sending messages. #176

Closed: KamilaBorowska closed this pull request 10 years ago.

KamilaBorowska commented 10 years ago

Apparently, some IRC clients don't support CP1252 messages. I have confirmed this for HexChat and Quassel IRC, and there are likely more clients that don't support CP1252 (note that CP1252 is not the same as ISO-8859-1).
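To make the CP1252 vs. ISO-8859-1 distinction concrete: the two encodings agree everywhere except bytes 0x80 to 0x9F, where CP1252 assigns printable characters (curly quotes, the euro sign) and ISO-8859-1 has C1 control codes. The sketch below, using a hypothetical helper `decode_byte` and only Ruby's built-in transcoding, decodes the same byte under both encodings and prints the resulting Unicode code points.

```ruby
# Hypothetical helper for illustration: interpret a single byte in the
# given encoding and return the Unicode code point it maps to.
def decode_byte(byte, encoding)
  byte.chr.force_encoding(encoding).encode(Encoding::UTF_8).ord
end

# 0x80 -> U+20AC (euro sign) in CP1252, U+0080 (control) in ISO-8859-1
# 0x93 -> U+201C (left curly quote) in CP1252, U+0093 in ISO-8859-1
# 0x94 -> U+201D (right curly quote) in CP1252, U+0094 in ISO-8859-1
[0x80, 0x93, 0x94].each do |b|
  printf("0x%02X  CP1252 U+%04X  ISO-8859-1 U+%04X\n",
         b,
         decode_byte(b, Encoding::Windows_1252),
         decode_byte(b, Encoding::ISO_8859_1))
end
```

A client that labels CP1252 bytes as ISO-8859-1 will therefore show invisible control characters where the sender intended quotes or a euro sign.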

Following the robustness principle, this PR updates the IRC encoding to send messages using ISO-8859-1 while still reading them using CP1252. Unlike CP1252, ISO-8859-1 is supported practically everywhere.
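The proposed outgoing behaviour can be sketched as follows. This is not Cinch's actual code: `encode_outgoing_latin1` is a hypothetical name, and the UTF-8 fallback for characters that ISO-8859-1 cannot represent is an assumption, inferred from the incompatibility described in the next paragraph.

```ruby
# Hypothetical sketch (not Cinch's API): send a message as ISO-8859-1
# when every character fits, otherwise fall back to UTF-8.
# ASSUMPTION: the UTF-8 fallback mirrors the "CP1252 if possible,
# else UTF-8" write rule, with ISO-8859-1 substituted for CP1252.
def encode_outgoing_latin1(message)
  message.encode(Encoding::ISO_8859_1)
rescue Encoding::UndefinedConversionError
  # The message contains a character outside ISO-8859-1 (e.g. "€").
  message.encode(Encoding::UTF_8)
end

p encode_outgoing_latin1("café").bytes
p encode_outgoing_latin1("5 €").bytes
```

Under this sketch "café" goes out with a single 0xE9 byte, while "5 €" goes out as UTF-8 because ISO-8859-1 has no euro sign. Under the old CP1252 rule that message would have been sent as CP1252, which is exactly where clients that understand CP1252 but not UTF-8 lose out.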

This causes a slight incompatibility with clients that don't support UTF-8 but do support CP1252 (not just ISO-8859-1). In my opinion, it's better to make those rare messages readable in modern IRC clients than in old ones such as mIRC 6.16, released 10 years ago.

dominikh commented 10 years ago

CP1252 was chosen over ISO-8859-1 after considerable research. I'm a bit surprised to hear that HexChat doesn't support it; after all, I took major inspiration from XChat (which I believe HexChat forked from?), and XChat uses CP1252.

I can see the point in targeting modern clients instead of old ones, but at that point one has to wonder whether a dual encoding scheme makes sense at all, since virtually all modern clients should support (and default to) UTF-8. Or am I mistaken?

KamilaBorowska commented 10 years ago

For writing, always using UTF-8 is definitely acceptable. For reading, however, I would prefer to keep the hybrid encoding. In their default configurations, XChat and mIRC still use the hybrid encoding, although they can be configured to use it only for reading.

From what I can see, the issue I mentioned is a bug introduced in HexChat 2.10. (I still need to verify that: it was a user of the IRC bot, not me, who reported it. I use Quassel IRC myself, and my operating system ships HexChat 2.8, where it works.) The bug probably went unnoticed because IRC clients usually write ISO-8859-1, not Windows-1252, to the IRC socket.

dominikh commented 10 years ago

As for writing, I based my decisions on http://xchat.org/encoding/, which states that CP1252 is used for writing when possible.

I'll sleep on this for a day or two and see whether we can switch to UTF-8 for writing and the CP1252/UTF-8 hybrid for receiving.
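A minimal sketch of that split scheme, assuming the usual reading of "hybrid": incoming bytes are taken as UTF-8 when they form valid UTF-8 and are otherwise reinterpreted as CP1252, while outgoing text is always UTF-8. The method names are hypothetical and are not Cinch's actual API.

```ruby
# Hypothetical sketch of the CP1252/UTF-8 hybrid for reading
# and plain UTF-8 for writing (names are illustrative, not Cinch's API).

# Incoming: accept the raw bytes as UTF-8 if they are valid UTF-8;
# otherwise treat them as CP1252 and transcode to UTF-8. Bytes that
# CP1252 leaves undefined (e.g. 0x81) become U+FFFD.
def decode_incoming(raw)
  utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  cp1252 = raw.dup.force_encoding(Encoding::Windows_1252)
  cp1252.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

# Outgoing: always send UTF-8.
def encode_outgoing(message)
  message.encode(Encoding::UTF_8)
end

p decode_incoming("caf\xC3\xA9".b) # sent by a UTF-8 client
p decode_incoming("caf\xE9".b)     # sent by a CP1252/Latin-1 client
p encode_outgoing("café").bytes
```

Both incoming lines decode to "café". The trade-off of the UTF-8-first check is that CP1252 text which happens to be valid UTF-8 (for example "Ã©", bytes C3 A9) is read as UTF-8 ("é"); in practice such sequences are rare in genuine CP1252 chat text.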

dominikh commented 10 years ago

Let's

I believe that in 2014, it's more important to support clients that only understand UTF-8 than it is to support clients that only understand CP1252 (or even ISO-8859-1).

If you adjust the PR accordingly, I'll merge it.

This will also fix #173.

dominikh commented 10 years ago

@xfix I just noticed that you made the requested changes but didn't comment on this PR, so I never got a notification about them :(