cinchrb / cinch

The IRC Bot Building Framework
http://www.rubydoc.info/gems/cinch
MIT License

Cinch encodes channel names with CP1252 when joining/leaving channels #173

Closed kochd closed 9 years ago

kochd commented 9 years ago

Hey there. It is not possible to join channels containing an umlaut.

bot = Cinch::Bot.new do
  configure do |c|
    c.nick = "logger#{rand(1..1000)}"
    c.server = "irc.quakenet.org"
    c.channels = ["#tierärzte"]
    c.plugins.plugins = [Logger]
  end # End Configure
end

The log says the bot has joined, but the logger is not in the channel. WHOIS says the logger is in "@#tier%E4rzte".

dominikh commented 9 years ago

Nicknames not allowing umlauts is a QuakeNet limitation, not a Cinch limitation:

$ telnet irc.quakenet.org 6667
[...]
NICK lögger1234
:servercentral.il.us.quakenet.org 433 * l :Nickname is already in use.

And the #tier%E4rzte issue seems to stem from the fact that your Ruby file is encoded in ISO-8859-1, not in UTF-8. Not only is using UTF-8 a good idea in general, QuakeNet also seems to expect it for channel names with special characters. E4 is the ISO-8859-1 encoding of the letter ä, so that seems to be QuakeNet's way of displaying channel names that contain bytes above 127 that do not form valid UTF-8 sequences.

Neither of these things is a Cinch-related issue, however.
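As a rough illustration of the byte-level difference described above, the following sketch encodes the channel name both ways and prints the bytes in hex. It uses only Ruby's standard String methods; the variable names and the small hex helper are made up for this example.

```ruby
# "#tierärzte", written with an escape so the result does not depend
# on the encoding of this source file.
channel = "#tier\u00E4rzte"

utf8_bytes   = channel.encode("UTF-8").bytes
latin1_bytes = channel.encode("ISO-8859-1").bytes

# Hypothetical helper: format a byte array as space-separated hex.
hex = ->(bytes) { bytes.map { |b| format("%02X", b) }.join(" ") }

puts hex.call(utf8_bytes)    # 23 74 69 65 72 C3 A4 72 7A 74 65
puts hex.call(latin1_bytes)  # 23 74 69 65 72 E4 72 7A 74 65

# The ISO-8859-1 bytes are not a valid UTF-8 sequence, which would explain
# why a server expecting UTF-8 shows the lone byte escaped as %E4.
latin1_as_utf8 = channel.encode("ISO-8859-1").force_encoding("UTF-8")
puts latin1_as_utf8.valid_encoding?  # false
```

In UTF-8 the ä takes two bytes (C3 A4), while in ISO-8859-1 it is the single byte E4, so the two encodings produce different channel names on the wire.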

dominikh commented 9 years ago

And @ isn't part of the channel name, it's indicating that the bot has +o (op) in the channel.

kochd commented 9 years ago

You don't have dust on your screen; I dropped that issue because QuakeNet does not allow it.

#file -bi bot.rb
text/x-ruby; charset=utf-8

The file is UTF-8 encoded as far as I can tell.

On stdout I have:

[2014/08/20 23:58:40.100] >> :logger73556780!~cinch@XXXXXX JOIN #tierärzte

So it comes encoded in UTF-8 from the script but gets malformed somewhere else?

dominikh commented 9 years ago

Okay, not being able to join the right channel is a Cinch issue. In the default configuration, Cinch uses a mixed encoding of CP1252 and UTF-8. Messages that fit into CP1252 are encoded as such, while the remaining messages are encoded as UTF-8. This matches the behaviour of popular clients such as X-Chat and maximizes compatibility with non-UTF-8 clients, but it only makes sense for actual chat messages. Right now, however, Cinch applies this to all messages, including those used to join channels, which is why the bot ends up in the CP1252-encoded name of the channel, not the UTF-8-encoded one.
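The mixed-encoding behaviour described above can be sketched roughly as follows. This is not Cinch's actual implementation; encode_mixed is a hypothetical helper that only illustrates the "CP1252 if it fits, otherwise UTF-8" rule using Ruby's standard String#encode.

```ruby
# Hypothetical helper, not part of Cinch: return the raw bytes that a
# "CP1252 first, UTF-8 as fallback" scheme would send for a message.
def encode_mixed(text)
  # Succeeds only if every character exists in CP1252.
  text.encode("CP1252").force_encoding("BINARY")
rescue Encoding::UndefinedConversionError
  # At least one character has no CP1252 mapping, so send UTF-8 instead.
  text.encode("UTF-8").force_encoding("BINARY")
end

join = encode_mixed("JOIN #tier\u00E4rzte")
puts join.bytes.include?(0xE4)  # true: ä went out as the single CP1252 byte E4

cjk = encode_mixed("\u65E5")    # a character outside CP1252
puts cjk.bytes.map { |b| format("%02X", b) }.join(" ")  # E6 97 A5
```

Because ä fits into CP1252, the JOIN line goes out with the byte E4 rather than the UTF-8 pair C3 A4, which is consistent with the channel name seen in WHOIS.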

A quick fix for this is to set c.encoding = "UTF-8", which forces UTF-8 and disables the use of CP1252.
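Applied to the configuration from the original report, the workaround might look like the sketch below. It assumes the cinch gem is installed and leaves out the reporter's Logger plugin, since that is not shown in this thread.

```ruby
require "cinch"

bot = Cinch::Bot.new do
  configure do |c|
    c.nick     = "logger#{rand(1..1000)}"
    c.server   = "irc.quakenet.org"
    c.channels = ["#tier\u00E4rzte"]  # "#tierärzte"
    c.encoding = "UTF-8"              # force UTF-8, disabling the CP1252 fallback
  end
end

# Calling bot.start would connect to the server and join the channel.
```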

kochd commented 9 years ago

Using c.encoding works. Thank you very much for your time.

dominikh commented 9 years ago

Closed via #176