Open eastein opened 9 years ago
This may be something wrong with the irc module at 10.1
Try setting irc.client.ServerConnection.buffer_class = irc.buffer.LenientDecodingLineBuffer to avoid issues like this.
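To make the suggestion above concrete, here is a minimal standalone sketch of a lenient decoding strategy: try UTF-8 first, then fall back to Latin-1. It assumes that this is roughly what a lenient buffer does, which is worth checking against the installed irc version. The function name `lenient_decode` and the sample bytes are hypothetical and not part of the irc library. It is written for Python 3.

```python
def lenient_decode(raw):
    """Decode bytes as UTF-8, falling back to Latin-1 if that fails.

    Hypothetical helper for illustration; not the irc library's code.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte value to a code point, so this cannot raise.
        return raw.decode("latin-1")


# A line cut off in the middle of a 3-byte UTF-8 sequence (made-up sample).
truncated = b"an \xe2\x80\x9cevil empire.\xe2\x80"

print(repr(lenient_decode(b"caf\xc3\xa9")))  # valid UTF-8 decodes normally
print(repr(lenient_decode(truncated)))       # falls back instead of raising
```

The trade-off is that a single bad byte sends the whole line through the fallback, so otherwise valid UTF-8 characters in that line come out as mojibake. That may be acceptable for a bot that only needs to stay up.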
This Unicode string seems to reliably break decoding:
d̵́͢҉̩̟̜̹͔͇ẁ̠̣̞̞̪͘̕͞ĺ̦̼̙̳̬͞o̷̸̻̫̙̼̤͈͟͠c̶̡̬̖̮̯̳̱̟̕k̸̛̛̛̜̠͎̮͍͝s̢͙̖̘̦̼̻̟̕͠,̴͞҉̶̢̞̱̘̩̻ ̸̨͔͍͈̖͕͟͟͝h̴̥͍͕͇͕̜͈́͞e͏̴̮͖̟̖͔̗̲͝ ͏̰͚͙̹͕̬̼͘͞ç̴̵̞͔͎̳͕͟ͅo̡҉̨̥̘͇̱�
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/eastein/venvs/andreybot/lib/python2.7/site-packages/andrey_bot/run.py", line 191, in <module>
s.run()
File "/home/eastein/venvs/andreybot/local/lib/python2.7/site-packages/mediorc/__init__.py", line 104, in run
self.client.ircobj.process_once(0.2)
File "/home/eastein/venvs/andreybot/local/lib/python2.7/site-packages/irc/client.py", line 244, in process_once
self.process_data(i)
File "/home/eastein/venvs/andreybot/local/lib/python2.7/site-packages/irc/client.py", line 201, in process_data
c.process_data()
File "/home/eastein/venvs/andreybot/local/lib/python2.7/site-packages/irc/client.py", line 572, in process_data
for line in self.buffer:
File "/home/eastein/venvs/andreybot/local/lib/python2.7/site-packages/irc/buffer.py", line 96, in lines
self.handle_exception()
File "/home/eastein/venvs/andreybot/local/lib/python2.7/site-packages/irc/buffer.py", line 94, in lines
yield line.decode(self.encoding, self.errors)
File "/home/eastein/venvs/andreybot/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 508-509: unexpected end of data
Still running irc==10.1. This is another, similar problem, though not the same one. I don't have the exact (seemingly invalid) Unicode string that triggered this one.
Exception message from the most recent failure:
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 508-509: unexpected end of data
I had modified /home/eastein/venvs/andreybot/local/lib/python2.7/site-packages/irc/buffer.py on the line before the call to line.decode to print a repr of the line variable. This is that repr:
':bjonnh[m]!bjonnhmatr@gateway/shell/matrix.org/x-ecjopyftwrirlcxi PRIVMSG #pumpingstationone :""The relentless pressure on TikTok ramped up further this week, with U.S. Secretary of State Mike Pompeo again claiming user data is sent to to China. \xe2\x80\x9cIt\xe2\x80\x99s not possible to have your personal information flow across a Chinese server,\xe2\x80\x9d he warned during a British media interview, suggesting that data would \xe2\x80\x9cend up in the hands of the Chinese Cmmunist Party,\xe2\x80\x9d which he characterized as an \xe2\x80\x9cevil empire.\xe2\x80'
The IRC protocol truncates on a bytewise basis, without regard to character encoding, to impose a maximum size on any message sent by a user to other users.
Here is an example of how a smart quote, encoded as 3 bytes of UTF-8, will either decode correctly or crash in this way, depending on where the truncation occurred:
>>> print b'\xe2\x80'.decode("utf-8")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/eastein/venvs/andreybot/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: unexpected end of data
>>>
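The session above only shows the failing case. The sketch below walks through every possible cut point inside the closing smart quote, to show that decoding succeeds only when the cut lands on a character boundary. IRC lines are capped at 512 bytes including the trailing CR-LF, which is consistent with the failure at position 508-509. The helper `try_decode` and the sample text are hypothetical, and the code is written for Python 3.

```python
text = "an \u201cevil empire.\u201d"  # ends with a right smart quote (U+201D)
encoded = text.encode("utf-8")        # the closing quote is the bytes e2 80 9d


def try_decode(raw):
    """Return the decoded text, or an error string if strict decoding fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return "error: " + exc.reason


# Cut 0, 1, 2 and 3 bytes off the end, as a bytewise length limit might.
for cut in range(4):
    chunk = encoded[:len(encoded) - cut]
    print(cut, repr(chunk[-3:]), try_decode(chunk))
```

Cutting 1 or 2 bytes leaves a dangling partial sequence and strict decoding fails. Cutting 0 or 3 bytes lands on a character boundary and decoding succeeds.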
In my IRC client (irssi) showing the message, the end of the line is shown as:
an “evil empire.��
@asl2 suggested that "passing either ignore or replace to the decoder would fix it". I think it would be appropriate to set errors='replace'.
Example of that operating as expected instead of crashing:
>>> print b'an "evil empire.\xe2\x80'.decode("utf-8", errors='replace')
an "evil empire.�
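The traceback shows the buffer calling `line.decode(self.encoding, self.errors)`, so the error policy appears to be an attribute on the buffer. The sketch below uses a hypothetical stand-in class, `SketchDecodingBuffer`, to show how changing such an attribute at class level would switch between `strict`, `replace` and `ignore`. It does not import the irc library. Whether the real buffer class can be configured the same way is an assumption to verify against the installed version.

```python
class SketchDecodingBuffer(object):
    """Hypothetical stand-in for a line buffer that decodes raw bytes."""

    encoding = "utf-8"
    errors = "strict"  # strict decoding raises on malformed input

    def decode_line(self, line):
        # Same call shape as the decode call seen in the traceback.
        return line.decode(self.encoding, self.errors)


truncated = b'an "evil empire.\xe2\x80'
buf = SketchDecodingBuffer()

try:
    buf.decode_line(truncated)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Change the policy once, at class level, so every instance picks it up.
SketchDecodingBuffer.errors = "replace"
replaced = buf.decode_line(truncated)

SketchDecodingBuffer.errors = "ignore"
ignored = buf.decode_line(truncated)

print(strict_failed, repr(replaced), repr(ignored))
```

With `replace`, the damaged tail stays visible as U+FFFD, which matches what irssi displays. With `ignore`, the partial bytes are dropped silently, which hides the fact that anything was truncated.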
Please prioritize this; it is really important for my sanity.
Traceback (most recent call last):
File "./chronbot", line 304, in
s.run()
File "/home/eastein/newer_venv/local/lib/python2.7/site-packages/mediorc/__init__.py", line 101, in run
self.client.ircobj.process_once(0.2)
File "/home/eastein/newer_venv/local/lib/python2.7/site-packages/irc/client.py", line 261, in process_once
self.process_data(i)
File "/home/eastein/newer_venv/local/lib/python2.7/site-packages/irc/client.py", line 218, in process_data
c.process_data()
File "/home/eastein/newer_venv/local/lib/python2.7/site-packages/irc/client.py", line 575, in process_data
for line in self.buffer:
File "/home/eastein/newer_venv/local/lib/python2.7/site-packages/irc/buffer.py", line 94, in lines
self.handle_exception()
File "/home/eastein/newer_venv/local/lib/python2.7/site-packages/irc/buffer.py", line 92, in lines
yield line.decode(self.encoding, self.errors)
File "/home/eastein/newer_venv/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x94 in position 174: invalid start byte
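This last failure differs from the truncation case: 0x94 can never be the first byte of a UTF-8 sequence, so the line was probably not UTF-8 to begin with. One possible explanation is a client sending Windows-1252, where 0x94 is a right smart quote. That is an assumption, since the offending line is not shown. The sample bytes below are made up, and the sketch is written for Python 3.

```python
# Hypothetical line whose closing smart quote is Windows-1252, not UTF-8.
raw = b"an evil empire.\x94"

try:
    raw.decode("utf-8")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc.reason

print(strict_error)                          # strict decoding rejects the line
print(repr(raw.decode("utf-8", "replace")))  # the bad byte becomes U+FFFD
print(repr(raw.decode("cp1252")))            # right only if the guess is correct
```

Either way, errors='replace' keeps the bot running without having to guess the sender's encoding.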