Closed (seshrs closed 4 years ago)
I added two tests on the bugfix/61-emoji branch: one with plain emoji, the other using Markdown rendering with emoji.
It would be nice to avoid using the chardet library altogether. This library is the root cause of Issue #46, too.
One idea would be to check whether any characters are outside the range [0, 127]. Then set the encoding to ascii if none are, or to utf8 otherwise.
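A minimal sketch of that idea, assuming the message body is already available as a Python str. The function name choose_encoding is hypothetical and is not part of mailmerge:

```python
def choose_encoding(text):
    """Pick a charset without chardet (hypothetical helper, not mailmerge API).

    Returns "ascii" when every code point is in [0, 127], else "utf-8".
    """
    if all(ord(ch) <= 127 for ch in text):
        return "ascii"
    return "utf-8"


plain = "Hi there"
with_emoji = "Hi \U0001F600"

# Encoding with the chosen charset never raises, because UTF-8 can
# represent every code point that ASCII cannot.
plain_bytes = plain.encode(choose_encoding(plain))
emoji_bytes = with_emoji.encode(choose_encoding(with_emoji))
```

This trades generality for predictability: text is only ever sent as ASCII or UTF-8, so users who need another charset would have to specify it themselves.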
Tl;dr
It looks like chardet doesn't always get the encoding right: in a message containing emoji, chardet reported (with low confidence) that the message was encoded in Windows-1252. Python raises an exception when trying to encode a string with emoji to that charset.

I don't know if there's a good fix. Can we assume that users are responsible for specifying an encoding if it's not UTF-8? (That would remove our reliance on chardet.)

The bug
Steps I followed on my Mac:
mailmerge_template.txt:
Hi 😀
...
  File "/Users/seshrs/Documents/Git/mailmerge/mailmerge/template_message.py", line 76, in _transform_encoding
    part.set_charset(encoding)
  File "/Users/seshrs/Documents/Git/mailmerge/env/lib/python3.7/site-packages/future/backports/email/message.py", line 322, in set_charset
    self._payload = charset.body_encode(self._payload)
  File "/Users/seshrs/Documents/Git/mailmerge/env/lib/python3.7/site-packages/future/backports/email/charset.py", line 403, in body_encode
    string = string.encode(self.output_charset)
  File "/usr/local/bin/../Cellar/python/3.7.4_1/bin/../Frameworks/Python.framework/Versions/3.7/lib/python3.7/encodings/cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f600' in position 3: character maps to <undefined>
(Pdb++) detected
{'encoding': 'Windows-1252', 'confidence': 0.5334615384615384, 'language': ''}
(The call to part.set_charset(encoding) eventually reaches this line in the python-future code that executes something like the above.)
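For illustration, the failure in the traceback can be reproduced without mailmerge by encoding the template text straight to cp1252, the codec named in the traceback. This is a sketch only; try_encode is a hypothetical helper, not code from mailmerge or python-future:

```python
def try_encode(text, charset):
    """Return the encoded bytes, or the UnicodeEncodeError if encoding fails.

    Hypothetical helper used only to show the failure mode.
    """
    try:
        return text.encode(charset)
    except UnicodeEncodeError as err:
        return err


text = "Hi \U0001F600"  # same content as the template body

# cp1252 has no mapping for U+1F600, so this yields the error object.
cp1252_result = try_encode(text, "cp1252")

# UTF-8 can represent the emoji, so this yields bytes.
utf8_result = try_encode(text, "utf-8")
```

The error's start attribute is 3, matching the "position 3" in the traceback: the emoji is the fourth character of the string.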