aspineux / pyzmail

Pyzmail is a high level mail library for Python, providing functions to read, compose and send emails

pyzmail crashes when parsing a mail with badly encoded UTF-8 header #12

Open amikoren opened 8 years ago

amikoren commented 8 years ago

Hello aspineux,

I am seeing a crash in pyzmail on a rare, malformed mail file. It looks like pyzmail mishandles such files.

Details: Python 3.4.2, pyzmail 1.0.3, Linux (Debian 8)

$ grep version /usr/lib/python3.4/email/__init__.py
__version__ = '5.1.0'

Crash reason: if a header contains bytes that cannot be decoded (here, badly encoded UTF-8), Compat32._sanitize_header() in _policy_base.py does not return a string but an instance of class email.header.Header. pyzmail then crashes when it tries to call startswith() on that Header object.

Reproduce: take the attached file and run the following from python3:

import pyzmail
pyzmail.message_from_bytes(open('/tmp/mail_utf8_error', 'rb').read())

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/dist-packages/pyzmail/parse.py", line 774, in message_from_bytes
    return PyzMessage(email.message_from_bytes(s, *args, **kws))
  File "/usr/local/lib/python3.4/dist-packages/pyzmail/parse.py", line 634, in __init__
    self.mailparts=get_mail_parts(self)
  File "/usr/local/lib/python3.4/dist-packages/pyzmail/parse.py", line 482, in get_mail_parts
    mailparts.append(MailPart(part, filename=filename, type=type, charset=charset, content_id=part.get('Content-Id'), description=part.get('Content-Description'), disposition=disposition, is_body=parts.get(part, False)))
  File "/usr/local/lib/python3.4/dist-packages/pyzmail/parse.py", line 98, in __init__
    if self.content_id.startswith('<') and self.content_id.endswith('>'):
AttributeError: 'Header' object has no attribute 'startswith'
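
The behaviour described under "Crash reason" can be seen with the standard library alone, without pyzmail or the attached file. The sketch below is only an illustration: the RAW message is made up, and header_to_str() is a hypothetical helper, not part of pyzmail or the email package and not necessarily how a fix would look. It assumes the default compat32 policy.

```python
import email
from email.header import Header

# Hypothetical message: the Content-ID value holds raw non-ASCII bytes
# instead of an RFC 2047 encoded word.
RAW = b"Content-ID: <\xc9\xcc\xb8\xbblog.jpg@example>\n\nbody\n"


def header_to_str(value):
    """Return a plain str for a header value, or None if it is missing.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, Header):
        # str(Header) replaces undecodable bytes with U+FFFD.
        return str(value)
    return value


msg = email.message_from_bytes(RAW)
content_id = msg.get("Content-ID")

# With the compat32 policy this is a Header instance, not a str,
# so it has no startswith() method.
print(type(content_id).__name__)
print(hasattr(content_id, "startswith"))

# After coercion the usual string checks work again.
safe_id = header_to_str(content_id)
print(safe_id.startswith("<") and safe_id.endswith(">"))
```

Coercing with str() loses the original bytes, so this is only suitable where an approximate value is acceptable.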

mail_utf8_error.zip

srault95 commented 8 years ago

No problem for me if I use:

# pyzmail 1.0.3 - Python 3.4.3 - windows 7

>>> msg = pyzmail.message_from_file(open('mail_utf8_error'))

>>> msg.as_string()
'Return-Path: <a@s.com>\nReceived: from mp0-f70.google.com (mp0-f70.google.com [209.85.220.70])\n\tby i-sgcore01-poc-server.c.trusty-catbird-121621.internal (Postfix) with ESMTPS i
d 020B9408FA\n\tfor <e.k@i.net>; Fri, 13 May 2016 17:13:48 +0300 (IDT)\nReceived: by mp0-f70.google.com with SMTP id gw7so150349649pac.0\n        for <e.k@i.net>; Fri, 13 May 2016
07:13:47 -0700 (PDT)\nX-Original-Authentication-Results: mx.google.com;       spf=pass (google.com: domain of a@s.com designates 69.88.22.222 as permitted sender) smtp.mailfrom=a@s
.com\nX-Received: by 10.98.55.133 with SMTP id e127mr22852924pfa.81.1463148827242;\n        Fri, 13 May 2016 07:13:47 -0700 (PDT)\nX-Received: by 10.98.55.133 with SMTP id e127mr22
852810pfa.81.1463148826374;\n        Fri, 13 May 2016 07:13:46 -0700 (PDT)\nReceived: from gproxy10-pub.mail.unifiedlayer.com (gproxy10-pub.mail.unifiedlayer.com. [69.88.22.222])\n
        by mx.google.com with SMTP id i10si24956178paz.90.2016.05.13.07.13.45\n        for <e.k@i.net>;\n        Fri, 13 May 2016 07:13:46 -0700 (PDT)\nReceived-SPF: pass (google.c
om: domain of a@s.com designates 69.88.22.222 as permitted sender) client-ip=69.88.22.222;\nAuthentication-Results: mx.google.com;\n       spf=pass (google.com: domain of a@s.com d
esignates 69.88.22.222 as permitted sender) smtp.mailfrom=a@s.com\nReceived: (qmail 2115 invoked by uid 0); 13 May 2016 14:12:47 -0000\nReceived: from unknown (HELO cmgw4) (10.0.90
.85)\n  by g.m.u.com with SMTP; 13 May 2016 14:12:47 -0000\nReceived: from box1210.bluehost.com ([55.88.222.222])\n\tby cmgw4 with\n\tid tqCi1s00d4Z6XqA01qCly4; Fri, 13 May 2016 08
:12:47 -0600\nReceived: from [111.11.22.111] (port=61165 helo=LocalHost)\n\tby box1210.bluehost.com with esmtpsa (TLSv1:AES128-SHA:128)\n\t(Exim 4.86_2)\n\t(envelope-from <a@s.com>
)\n\tid 1b1DpX-0008LG-HL\n\tfor e.k@i.net; Fri, 13 May 2016 08:12:42 -0600\nMessage-ID: <14587200010E149D30960D6D7E58C4C297D360313E@SUNPHOR.COM>\nFrom: "Ms.A" <a@s.com>\nReply-To:
<a@s.com>\nTo: "E" <e.k@i.net>\nSubject: Re:Hi E,Greetings from S A\nDate: Fri, 13 May 2016 22:22:24 +0800\nMIME-Version: 1.0\nX-Priority: 3\nX-Mailer: Joinf MailSystem 8.0\nConten
t-Type: multipart/related;\n\ttype="multipart/alternative";\n\tboundary="Mark=_217952388210897619413514"\nX-Identified-User: {1094:bb.com:s1:s.com} {sentby:smtp auth 111.11.22.111
authed with a@s.com}\n\n\n--Mark=_2179523882108976194183049--\n\n--Mark=_217952388210897619413514\nContent-Type: image/jpg;\n\tname="=?utf-8?Q?=E5=95=86=E5=AF=8Clog.jpg?="\nContent
-Transfer-Encoding: base64\nContent-ID: =?utf-8?b?PMOJw4zCuMK7bG9nLmpwZ0A0MjUwMy42NDA2NjA2NDgxLjY1?=\n\n\n--Mark=_217952388210897619413514--\n'
amikoren commented 8 years ago

Thanks for checking, srault95. Maybe it's a Windows/Linux difference? On my Debian machine it happens with Python 3.4.3 and pyzmail 1.0.3. Using pyzmail.message_from_file() on that file also raises an exception.

aspineux commented 8 years ago

Thank you amikoren, I can reproduce the problem and I will provide a fix soon.

srault95, you are using the "old python2" interface (aka the text interface):

msg = pyzmail.message_from_file(open('mail_utf8_error'))

vs

msg = pyzmail.message_from_bytes(open('/tmp/mail_utf8_error', 'rb').read())

With open('mail_utf8_error'), the content of the file is decoded using your local encoding, so pyzmail and the mail library end up working on a different set of data.

What amikoren is doing is more like this

msg = pyzmail.message_from_string(open(r'm:\tmp\mail_utf8_error','rb').read().decode('utf-8'))

And what you are doing is more like this

msg = pyzmail.message_from_string(open(r'm:\tmp\mail_utf8_error','rb').read().decode('cp1252'))

Replace 'cp1252' with your local Windows encoding.
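
The difference between the two interfaces can be sketched with the standard library's email package, which pyzmail wraps. This is a rough illustration under assumptions: the RAW bytes are invented for the example, and the surrogateescape line reflects how the standard library's bytes parser decodes its input as far as I know, rather than anything pyzmail documents.

```python
import email
from email.header import Header

# Made-up message whose Content-ID contains bytes that are not valid UTF-8.
RAW = b"Content-ID: <\xc9\xcc\xb8\xbblog.jpg@example>\n\nbody\n"

# Binary interface: the undecodable bytes survive as surrogate escapes,
# and the header comes back as an email.header.Header object.
via_bytes = email.message_from_bytes(RAW)["Content-ID"]

# Text interface after decoding with a local 8-bit encoding such as cp1252:
# every byte maps to some character, so the header is an ordinary str.
via_cp1252 = email.message_from_string(RAW.decode("cp1252"))["Content-ID"]

# Decoding by hand with surrogateescape takes the same path as the
# binary interface.
via_surrogates = email.message_from_string(
    RAW.decode("ascii", "surrogateescape")
)["Content-ID"]

print(type(via_bytes).__name__)
print(type(via_cp1252).__name__)
print(type(via_surrogates).__name__)
```

This is why the text interface appears to work on Windows: the bytes are silently reinterpreted as cp1252 characters before the parser ever sees them.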