aspineux / pyzmail

Pyzmail is a high-level mail library for Python, providing functions to read, compose and send emails.

issue extracting attachments with improper boundary #10

Status: Open (opened by mlaferrera 8 years ago)

mlaferrera commented 8 years ago

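I recently ran across an email from which the attachment could not be extracted using pyzmail. The issue appears to be in how boundaries are parsed, and in over-relying on them to extract content. Below is an example message whose attachment is not extracted: the outer Content-Type declares boundary="b1_000001", but the attachment part is delimited by --b1_000002, and the outer multipart is never closed with --b1_000001--.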

To: test@testuser.com
Subject: Testing
Message-ID: <abc123@testuser.com>
Return-Path: bounce@testuser.com
Date: Tue, 06 Oct 2015 11:25:00 +0000
From: "testing" <testing@testsource.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; charset="UTF-8"; boundary="b1_000001"
Content-Transfer-Encoding: 8bit
Content-Disposition: inline

--b1_000001
Content-Type: multipart/alternative;
    boundary="b3_000001"

--b3_000001
Content-Type: text/plain; format=flowed; charset="UTF-8"
Content-Transfer-Encoding: 8bit

testing

--b3_000001
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: 8bit

<html>
<head>
</head>
<body>
testing
</body>
</html>

--b3_000001--
--b1_000002
Content-Type: application/octet-stream;
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="file.txt"

VGhpcyBpcyBhIHRlc3QgZmlsZS4K

--b1_000002--
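
To see where the attachment ends up, the sample can be fed to Python's standard email package. This is a minimal sketch under the assumption that pyzmail delegates MIME parsing to that package (worth confirming against the pyzmail version in use); the helper name summarize is hypothetical. In our reading of the stdlib parser, the stray --b1_000002 lines never match the declared boundary, so the parser yields only the multipart/alternative branch, records a CloseBoundaryNotFoundDefect on the outer message, and leaves the attachment as raw text in the inner part's epilogue.

```python
import email
from email import errors

# The sample message from this issue, verbatim. The outer Content-Type
# declares boundary="b1_000001", but the attachment is delimited by
# --b1_000002, and no closing --b1_000001-- line is ever written.
RAW = """\
To: test@testuser.com
Subject: Testing
Message-ID: <abc123@testuser.com>
Return-Path: bounce@testuser.com
Date: Tue, 06 Oct 2015 11:25:00 +0000
From: "testing" <testing@testsource.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; charset="UTF-8"; boundary="b1_000001"
Content-Transfer-Encoding: 8bit
Content-Disposition: inline

--b1_000001
Content-Type: multipart/alternative;
    boundary="b3_000001"

--b3_000001
Content-Type: text/plain; format=flowed; charset="UTF-8"
Content-Transfer-Encoding: 8bit

testing

--b3_000001
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: 8bit

<html>
<head>
</head>
<body>
testing
</body>
</html>

--b3_000001--
--b1_000002
Content-Type: application/octet-stream;
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="file.txt"

VGhpcyBpcyBhIHRlc3QgZmlsZS4K

--b1_000002--
"""


def summarize(raw):
    """Hypothetical helper: parse raw with the stdlib parser and return the
    message, (content type, filename) for every recognised part, and the
    first sub-part (here the multipart/alternative branch)."""
    msg = email.message_from_string(raw)
    parts = [(p.get_content_type(), p.get_filename()) for p in msg.walk()]
    inner = msg.get_payload(0)
    return msg, parts, inner


if __name__ == "__main__":
    msg, parts, inner = summarize(RAW)
    for ctype, filename in parts:
        print(ctype, filename)
    # The unclosed outer multipart is recorded as a parser defect.
    unclosed = any(isinstance(d, errors.CloseBoundaryNotFoundDefect)
                   for d in msg.defects)
    print("outer multipart left unclosed:", unclosed)
    # The would-be attachment survives only as unparsed epilogue text.
    print("attachment text in epilogue:", "--b1_000002" in (inner.epilogue or ""))
```

No part with a filename or an application/octet-stream type comes out of the walk, which matches the behavior reported above.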

giovino commented 8 years ago

I typically use Thunderbird as my baseline to answer the question "should this incorrectly formatted email parse correctly?". In this case, Thunderbird does not extract the file.txt attachment either.

Of course, we could test a dozen other email clients, but this is at least one data point.
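
If lenient handling were wanted anyway, one possible heuristic (not something pyzmail or the standard library provides) is to rescan the epilogue text a strict parser leaves behind for lines that look like MIME delimiters, and parse each chunk between them as its own part. This sketch assumes Python's stdlib email package; salvage_parts, attachments_with_salvage and DELIM are hypothetical names. It trades strictness for recall: any epilogue line of the form --token, including a plain row of dashes, would be taken as a delimiter.

```python
import email
import re

# Hypothetical heuristic, not part of pyzmail or the stdlib: match a line that
# looks like a MIME delimiter, "--token" or "--token--". The token uses a
# subset of the RFC 2046 boundary characters (space omitted); the lazy {1,70}?
# lets a trailing "--" be read as the close marker instead of part of the token.
DELIM = re.compile(r"^--([0-9A-Za-z'()+_,./:=?-]{1,70}?)(--)?[ \t]*\r?$",
                   re.MULTILINE)


def salvage_parts(text):
    """Parse each chunk that follows a delimiter-like line in text as a MIME part."""
    recovered = []
    matches = list(DELIM.finditer(text))
    for start, nxt in zip(matches, matches[1:] + [None]):
        if start.group(2):  # a closing delimiter does not open a new part
            continue
        stop = nxt.start() if nxt else len(text)
        chunk = text[start.end():stop].lstrip("\r\n")
        part = email.message_from_string(chunk)
        if part.keys():  # skip chunks that carry no header block
            recovered.append(part)
    return recovered


def attachments_with_salvage(raw):
    """Return (filename, decoded bytes) pairs, including parts salvaged
    from multipart epilogues."""
    msg = email.message_from_string(raw)
    candidates = []
    for part in msg.walk():
        candidates.append(part)
        # A strict parse leaves mis-delimited parts behind as epilogue text.
        if part.is_multipart() and part.epilogue:
            candidates.extend(salvage_parts(part.epilogue))
    return [(p.get_filename(), p.get_payload(decode=True))
            for p in candidates if p.get_filename()]
```

Applied to the sample message in this issue, this would be expected to recover file.txt; whether a library should do so by default, rather than behind an explicit opt-in, is the open question raised above.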