MetricsGrimoire / MailingListStats

Mailing List Stats is a command line based tool used to analyze mboxes
http://metricsgrimoire.github.com/MailingListStats/
GNU General Public License v2.0

Error parsing email with patches attached #1

Open gpoo opened 12 years ago

gpoo commented 12 years ago

When mlstats parses an email with a patch attached, it misidentifies the attachment as a separate email.

For instance, a MIME message that contains the following parts would fail:

From random@hacker.com  Fri Jul 13 12:02:17 2012
Return-Path: <random@hacker.com>
Received: from localhost (localhost.localdomain [127.0.0.1])
    by some.server.org (program) ...;
    [Date]
Subject: Patch

--=-T31lODH7164VRnQIlwfA
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

Hello, here attached my patch.
... more lines of text ...

--=-T31lODH7164VRnQIlwfA
Content-Disposition: attachment; filename="0001-some-format.patch"
Content-Type: text/x-patch; name="0001-some-format.patch"; charset="UTF-8"
Content-Transfer-Encoding: 7bit

From cb83...8547 Mon Jul 13 00:00:00 2012    <----- Line incorrectly parsed as the start of a new message
From: Random Hacker <randome@hacker.com>
Date: Fri, 13 Jul 2012 13:45:07 +0800
Subject: [PATCH] Some format patch

Commit message ... 

---
 src/dir/file.py |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
[...]

--=-T31lODH7164VRnQIlwfA--

This message is incorrectly parsed as two separate messages when it is only one: mlstats starts the second message at the beginning of the format-patch attachment. As a result, the statistics appear to show fewer messages processed, and the stored content is incomplete (a problem when studying patch contributions).
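
To illustrate the failure mode, here is a minimal sketch in Python. It is not mlstats's actual parser: `count_messages`, `GIT_PATCH_FROM`, and `SAMPLE_MBOX` are hypothetical names, and the 40-character hash in the sample is made up. It assumes the splitter treats every line starting with `From ` as a message separator, and shows one possible heuristic (skipping lines that look like the first line of a `git format-patch` file) that avoids the miscount.

```python
import re

# Assumption: a format-patch file begins with "From <40 hex digits> <date>".
GIT_PATCH_FROM = re.compile(r"^From [0-9a-f]{40} ")

# Shortened version of the message above; the commit hash is invented.
SAMPLE_MBOX = """\
From random@hacker.com  Fri Jul 13 12:02:17 2012
Subject: Patch
Content-Type: multipart/mixed; boundary="=-XYZ"

--=-XYZ
Content-Type: text/plain

Hello, here attached my patch.

--=-XYZ
Content-Type: text/x-patch; name="0001-some-format.patch"

From 0123456789abcdef0123456789abcdef01234567 Mon Jul 13 00:00:00 2012
From: Random Hacker <random@hacker.com>
Subject: [PATCH] Some format patch

--=-XYZ--
"""


def count_messages(mbox_text, skip_git_patches=False):
    """Count mbox messages by scanning for "From " separator lines."""
    count = 0
    for line in mbox_text.splitlines():
        if not line.startswith("From "):
            continue
        if skip_git_patches and GIT_PATCH_FROM.match(line):
            # Looks like the first line of a format-patch attachment,
            # not an envelope line, so it does not start a new message.
            continue
        count += 1
    return count


print(count_messages(SAMPLE_MBOX))                         # 2 (naive scan)
print(count_messages(SAMPLE_MBOX, skip_git_patches=True))  # 1 (with heuristic)
```

The hash check is only a heuristic. A more robust fix would probably track the MIME boundaries, so that nothing between an opening boundary and its closing `--boundary--` line can start a new message.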

gpoo commented 12 years ago

For testing attachments, there is a test file available. It contains one message with nested attachments (51 parts in total).

ftp://ftphost.cac.washington.edu/imap/mime-examples/torture-test.mbox (written by Mark Crispin, father of the IMAP protocol).
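
A test against a file like that could compare the number of parts found in a single parsed message with the expected number. The sketch below uses Python's standard `email` package on a small hand-written nested message instead of the torture test itself; `RAW_MESSAGE` and `count_leaf_parts` are hypothetical names, and counting only non-multipart (leaf) parts is an assumption about what "parts" should mean here.

```python
import email

# Hand-written sample: a multipart/mixed message that contains a nested
# multipart/alternative part. It is not taken from torture-test.mbox.
RAW_MESSAGE = """\
From: a@example.com
Subject: nested
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: text/plain

body
--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain

plain
--inner
Content-Type: text/html

<p>html</p>
--inner--
--outer--
"""


def count_leaf_parts(raw):
    """Count the non-container parts of one message, however deeply nested."""
    msg = email.message_from_string(raw)
    # walk() visits the message itself and every subpart, depth first;
    # multipart containers are skipped so only leaf parts are counted.
    return sum(1 for part in msg.walk() if not part.is_multipart())


print(count_leaf_parts(RAW_MESSAGE))
```

If the message were wrongly split at a `From ` line inside an attachment, the count for the first message would come out lower than expected.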