domainaware / parsedmarc

A Python package and CLI for parsing aggregate and forensic DMARC reports
https://domainaware.github.io/parsedmarc/
Apache License 2.0
966 stars 210 forks source link

[BUG] Forensic parser creates malformed samples #403

Closed rubeste closed 4 months ago

rubeste commented 1 year ago

A problem resides within the parser regarding Forensic reports. I believe it is caused by a specific line within the __init__.py file on line: 844

843                sample = parts[1].lstrip()
844                sample = sample.replace("=\r\n", "")
845                logger.debug(sample)

The sample.replace("=\r\n", "") will replace any substring that follows the substring =\r\n. I do not know what the reason for this is but it will corrupt the headers in situations like the following:

DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/simple;
    s=someselector; d=example.com; t=1682575383;
    h=From:To:Content-Transfer-Encoding:Content-Type:Date:Subject:Message-ID:Feedback-ID;
    bh=zfejcErei4tABSafVcg3dk/KJNlwRvcPheEr7GEr5iE=;
    b=ex1sX2dc+mftIS3ZCE5qChoi/BacZ3d3vg/PEsL0Aum7TIIiPy39peE4oUyWJ
    2clBCRx3vGRp2iBwahauhgjtS28+Eh/U0mepvXRsWo21EymiZm3sA3K
    CnMEhozHrJ8ttrHhitaijoGn/1gfkXvlk=
From: bob@example.com

Because the DKIM signature has a signature in Base64 it is possible for a trailing = character before a new line \r\n. This will create the following: ... 1gfkXvlkFrom: bob@example.com Which is a invalid signature and has corrupted the From header which the parser needs to form a forensic report.

I would like to know if this line is actually needed or can be removed/modified.

bencomp commented 11 months ago

I have no experience with forensic reports, and certainly not in combination with parsedmarc. But this looks like a bug to me. I cannot tell from the code in context for what scenarios it should be used, so I don't know the answer to your question.

nhairs commented 6 months ago

Is this causing an error or incorrect output?

It does indeed feel like a bug with parsing so it would be good if we can add tests to ensure that we are correctly parsing results.

rubeste commented 6 months ago

I believe it will cause an error during the parsing of the Forensic report. This does not crash the program, but it will skip this report as a malformed report.

nhairs commented 6 months ago

In case anyone else is looking at this, the line numbers in the OP are now incorrect. As of 2024-01-02 the =\r\n replace code is on line 866.

nhairs commented 6 months ago

~It doesn't look like we have a sample email that hits this behaviour making it difficult to replicate. @rubeste would you be willing to provide a sample email?~

~Nevermind it looks like I can use one provided on #64~ This sample in #64 is only the body, not the full email with headers (which is where the problem is making itself visible). I've managed to "funge" it by combining the example in the OP with the sample in #64, but I wouldn't trust it.

Which means if you'd be willing provide a sample @rubeste that would be very helpful.

nhairs commented 6 months ago

That said, it sounds like we don't have a sample for which the removal of =/r/n would be intentional. i.e. we can't just alter this line because we don't know why it exists.

I notice that in the commit where this code was added @seanthegeek appears to also have private samples. The commit is quite old, but it would be super handy if @seanthegeek could share a sample for why this replace statement was added so we can make sure we don't break parsing of it.

rubeste commented 6 months ago

I'll provide a redacted version of the report once I'm able to. Should be within a day.

rubeste commented 6 months ago

Unfortunately the mailbox that collects all the dmarc reports does not contain any forensic records anymore. Furthermore, I was not able to find the report in question. The only thing you can use is the snippet in the origional post.

seanthegeek commented 6 months ago

I can't remember why I did that, so I removed that line in the branch for the upcoming release.