MailScanner / v5

GNU General Public License v2.0

milter: garbled content-type / mime boundaries with certain emails #638

Status: Closed (closed by dneuhaeuser 1 year ago)

dneuhaeuser commented 1 year ago

I'm using v5.4.5-3 in milter mode.

On certain emails I have a problem with mixed up Content-Type headers.

For example, in postfix the email arrives with this header:

Content-Type: multipart/mixed; boundary="Mon_Jan_9_15:10:21_2023_L1_14"

After scanning (nothing was filtered here) the email is forwarded to a downstream SMTP and the header lines arrive like this:

--Mon_Jan_9_15: 10:21_2023_L1_14 Content-Type: text/plain; charset="iso-8859-1"

So the order of the lines was swapped and the keyword "boundary" is missing. Maybe the receiving SMTP added the Content-Type like that because it was missing completely; I'm not 100% sure.

Because of this problem the mime boundaries are not recognized and the whole content (including attachments) is treated as text body.

Could this be a bug in MSMilter?

shawniverson commented 1 year ago

It wouldn't be a bug in MSMilter since it just grabs the mail for MailScanner to process, but maybe something up in MailScanner itself. It looks like that header should be included with that boundary, but something is missing? I don't think the lines got swapped, that is a different set of lines that also needs to be present for the boundary to make sense.

shawniverson commented 1 year ago

Any chance you can get your hands on one of these emails before it enters MailScanner, so we can take a closer look?

dneuhaeuser commented 1 year ago

Yes, I have one of these emails at hand. Should I post it here or somewhere else? I would need to black out some information for privacy reasons...

msapiro commented 1 year ago

I have never seen anything like this. What you show looks like a boundary and a sub-part header with an extra space. I.e., it should be

--Mon_Jan_9_15:10:21_2023_L1_14
Content-Type: text/plain; charset="iso-8859-1"

(no space between 15: and 10)

Part of the issue may be the boundary itself. That format is quite unusual. While RFCs 2045 and 2046 seem to allow any quoted string of ascii characters as a boundary, I've never seen one that looks like a time stamp in that way.
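As a side note on what the RFCs permit: RFC 2046 (section 5.1.1) restricts boundaries to a specific character set of at most 70 characters, and the colon is in that set, so the time-stamp-like boundary is unusual but legal. A minimal sketch of that check follows; the function name `is_valid_boundary` is made up for illustration and is not part of MailScanner.

```python
import re

# RFC 2046, section 5.1.1:
#   boundary      := 0*69<bchars> bcharsnospace
#   bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / "+" / "_" /
#                    "," / "-" / "." / "/" / ":" / "=" / "?"
#   bchars        := bcharsnospace / " "
_NOSPACE = r"0-9A-Za-z'()+_,\-./:=?"
_BOUNDARY_RE = re.compile(rf"[{_NOSPACE} ]{{0,69}}[{_NOSPACE}]")


def is_valid_boundary(boundary: str) -> bool:
    """Return True if the unquoted boundary matches the RFC 2046 grammar."""
    return _BOUNDARY_RE.fullmatch(boundary) is not None


print(is_valid_boundary("Mon_Jan_9_15:10:21_2023_L1_14"))  # True
print(is_valid_boundary("000000000000c7d43d05f1c9f9ff"))   # True
```

Because ":" is also a tspecial in RFC 2045, a boundary like this one has to be quoted in the Content-Type parameter, which the sender did.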

Here's an example of main and subpart headers from a typical message.

Content-Type: multipart/mixed; boundary="000000000000c7d43d05f1c9f9ff"

--000000000000c7d43d05f1c9f9ff
Content-Type: multipart/alternative; boundary="000000000000c7d43a05f1c9f9fd"

--000000000000c7d43a05f1c9f9fd
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

--000000000000c7d43a05f1c9f9fd
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

--000000000000c7d43a05f1c9f9fd--

--000000000000c7d43d05f1c9f9ff
Content-Type: application/msword; name="ESL Rpt_CAC_01-21-23.doc"
Content-Disposition: attachment; filename="ESL Rpt_CAC_01-21-23.doc"
Content-Transfer-Encoding: base64

--000000000000c7d43d05f1c9f9ff
Content-Type: text/plain; charset="utf-8"; name="ESL Rpt_CAC_01-21-23.txt"
Content-Disposition: attachment; filename="ESL Rpt_CAC_01-21-23.txt"
Content-Transfer-Encoding: 8bit

--000000000000c7d43d05f1c9f9ff--

Can you provide the main Content-Type: header and the sub-part headers, as above, for both the incoming and the garbled message?

dneuhaeuser commented 1 year ago

I also stumbled across this boundary style... but I have no influence on that, it seems to be sender-specific.

The incoming headers look like this:

...
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="Mon_Jan_9_15:10:21_2023_L1_14"
X400-Content-Identifier: 2023010915101947
Priority: normal

--Mon_Jan_9_15:10:21_2023_L1_14
Content-type: text/plain; charset=iso-8859-1
Content-transfer-encoding: quoted-printable

Here the text body follows

--Mon_Jan_9_15:10:21_2023_L1_14
Content-Type: application/octet-stream; name="ABC1"
Content-Disposition: attachment; filename="ABC1"; Size=348
Content-Transfer-Encoding: base64

DATA OF 1ST ATTACHMENT

--Mon_Jan_9_15:10:21_2023_L1_14
Content-Type: application/octet-stream; name="XYZ2"
Content-Disposition: attachment; filename="XYZ2"; Size=600050
Content-Transfer-Encoding: base64

DATA OF 2ND ATTACHMENT

--Mon_Jan_9_15:10:21_2023_L1_14--
----------EOF---------

On the downstream SMTP it arrives like this (the main MIME header is missing completely here; what I saw in the first place was the sub-part header of the text body):

...
X400-Content-Identifier: 2023010915101947
X-ABC-MailScanner: Found to be clean
--Mon_Jan_9_15: 10:21_2023_L1_14
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Return-Path: sender@server.com
X-MS-Exchange-Forest-EmailMessageHash: 82A4ED9C
MIME-Version: 1.0

Here the text body follows

--Mon_Jan_9_15:10:21_2023_L1_14
Content-Type: application/octet-stream; name=ABC1"
Content-Disposition: attachment; filename=ABC1"; =09Size48
Content-Transfer-Encoding: base64

DATA OF 1ST ATTACHMENT

--Mon_Jan_9_15:10:21_2023_L1_14
Content-Type: application/octet-stream; name=XYZ2"
Content-Disposition: attachment; filename=XYZ2"; =09Size`0050
Content-Transfer-Encoding: base64

DATA OF 2ND ATTACHMENT

--Mon_Jan_9_15:10:21_2023_L1_14--
------- EOF --------
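One possible reading of the mangled `Size48` and ``Size`0050`` values, offered as an assumption and not something confirmed in this thread: once the text part's `Content-Transfer-Encoding: quoted-printable` ends up among the top-level headers, a receiver may decode the entire body, attachment headers included, as quoted-printable. In that encoding `=34` and `=60` are escapes for `4` and a backtick. The sketch below uses Python's standard `quopri` module; the helper name `decode_as_qp` is made up for illustration.

```python
import quopri


def decode_as_qp(text: str) -> str:
    """Decode text as if it were quoted-printable body content."""
    return quopri.decodestring(text.encode("ascii")).decode("ascii")


# Fragments of the original sub-part headers, decoded as body text.
print(decode_as_qp("Size=348"))     # Size48
print(decode_as_qp("Size=600050"))  # Size`0050
```

This matches the values seen downstream and fits the observation that the whole content was treated as the text body.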

msapiro commented 1 year ago

We don't need to see any actual content or headers other than Content- ones. Just the main Content- and MIME-Version: headers and all the boundaries and subpart headers should be enough. If it's easier, you can zip the original and send it to mark@msapiro.net, and I'll investigate and report.

msapiro commented 1 year ago

What you posted while I was typing should be enough.

msapiro commented 1 year ago

I can duplicate the problem.

The issue is that in the original message we have

Content-Type: multipart/mixed;
boundary="Mon_Jan_9_15:10:21_2023_L1_14"

Note that boundary= is a continuation of Content-Type: but is not indented. If I fix the message by changing it to

Content-Type: multipart/mixed;
 boundary="Mon_Jan_9_15:10:21_2023_L1_14"

There is no issue in processing.

The bottom line is this is a defective message composed by a broken MUA. Is there a User-Agent: or other header identifying the MUA?

This is not a MailScanner issue per se, although it may be able to detect this defect and handle it more gracefully.
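A rough sketch of what detecting such a defect could look like: a header line must either start a new field or begin with a space or tab (a folded continuation). The function name and the field-name pattern below are assumptions for illustration only. The pattern is deliberately stricter than RFC 5322, because the RFC's own field-name rule (any printable ASCII except ":") would accept `boundary="Mon_Jan_9_15` as a field name.

```python
import re

# Heuristic, stricter than RFC 5322: accept only letters, digits and hyphens
# in a field name, which covers the header names seen in practice.
_FIELD_START = re.compile(r"[A-Za-z0-9-]+:")


def find_unfolded_continuations(header_block: str) -> list[int]:
    """Return 0-based indexes of lines that are neither a new field nor folded."""
    suspicious = []
    for index, line in enumerate(header_block.splitlines()):
        if not line:
            break  # a blank line ends the header block
        if line[0] in " \t":
            continue  # properly folded continuation line
        if _FIELD_START.match(line):
            continue  # start of a new header field
        suspicious.append(index)
    return suspicious


broken = 'Content-Type: multipart/mixed;\nboundary="Mon_Jan_9_15:10:21_2023_L1_14"\n'
fixed = 'Content-Type: multipart/mixed;\n boundary="Mon_Jan_9_15:10:21_2023_L1_14"\n'
print(find_unfolded_continuations(broken))  # [1]
print(find_unfolded_continuations(fixed))   # []
```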

dneuhaeuser commented 1 year ago

Actually, there is a blank space at the start of that line, before "boundary=" (it was lost when pasting here).

Unfortunately, there is no User-Agent info in the header.

dneuhaeuser commented 1 year ago

Just found out that it is related to the setting "Sign Clean Messages". I had that set to "yes" with a simple inline.sig.txt. When I switch this to "no", the email arrives perfectly.

So perhaps the function for inserting this text has a problem with the unusual mime boundary?

msapiro commented 1 year ago

Yes, it is related to Sign Clean Messages and the specific boundary.

If I send

From: mark@msapiro.net
To: mark@msapiro.net
Subject: test
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="Mon_Jan_9_15:10:21_2023_L1_14"
X400-Content-Identifier: 2023010915101947
Priority: normal

--Mon_Jan_9_15:10:21_2023_L1_14
Content-type: text/plain; charset=iso-8859-1
Content-transfer-encoding: quoted-printable

Here the text body follows

--Mon_Jan_9_15:10:21_2023_L1_14
Content-Type: application/octet-stream; name="ABC1"
Content-Disposition: attachment; filename="ABC1"; Size=348
Content-Transfer-Encoding: base64

REFUQSBPRiAxU1QgQVRUQUNITUVOVAo=

--Mon_Jan_9_15:10:21_2023_L1_14
Content-Type: application/octet-stream; name="XYZ2"
Content-Disposition: attachment; filename="XYZ2"; Size=600050
Content-Transfer-Encoding: base64

REFUQSBPRiAyTkQgQVRUQUNITUVOVAo=

--Mon_Jan_9_15:10:21_2023_L1_14--

I get (with some headers removed)

From: mark@msapiro.net
To: mark@msapiro.net
Subject: test
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="Mon_Jan_9_15:10:21_2023_L1_14"
X400-Content-Identifier: 2023010915101947
Priority: normal
Message-Id: <20230109224544.CC4B93400ED@msapiro.net>
Date: Mon,  9 Jan 2023 14:45:44 -0800 (PST)
X-msapiro-MailScanner-ID: CC4B93400ED.AB44B
X-msapiro-MailScanner: Found to be clean
X-msapiro-MailScanner-SpamCheck: not spam, SpamAssassin (cached,
    score=-0.621, required 6, ALL_TRUSTED -1.00, NO_DNS_FOR_FROM 0.38)
X-msapiro-MailScanner-From: mark@msapiro.net
X-Spam-Status: No
--Mon_Jan_9_15:10:21_2023_L1_14
Content-type: text/plain; charset=iso-8859-1
Content-transfer-encoding: quoted-printable

Here the text body follows

--=20
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

--Mon_Jan_9_15:10:21_2023_L1_14
Content-Type: application/octet-stream; name="ABC1"
Content-Disposition: attachment; filename="ABC1"; Size=348
Content-Transfer-Encoding: base64

REFUQSBPRiAxU1QgQVRUQUNITUVOVAo=

--Mon_Jan_9_15:10:21_2023_L1_14
Content-Type: application/octet-stream; name="XYZ2"
Content-Disposition: attachment; filename="XYZ2"; Size=600050
Content-Transfer-Encoding: base64

REFUQSBPRiAyTkQgQVRUQUNITUVOVAo=

--Mon_Jan_9_15:10:21_2023_L1_14--

but if I change the : characters to . I get

From: mark@msapiro.net
To: mark@msapiro.net
Subject: test
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="Mon_Jan_9_15.10.21_2023_L1_14"
X400-Content-Identifier: 2023010915101947
Priority: normal
Message-Id: <20230109225421.1318F3400ED@msapiro.net>
Date: Mon,  9 Jan 2023 14:54:21 -0800 (PST)
X-msapiro-MailScanner-ID: 1318F3400ED.A8D4B
X-msapiro-MailScanner: Found to be clean
X-msapiro-MailScanner-SpamCheck: not spam, SpamAssassin (not cached,
    score=-0.621, required 6, ALL_TRUSTED -1.00, NO_DNS_FOR_FROM 0.38)
X-msapiro-MailScanner-From: mark@msapiro.net
X-Spam-Status: No

--Mon_Jan_9_15.10.21_2023_L1_14
Content-type: text/plain; charset=iso-8859-1
Content-transfer-encoding: quoted-printable

Here the text body follows

--=20
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

--Mon_Jan_9_15.10.21_2023_L1_14
Content-Type: application/octet-stream; name="ABC1"
Content-Disposition: attachment; filename="ABC1"; Size=348
Content-Transfer-Encoding: base64

REFUQSBPRiAxU1QgQVRUQUNITUVOVAo=

--Mon_Jan_9_15.10.21_2023_L1_14
Content-Type: application/octet-stream; name="XYZ2"
Content-Disposition: attachment; filename="XYZ2"; Size=600050
Content-Transfer-Encoding: base64

REFUQSBPRiAyTkQgQVRUQUNITUVOVAo=

--Mon_Jan_9_15.10.21_2023_L1_14--

The issue is that, in the case with colons in the boundary, the first boundary immediately follows the X-Spam-Status: No header, with no header-terminating blank line in between. In the case without colons, the blank line is present.
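To illustrate why the missing blank line matters more when the boundary contains colons, here is a small sketch using Python's standard `email` parser. This is not MailScanner's code (MailScanner is written in Perl), and the helper `parse_without_separator` is made up for the demonstration. It only shows how one common parser reacts when a boundary line directly follows the last header.

```python
from email import message_from_string
from email.errors import MissingHeaderBodySeparatorDefect


def parse_without_separator(boundary: str):
    """Parse a message whose first boundary directly follows the last header."""
    raw = (
        "MIME-Version: 1.0\n"
        f'Content-Type: multipart/mixed;\n boundary="{boundary}"\n'
        "X-Spam-Status: No\n"
        f"--{boundary}\n"  # no blank line before this boundary
        "Content-type: text/plain; charset=iso-8859-1\n"
        "\n"
        "Here the text body follows\n"
        f"--{boundary}--\n"
    )
    return message_from_string(raw)


with_colons = parse_without_separator("Mon_Jan_9_15:10:21_2023_L1_14")
with_dots = parse_without_separator("Mon_Jan_9_15.10.21_2023_L1_14")

# With colons, the boundary line looks like "name: value" and is kept as a header.
print(with_colons.keys())
print(with_colons["--Mon_Jan_9_15"])

# With dots, the line cannot be a header, so the parser ends the header block
# there and records a defect instead.
print(with_dots.keys())
print(any(isinstance(d, MissingHeaderBodySeparatorDefect) for d in with_dots.defects))
```

With this parser, the colon variant yields a header named `--Mon_Jan_9_15` with the value `10:21_2023_L1_14`, which is consistent with the `--Mon_Jan_9_15: 10:21_2023_L1_14` line reported on the downstream server. Other parsers may behave differently.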

This shouldn't be hard to fix, but I'll defer to Shawn for that.

shawniverson commented 1 year ago

This helps narrow it down substantially. I will lab up and run some tests myself and see if we can determine a fix.

dneuhaeuser commented 1 year ago

Probably the problem is not only related to specific MIME boundaries but to all body lines with colons.

Sign Clean Messages = yes seems to interpret lines with colons as headers, even in textbody. (see #640 for another example)

Is there any news yet on how this can be fixed?

shawniverson commented 1 year ago

I think what is really happening is the newline is being left out between the header and initial body somehow. I'm dealing with medical issues right now but have this on my todo.

shawniverson commented 1 year ago

Hopefully this fixes it:

https://github.com/MailScanner/v5/pull/642

shawniverson commented 1 year ago

I'm not sure my "fix" will help with a MIME part that has colons. I may need to work on this some more. The prepending newline will appear after the MIME information, so I don't think this is the solution.

shawniverson commented 1 year ago

Yeah just did a test. This fix won't help with this issue.

shawniverson commented 1 year ago

Try the new PR please. It hopefully will fix both this issue and the other. I found the spot where the newline was being left out.

dneuhaeuser commented 1 year ago

650 fixes this one as well.

Great work!