Message attempted to kill MailScanner every time when receiving a specific (html) mail

RobinR1 commented 2 years ago

Describe the bug Whenever I get a new notification of the social network 'Hoplr' by email, Mailscanner html disarming takes 100% CPU for a while and after a minute or so, it crashes resulting in the mail going into quarantine due to "MailScanner: Message attempted to kill MailScanner". This happens every time on such a notification mail. I don't see anything special in the mail that could cause this, and seem to contain only standard html. This is what I see in MailScanner logging:

Jan 23 08:42:18 hachiman MailScanner[17932]: New Batch: Scanning 1 messages, 198580 bytes
Jan 23 08:42:18 hachiman MailScanner[17932]: Virus and Content Scanning: Starting
Jan 23 08:42:19 hachiman MailScanner[17932]: MailWatch: Allowlist refresh time reached
Jan 23 08:42:19 hachiman MailScanner[17932]: MailWatch: Starting up MailWatch SQL Allowlist
Jan 23 08:42:19 hachiman MailScanner[17932]: MailWatch: Read 1 allowlist entries
Jan 23 08:42:19 hachiman MailScanner[17932]: MailWatch: Blocklist refresh time reached
Jan 23 08:42:19 hachiman MailScanner[17932]: MailWatch: Starting up MailWatch SQL Blocklist
Jan 23 08:42:19 hachiman MailScanner[17932]: MailWatch: Read 0 blocklist entries
Jan 23 08:43:40 hachiman MailScanner[17932]: HTML disarming died, status = 1
Jan 23 08:43:40 hachiman MailScanner[17932]: Content Checks: Detected and have disarmed denialofservice tags in HTML message in 0D79D763B.A2B4F from bounce-md_30936634.61ed06d3.v1-94bb423a3416471fa213e362d560392e@mandrillapp.com
Jan 23 08:43:40 hachiman MailScanner[17932]: Quarantined message 0D79D763B.A2B4F as it caused MailScanner to crash several times

Expected behavior The HTML in the mail gets disarmed and the mail gets delivered correctly

Server :

OS: openSuse Leap 15.3
Mandatory Access Control Enforcement: Apparmor
MailScanner Version: 5.4.3
OS Version: 15.3 - kernel 5.3.18-59.37-default
Installation method: Package
Installation: New
Containerized: No

Additional context I have no idea how I can debug this. I'm using MailScanner with Postfix and MailScanner Milter. But I didn't find how to manually 'inject' such a quarantined message while MailScanner is running in Debug mode.. If I would get any wiser out of that ?

shawniverson commented 2 years ago

@RobinR1 There's a few ways we can approach this.

1) If hoplr is a free service I can try to sign up and run some tests 2) You can try to turn debug on in MailScanner and feed an email back through from quarantine using an smtp-aware tool (such as mailx with SMTP environment variable set) or trigger one from the service and see if you get anything additional we could use. 3) You can attach a redacted sample with the html content for analysis

RobinR1 commented 2 years ago

Hoplr is a free service, but meant to stay in contact with your local neighbourhood, so there needs to be a community for your local neighbourhood to enable you to use it. So possibly you won't even be able to sign up if hoplr was not activated for your neighborhood.

I've tried feeding the mail back through my mail system using mailx but for some reason then the mail is processed without problem, and I get a mail in my inbox (using gnome evolution) showing me the HTML code of the message instead of the rendered html. In the mail header I see that the header Content-Type: text/html; charset=utf-8 was changed into X-Invalid-Content-Type: text/html; charset=utf-8. Not sure where this header is changed, but I assume that is the reason why MailScanner now skips disarming it and passing it?

So I will now try your third suggestion and attach a redacted version of the mail, hoping that you can reproduce the problem on your system: D9E1A7887.AED0A.txt .

shawniverson commented 2 years ago

I apologize, mailx can't handle the encoding of the message body. I fed a copy through my system using raw smtp protocol instead and the HTML disarmed successfully, but I'm not on the same versions of things. I'm going to update and see what happens.

shawniverson commented 2 years ago

@RobinR1 I am having trouble reproducing. Instead of mailx, can you try using msmtp instead using the syntax below?

msmtp --host=your_mailscanner_server -f notification@hoplr.com yourself@example.com < message_file

RobinR1 commented 2 years ago

With msmtp I can indeed reproduce the problem. But MailScanner --deubg does not seem to give me much more information:

2022-01-25T17:27:36.579012+01:00 hachiman postfix/cleanup[5407]: 7ADF988C3: milter-discard: END-OF-MESSAGE from amaterasu.sicho.home[192.168.0.1]: milter triggers DISCARD action; from=<notification@hoplr.com> to=<robin@sicho.home> proto=ESMTP helo=<localhost>
2022-01-25T17:27:36.582183+01:00 hachiman postfix/smtpd[2485]: disconnect from amaterasu.sicho.home[192.168.0.1] ehlo=1 mail=1 rcpt=1 data=1 quit=1 commands=5
2022-01-25T17:27:37.023029+01:00 hachiman MailScanner[3146]: New Batch: Scanning 1 messages, 63417 bytes
2022-01-25T17:27:37.110922+01:00 hachiman MailScanner[3146]: Virus and Content Scanning: Starting
2022-01-25T17:27:37.346882+01:00 hachiman MailScanner[3146]: MailWatch: Allowlist refresh time reached
2022-01-25T17:27:37.347507+01:00 hachiman MailScanner[3146]: MailWatch: Starting up MailWatch SQL Allowlist
2022-01-25T17:27:37.355906+01:00 hachiman MailScanner[3146]: MailWatch: Read 1 allowlist entries
2022-01-25T17:27:37.356577+01:00 hachiman MailScanner[3146]: MailWatch: Blocklist refresh time reached
2022-01-25T17:27:37.357091+01:00 hachiman MailScanner[3146]: MailWatch: Starting up MailWatch SQL Blocklist
2022-01-25T17:27:37.364578+01:00 hachiman MailScanner[3146]: MailWatch: Read 0 blocklist entries
2022-01-25T17:29:01.011410+01:00 hachiman MailScanner[3146]: HTML disarming died, status = 1
2022-01-25T17:29:01.026939+01:00 hachiman MailScanner[3146]: Content Checks: Detected and have disarmed denialofservice tags in HTML message in 7ADF988C3.A1AAD from notification@hoplr.com
2022-01-25T17:29:01.027099+01:00 hachiman MailScanner[3146]: Quarantined message 7ADF988C3.A1AAD as it caused MailScanner to crash several times
2022-01-25T17:29:01.027173+01:00 hachiman MailScanner[3146]: Saved entire message to /var/spool/MailScanner/quarantine/20220125/7ADF988C3.A1AAD
2022-01-25T17:29:01.223003+01:00 hachiman postfix/pickup[3503]: 363D788D7: uid=51 from=<>
2022-01-25T17:29:01.225020+01:00 hachiman postfix/cleanup[5407]: 363D788D7: message-id=<20220125162901.363D788D7@hachiman.sicho.home>
2022-01-25T17:29:01.231233+01:00 hachiman MailScanner[3146]: Sender Warnings: Delivered 1 warnings to virus senders
2022-01-25T17:29:01.358127+01:00 hachiman postfix/qmgr[1318]: 363D788D7: from=<>, size=1055, nrcpt=1 (queue active)
2022-01-25T17:29:01.359873+01:00 hachiman postfix/pickup[3503]: 57AA988D9: uid=51 from=<postmaster>
2022-01-25T17:29:01.367978+01:00 hachiman postfix/cleanup[5407]: 57AA988D9: message-id=<20220125162901.57AA988D9@hachiman.sicho.home>
2022-01-25T17:29:01.485860+01:00 hachiman MailScanner[3146]: Notices: Warned about 1 messages
2022-01-25T17:29:01.661537+01:00 hachiman postfix/qmgr[1318]: 57AA988D9: from=<postmaster@hachiman.sicho.home>, size=3884, nrcpt=1 (queue active)
2022-01-25T17:29:01.939412+01:00 hachiman postfix/smtp[5478]: 363D788D7: to=<notification@hoplr.com>, relay=uit.telenet.be[195.130.132.11]:25, delay=0.82, delays=0.24/0.07/0.27/0.24, dsn=2.0.0, status=sent (250 2.0.0 Message accepted for delivery (n4V12600j4LnZik014V1rp))
2022-01-25T17:29:01.940571+01:00 hachiman postfix/qmgr[1318]: 363D788D7: removed
2022-01-25T17:29:02.495269+01:00 hachiman MailScanner[3146]: Deleted 1 messages from processing-database
2022-01-25T17:29:02.495434+01:00 hachiman MailScanner[3146]: MailWatch: Logging message 7ADF988C3.A1AAD to SQL
2022-01-25T17:29:02.625782+01:00 hachiman MailScanner[3283]: MailWatch: 7ADF988C3.A1AAD: Logged to MailWatch SQL
2022-01-25T17:29:02.695879+01:00 hachiman dovecot: lda(robin)<5480><y0G0Kk0l8GFoFQAA9tQm2w>: sieve: msgid=<20220125162901.57AA988D9@hachiman.sicho.home>: stored mail into mailbox 'INBOX'
2022-01-25T17:29:02.701857+01:00 hachiman postfix/local[5479]: 57AA988D9: to=<robin@hachiman.sicho.home>, orig_to=<postmaster>, relay=local, delay=1.4, delays=0.41/0.01/0/1, dsn=2.0.0, status=sent (delivered to command: /usr/lib/dovecot/deliver)
2022-01-25T17:29:02.707381+01:00 hachiman postfix/qmgr[1318]: 57AA988D9: removed
2022-01-25T17:29:21.305659+01:00 hachiman root[5537]: Stopping MailScanner

shawniverson commented 2 years ago

@RobinR1 I will spin up an OpenSuse instance and try as well. If necessary, I will add some debug output to HTML disarming to get some more visibility and have you run the message through again.

shawniverson commented 2 years ago

Appears to be taking longer than usual to disarm the html in this message on OpenSuse. If it takes too long, the alarm is triggered and the process fails.

shawniverson commented 2 years ago

Removed earlier comments, weren't valid...

Timing is the same between different OSes on same message. This still appears to be a potential timeout.

shawniverson commented 2 years ago

@RobinR1 This is a timeout issue occurring and not a bug. There are a large number of link tags in this particular email, causing excessive scanning time. Here are your options to resolve:

Favorable options:

Create a ruleset for Dangerous Content Scanning and set the affected source (hoplr in this case) to no. This will exempt this source from this HTML disarm scan and save CPU cycles

Example /etc/MailScanner/MailScanner.conf or conf.d options

Dangerous Content Scanning = %rules-dir%/content.scanning.rules

%rules-dir%content.scanning.rules (tab separated)

From:    notification@hoplr.com    no
FromOrTo:    default    yes

Utilize faster resources on your system to speed up HTML Disarming (My system takes about 30 seconds to scan this particular sample due to all the link tags)

Less favorable options:

Increase the SpamAssassin Timeout setting (although this is a SpamAssassin setting, HTML Disarm children use it as well)
Set the Ignore Denial Of Service setting to yes (you can create a ruleset here and add hoplr to it, this however will mean your system will burn CPU cycles still)

RobinR1 commented 2 years ago

It is quite an old system with an AMD Athlon II X4 600e CPU (2.2Ghz) and the mail system (postfix, dovecot, mailscanner) is running inside of a KVM VM. I have no immediate possibilities to replace the system hw with faster hw.

The disarming of the mail takes about 80s between "New Batch: Scanning 1 message" and "died, with status = 1". I tried adding another CPU, but the process seems to be single threaded and this didn't change anything on the behavior. I increased the SpamAssassin Timeout to 120 and now the mail indeed gets correctly disarmed. Oddly however, in the few tests I did now, the scan now takes about 74s to 79s.. between "New Batch: Scanning 1 message" and "Content Checks: Detected and have disarmed web bug tags..."

I could indeed add hoplr to the content.scanning.rules but it's not that I trust them that much, so I would rather still have it disarmed. So I've now set the timeout of spamassassin to 85 which seems to be enough for those mails to be disarmed correctly and not too far away from the default 75s. I may still have to increment it a bit if I notice that it still fails sometimes, but for now that should do. I'm less concerned about the cpu cycles those mails use than about the actual content of those mails. It's not that those mails come in so frequently that I should worry about the extra cpu cycles they generate.

Previous version of MailScanner I used never had a problem with it (v4.84.5), however I ran that one straight on the hypervisor OS itself (Debian 8), so that could also be the reason why it previously didn't have problems with those mails.

Anyway, many thanks for debugging this problem for me!

But maybe it would be better if MailScanner would actually mention in the log that the process was killed due to a timeout instead of the current 'MailScanner crashing' message ? And maybe also add to the comments of the MailScanner.conf file that the SpamAssassing Timeout setting is also used by the HTML disarming process ? If I would have known that HTML disarming timed out instead of 'crashing' then I would probably have searched in the config file for a timeout setting and would then have found that comment, and fix this 'problem' myself.. :-)

shawniverson commented 2 years ago

@RobinR1 Those are great suggestions. I will work on those.

ez-w commented 2 years ago

Hi, for your information :

same problem, 5 times (2 different sender) in one month after update to MailScanner 5.4.3-1 , messages contain only HTML code (no image or attachment, 48 link http.., size 54 ko)

CentOS 7 uptodate 2x Intel Xeon 2.2 Ghz sendmail 8.14.7

Test Spamassassin Timeout 360 seconds and same problem, durant 360 secondes one CPU 100%

and after

Report: MailScanner: Message attempted to kill MailScanner

P.S One sender who was impacted send every week a mailing-list to my customers, today first time this problem appears.

shawniverson commented 2 years ago

@ez-w I will run a regression test on performance and see if anything stands out. 48 links is a lot, though.

ez-w commented 2 years ago

@ez-w I will run a regression test on performance and see if anything stands out. 48 links is a lot, though.

17 links for telegram, twitter, facebook,... and 4 newspapers articles and the rest links from the sender of newsletter (links to image, list unsubscribe,contact form,...) nothing special for a newsletter

Newsletter from another sender : 95 links, no problem to MailScanner analyze

raffi4 commented 2 years ago

Dangerous Content Scanning = %rules-dir%/content.scanning.rules

%rules-dir%content.scanning.rules (tab separated)
From: notification@hoplr.com no FromOrTo: default yes

Hi i am searching to deactivate messages (my) to beeing disarmed by Maiscanner but without success on the configuration index i found this which make me lost to allow a mail like : mygoodmail.com

Here is the meaning of the values: yes => Allow these tags to be in the message no => Ban messages containing these tags disarm => Allow these tags, but stop these tags from working

Thanks for your help

MailScanner / v5

Message attempted to kill MailScanner every time when receiving a specific (html) mail #590