cyrusimap / cyrus-imapd

Cyrus IMAP is an email, contacts and calendar server
http://cyrusimap.org
Other
549 stars 150 forks source link

cyr_virusscan: remove excess output - header lines #1839

Closed postilion closed 7 years ago

postilion commented 7 years ago

Having finally gotten around to trying cyr_virusscan, it's pretty cool, but the output can be a little maddening, as it prints headers for every mailbox visited, regardless if there's any infected emails within.

Here's an example:

su cyrus -c "/usr/lib/cyrus/bin/cyr_virusscan "

Using ClamAV virus scanner
Loaded 5789330 virus signatures.

Mailbox Name                                   Msg UID  Status  Virus Name
----------------------------------------    ----------  ------  --------------------------------------------------
user.betty                                      185395    READ  Heuristics.Phishing.Email.SpoofedDomain

Mailbox Name                                   Msg UID  Status  Virus Name
----------------------------------------    ----------  ------  --------------------------------------------------

Mailbox Name                                   Msg UID  Status  Virus Name
----------------------------------------    ----------  ------  --------------------------------------------------

Mailbox Name                                   Msg UID  Status  Virus Name
----------------------------------------    ----------  ------  --------------------------------------------------

Any chance we can eliminate all the excess output?

elliefm commented 7 years ago

That seems like it'll be pretty easy to clean up, I'll have a look at it

elliefm commented 7 years ago

How's this? https://github.com/cyrusimap/cyrus-imapd/compare/master...elliefm:v30/1839-cyr_virusscan-headers

It's trying to make it so it only prints the headers once for each distinct user who has infections in any of their mailboxes. It won't print them at all if there are no infections (maybe it should?). It seems to work okay in my testing, but I don't have a setup with multiple different mailboxes/infections so it hasn't been exercised thoroughly. How does it work for you?

We should probably also print a summary at the end (so that "123 mailboxes scanned, 0 infections found" is clearly distinct from "it quietly failed for some reason") -- will add that next, once the headers are sorted out.

postilion commented 7 years ago

@elliefm, I'll look at this tomorrow. I've got a pretty good pile o' mail to test with.

elliefm commented 7 years ago

Oh also, note that the headers are per user with infections, not per mailbox with infections -- so if Betty has some infected messages in both INBOX and Trash (say), the output will be like:

HEADING HEADING HEADING
user.betty ... some virus
user.betty.Trash ... some virus

which seems a lot nicer than the per-mailbox style:

HEADING HEADING HEADING
user.betty ... some virus

HEADING HEADING HEADING
user.betty.Trash ... some virus
elliefm commented 7 years ago

I got the bit between my teeth, so I've added a couple more commits to that branch after all:

I'd appreciate your feedback! :)

hagedose commented 7 years ago

I just tried your patches, and they're looking good to me.

I wonder if it makes sense to print more information regarding infected mails, perhaps only in verbose mode. I think it might be helpful to print the From:-address and the Subject. The rationale is that I am not sure if we want to or are even allowed to modify user's mailboxes once we have delivered the mails. So a policy we might want to adopt would be to notify users of problematic messages in their mail store. With just the UID, normal users would have no way of identifying the messages in question.

Perhaps it would then make sense to have "-n" work without "-r" as well?

postilion commented 7 years ago

@elliefm looks great to me. @hagedose While I understand where you're coming from with the "-n work without -r" idea, I worry that the printing of such information via STDOUT would be hard to read. What might work better would be to allow an optional argument to -n, which would be the mailbox spec to which the message digests should be appended.
cyr_virusscan -r -n user/admin

Problem is, how would one distinguish the mboxpattern for -n deliveries from the mboxpattern to scan? Hmm...

This could be extended, as well, to be used as the place to append digests for shared mailboxes, in regards to which there is currently this caveat in sources:

what to do with public mailboxes (bboards)?

We will have to address that at some point.

Cheers, -nic

postilion commented 7 years ago

@elliefm I tried the following on a shared mailbox, and get a segfault:

/usr/lib/cyrus/bin/cyr_virusscan -v -n -r tech/support
...
mmap(NULL, 49152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3b08c77000
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3b08e61000
mprotect(0x7f3b08c77000, 49152, PROT_READ|PROT_EXEC) = 0
mprotect(0x7f3b08e61000, 12288, PROT_READ|PROT_EXEC) = 0
open("/var/lib/imap/mailboxes.db", O_RDWR) = 5
fstat(5, {st_mode=S_IFREG|0600, st_size=70008, ...}) = 0
mmap(NULL, 81920, PROT_READ, MAP_SHARED, 5, 0) = 0x7f3adeba3000
fcntl(5, F_SETLKW, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = 0
fstat(5, {st_mode=S_IFREG|0600, st_size=70008, ...}) = 0
stat("/var/lib/imap/mailboxes.db", {st_mode=S_IFREG|0600, st_size=70008, ...}) = 0
fcntl(5, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = 0
open("/var/lib/imap/quota.db", O_RDWR)  = 6
fstat(6, {st_mode=S_IFREG|0600, st_size=15488, ...}) = 0
mmap(NULL, 24576, PROT_READ, MAP_SHARED, 6, 0) = 0x7f3b08e5b000
fcntl(6, F_SETLKW, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = 0
fstat(6, {st_mode=S_IFREG|0600, st_size=15488, ...}) = 0
stat("/var/lib/imap/quota.db", {st_mode=S_IFREG|0600, st_size=15488, ...}) = 0
fcntl(6, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = 0
fcntl(5, F_SETLKW, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = 0
fstat(5, {st_mode=S_IFREG|0600, st_size=70008, ...}) = 0
stat("/var/lib/imap/mailboxes.db", {st_mode=S_IFREG|0600, st_size=70008, ...}) = 0
fcntl(5, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)

I don't have gdb on that VM, so just include strace output. I'll see if I can get something more useful.

postilion commented 7 years ago

@elliefm As a quick follow up, this segfault does not occur with the unpatched version.

elliefm commented 7 years ago

@hagedose - I think ideally you'd be running your AV solution out the front, before the messages even hit Cyrus (so in postfix or sendmail or exim or whatever you're using). That way you can bounce/reject/quarantine infected messages instead of delivering them.

My understanding of cyr_virusscan is that it's intended as a secondary line of defence, for retroactively detecting new viruses that may have snuck through your front line before your AV had signatures available.

It might actually be worth documenting this somewhere...

elliefm commented 7 years ago

@postilion ooh I'll add a shared mailbox to my test case and see if I can reproduce that, thanks :)

elliefm commented 7 years ago

@postilion of course, shared mailboxes don't have a userid, so they can't be grouped by userid! hmm...

elliefm commented 7 years ago

Fixed in b24b401, pushed to the same branch

@postilion I don't have tests for the -n functionality yet -- I see you're trying it, how's it working out in your experience?

postilion commented 7 years ago

@elliefm It works well. I used an actual sample in the examples section of the man page. I don't have access to my test server from here (home) but tomorrow I can send you a larger sample.

hagedose commented 7 years ago

@elliefm – your analysis is correct, and we have been running AV on our mail gateways for a very long time. But with the recent bout of ransomware (Locky etc.) there has been demand for that secondary line of defense you mentioned.

FWIW: one of the faculties on our campus runs their own Exchange server with Sophos PureMessage. That combo allows them to check their mail spool periodically and "repair" infected messages in place. To repair a message means to remove phishing URLs or infected attachments. I think they also add explanatory text to the message, but I'm not sure.

Since IMAP messages are immutable it isn't possible to just modify the file on disk. If Cyrus were to replicate that behavior, the original message would have to be deleted and the modified message appended to the mailbox. Obviously that's not trivial to implement, and I'm not sure if ClamAV even provides the necessary information to determine what part of a message needs to be "repaired".

hagedose commented 7 years ago

Thinking about the notifications, how difficult would it be to make them localized? For our purposes it would be enough if we could supply our own template with German and English text.

Does that sound useful to anyone else?

postilion commented 7 years ago

@elliefm Your latest patch fixes the shared mailbox problem. Thanks!

elliefm commented 7 years ago

@postilion Great, the headers patches are now upstream :)

@hagedose It seems plausible, there might even be some templating code already (perhaps in CalDAV?) that could be reused. Do you want to open a feature request for it?