Cisco-Talos / clamav

ClamAV - Documentation is here: https://docs.clamav.net
https://www.clamav.net/
GNU General Public License v2.0

Logical signatures fail to match in a quoted printable email with a soft line break #491

Closed: m-lw closed this issue 7 months ago

m-lw commented 2 years ago

Here are some logical signatures, each with two subsignatures, that fail to match a quoted-printable email in which one of the lines is split by a soft line break ('=' at the end of a line).

Scan quoted-printable.eml with the signatures quoted-printable.ldb from quoted-printable.zip:

clamscan --no-summary --allmatch -d quoted-printable.ldb quoted-printable.eml
/tmp/quoted-printable.eml: Test3.UNOFFICIAL FOUND

It only matches the Test3 rule, but should match Test1 and Test2 as well.

Running it without the soft line break matches all three rules as expected:

perl -0npe '$_=~s/=\n//' quoted-printable.eml | clamscan --no-summary --allmatch -d quoted-printable.ldb -
stdin: Test1.UNOFFICIAL FOUND
stdin: Test2.UNOFFICIAL FOUND
stdin: Test3.UNOFFICIAL FOUND
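
For readers unfamiliar with soft line breaks, here is a minimal Python sketch of what a quoted-printable decoder does with them. The sample body is an assumption modeled on the files dumped later in this thread, not the exact content of the attached quoted-printable.eml, and Python's standard `quopri` module stands in for ClamAV's own decoder.

```python
import quopri

# Hypothetical body line, split by a soft line break: a trailing '='
# means "this line continues", so decoding drops the '=' and the newline.
encoded = b'<A href="https://a=\nbcd.io/abcd/x#x">\nDocuSign</A>\n'
decoded = quopri.decodestring(encoded)

print(decoded.decode("ascii"))
print(b"abcd.io/abcd" in encoded)  # the URL is split in the raw body
print(b"abcd.io/abcd" in decoded)  # the URL is contiguous after decoding
```

A subsignature that spans the break can therefore only match data that has already been decoded.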

This is the output of clamconf -n:

Checking configuration files in /usr/local/etc

Config file: clamd.conf
-----------------------
LogFile = "/var/log/clamav/clamd.log"
PidFile = "/var/run/clamav/clamd.pid"
LocalSocket = "/var/run/clamav/clamd.sock"
User = "clamav"
AlertOLE2Macros = "yes"

Config file: freshclam.conf
---------------------------
PidFile = "/var/run/clamav/freshclam.pid"
UpdateLogFile = "/var/log/clamav/freshclam.log"
DatabaseMirror = "database.clamav.net"

Config file: clamav-milter.conf
-------------------------------
PidFile = "/var/run/clamav/clamav-milter.pid"
User = "clamav"
ClamdSocket = "unix:/var/run/clamav/clamd.sock"
MilterSocket = "/var/run/clamav/clmilter.sock"

Software settings
-----------------
Version: 0.104.2
Optional features supported: MEMPOOL AUTOIT_EA06 BZIP2 LIBXML2 PCRE2 ICONV JSON RAR 

Database information
--------------------
Database directory: /var/db/clamav
[3rd Party] xxx1.ldb: 30 sigs
daily.cld: version 26471, sigs: 1975358, built on Fri Mar  4 09:24:47 2022
[3rd Party] xxx2.cdb: 1126 sigs
[3rd Party] xxx3.cdb: 206 sigs
[3rd Party] whitelist.sfp: 1 sig 
main.cvd: version 62, sigs: 6647427, built on Thu Sep 16 13:32:42 2021
bytecode.cvd: version 333, sigs: 92, built on Mon Mar  8 15:21:51 2021
[3rd Party] xxx4.cdb: 60 sigs
Total number of signatures: 8624300

Platform information
--------------------
uname: FreeBSD 12.2-RELEASE-p7 FreeBSD 12.2-RELEASE-p7 GENERIC amd64
OS: FreeBSD, ARCH: amd64, CPU: amd64
zlib version: 1.2.11 (1.2.11), compile flags: a9
platform id: 0x03238e8e0800000000040201

Build information
-----------------
Clang: FreeBSD Clang 10.0.1 (git@github.com:llvm/llvm-project.git llvmorg-10.0.1-0-gef32c611aa2) (4.2.1)
sizeof(void*) = 8
Engine flevel: 142, dconf: 142

mjbroekman commented 2 years ago

ClamAV removes the soft line break during normalization. The 'problem' is that normalization also converts everything to lowercase, so "DocuSign" becomes "docusign", which no longer matches the case-sensitive first subsignature of each logical signature.

Your Test3 signature does match because one of the lines in the original email has 'abcd' unbroken, along with the camelcase DocuSign.

If you add ::i to the end of each first subsignature, all three signatures match, because they no longer require the camel-case spelling.

~/Security$ clamscan -d quoted-printable.ldb --no-summary -z quoted-printable.eml
~/Security/quoted-printable.eml: Test1.UNOFFICIAL FOUND
~/Security/quoted-printable.eml: Test2.UNOFFICIAL FOUND
~/Security/quoted-printable.eml: Test3.UNOFFICIAL FOUND
~/Security/quoted-printable.eml: Test1.UNOFFICIAL FOUND
~/Security/quoted-printable.eml: Test2.UNOFFICIAL FOUND
~/Security/quoted-printable.eml: Test3.UNOFFICIAL FOUND
~/Security/quoted-printable.eml: Test3.UNOFFICIAL FOUND
~/Security/quoted-printable.eml: Test3.UNOFFICIAL FOUND
~/Security/quoted-printable.eml: Test3.UNOFFICIAL FOUND

~/Security$ cat quoted-printable.ldb  
Test1;Engine:81-255,Target:0;0&1;446f63755369676e::i;616263642e696f2f61626364
Test2;Engine:81-255,Target:0;0&1;446f63755369676e::i;616263642e
Test3;Engine:81-255,Target:0;0&1;446f63755369676e::i;61626364
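
To make the hex subsignatures easier to read, a small Python sketch can decode them. The helper `parse_ldb_line` is hypothetical and only handles plain hex subsignatures like the ones above (no wildcards or other signature features); it is not how ClamAV parses .ldb files.

```python
def parse_ldb_line(line):
    """Split one .ldb line into name, target block, logic, and decoded subsignatures."""
    name, target, logic, *subsigs = line.strip().split(";")
    decoded = []
    for sub in subsigs:
        # Anything after '::' is a modifier list (here only 'i', case-insensitive).
        hex_part, _, modifiers = sub.partition("::")
        decoded.append((bytes.fromhex(hex_part), modifiers))
    return name, target, logic, decoded


lines = [
    "Test1;Engine:81-255,Target:0;0&1;446f63755369676e::i;616263642e696f2f61626364",
    "Test2;Engine:81-255,Target:0;0&1;446f63755369676e::i;616263642e",
    "Test3;Engine:81-255,Target:0;0&1;446f63755369676e::i;61626364",
]

for line in lines:
    name, _target, logic, subs = parse_ldb_line(line)
    print(name, logic, subs)
```

The first subsignature decodes to "DocuSign" in all three rules; the second is "abcd.io/abcd", "abcd." and "abcd" respectively.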

mjbroekman commented 2 years ago

For reference, these are the files that get created by the different normalizations that occur while scanning that message:

~/Security/20220304_130643-quoted-printable.eml.4cc5473e67/quoted-printable.eml.58567c8191$ for f in `find . -type f`; do echo "== $f =="; cat $f; echo; done
== ./textportion.84662458bb/html-tmp.45d0147f18/notags.html ==
 https://abcd.io/abcd/x#x docusign 
== ./textportion.84662458bb/html-tmp.45d0147f18/nocomment.html ==
<html><head><title></title></head><body><a href="https://abcd.io/abcd/x#x">docusign</a></body></html>
== ./clamav-305ac395b94a192af4df0026248a2883.tmp.0356e26b8c/html-tmp.ff53d1161f/notags.html ==
 https://a bcd.io/abcd/x#x docusign 
== ./clamav-305ac395b94a192af4df0026248a2883.tmp.0356e26b8c/html-tmp.ff53d1161f/nocomment.html ==
<html><head><title></title></head><body><a href="https://a bcd.io/abcd/x#x">docusign</a></body></html>
== ./mail-tmp.759b8bbb97/clamav-305ac395b94a192af4df0026248a2883.tmp ==
<HTML><HEAD><TITLE></TITLE></HEAD>
<body>
<A href="https://a
bcd.io/abcd/x#x">
DocuSign</A>
</BODY></HTML>

== ./mail-tmp.759b8bbb97/clamav-45d2dba2bdf8852ab9fc5250097d7ed7.tmp ==
<HTML><HEAD><TITLE></TITLE></HEAD>
<body>
<A href="https://abcd.io/abcd/x#x">
DocuSign</A>
</BODY></HTML>

There's definitely something odd going on, because if you use --normalize=no on the clamscan command line, all 3 signatures match correctly. It seems that the original file (or the reconstructed mail file) doesn't get scanned when normalization is enabled.
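
As a rough cross-check, the sketch below runs a plain substring test of the decoded subsignatures against abbreviated copies of the files dumped above. It assumes the original signatures are the ones shown earlier without the ::i modifier, and a substring test is only an approximation of ClamAV's matcher. The file names are shortened and `matches` is a hypothetical helper.

```python
# File contents abbreviated from the dump above (only the relevant lines are kept).
FILES = {
    "textportion/notags.html": b" https://abcd.io/abcd/x#x docusign ",
    "clamav-305ac/notags.html": b" https://a bcd.io/abcd/x#x docusign ",
    "mail-tmp/clamav-305ac.tmp": b'<A href="https://a\nbcd.io/abcd/x#x">\nDocuSign</A>\n',
    "mail-tmp/clamav-45d2d.tmp": b'<A href="https://abcd.io/abcd/x#x">\nDocuSign</A>\n',
}

# Assumed original rules: both subsignatures must be present (logic 0&1).
SIGS = {
    "Test1": (b"DocuSign", b"abcd.io/abcd"),
    "Test2": (b"DocuSign", b"abcd."),
    "Test3": (b"DocuSign", b"abcd"),
}


def matches(data, nocase_first=False):
    """Return the rule names whose two subsignatures both occur in data."""
    hits = []
    for name, (first, second) in SIGS.items():
        if nocase_first:
            first_hit = first.lower() in data.lower()
        else:
            first_hit = first in data
        if first_hit and second in data:
            hits.append(name)
    return hits


for path, data in FILES.items():
    print(path, matches(data), matches(data, nocase_first=True))
```

Under these assumptions, only the mail-tmp file without the line break satisfies all three case-sensitive rules, the mail-tmp file with the break satisfies only Test3, and the lowercased notags.html files match nothing unless the first subsignature is treated as case-insensitive.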