michaelwittig opened this issue 10 months ago
ClamAV normalizes text files and then scans both versions, so the total amount of data scanned may be significantly higher than the size of the files being scanned. Still, I wouldn't expect a 670 MB text file to end up scanning more than 4 GB. That does seem a little strange. Perhaps it is finding some embedded content, extracting it, and scanning that as well.
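A rough back-of-the-envelope check helps here. The toy model below is not ClamAV's actual accounting. It only assumes that the scanner sees the original file, one normalized copy no larger than the original, and any extracted embedded objects, and it sums them (the helper names are hypothetical). With the numbers from this report, a 636631040-byte file and a MaxScanSize of 4294967295 bytes, the original plus a full-size normalized copy comes to about 1.27 GB, so roughly 6.75 times the file's size would have to be scanned before the limit is reached.

```python
# Toy model of how scanned bytes can exceed the input size.
# Assumption: the scanner sees the original file, one normalized copy,
# and any embedded objects it extracts. This is NOT ClamAV's real accounting.

MAX_SCANSIZE = 4_294_967_295  # MaxScanSize from the reporter's clamd config
FILE_SIZE = 636_631_040       # size of the reported text file in bytes


def total_scanned(file_size: int, normalized_size: int, extracted_sizes: list[int]) -> int:
    """Sum the bytes passed to the scanner for one input (hypothetical helper)."""
    return file_size + normalized_size + sum(extracted_sizes)


def exceeds_limit(scanned: int, limit: int = MAX_SCANSIZE) -> bool:
    """True if the cumulative scanned bytes go past the configured limit."""
    return scanned > limit


if __name__ == "__main__":
    # Original plus a normalized copy of the same size (an upper bound).
    scanned = total_scanned(FILE_SIZE, FILE_SIZE, [])
    print(scanned, exceeds_limit(scanned))
    # How many multiples of the file size must be scanned to pass the limit.
    print(f"{MAX_SCANSIZE / FILE_SIZE:.2f}x")
```

Under these assumptions, normalization alone cannot explain the alert; something would have to add several more file-sized chunks of extracted data.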
I'm not sure I would consider this a bug, but if you want to investigate further, can you attach the output of running clamscan with these additional options: --debug --gen-json
I'm running into a similar issue while trying to upgrade ClamAV from the 0.x LTS to 1.x. Note that ClamAV 1.0.5 reports this as MaxScanSize, ClamAV 1.3.1 flags it as MaxFileSize, and no warning is emitted on 0.103.8.
The file I'm scanning is a 28 MB arm64 binary. In the output of --debug --gen-json, the following looks interesting (snippets; full debug output below):
$ clamscan -d db/daily.cvd --alert-exceeds-max=yes --max-filesize=2048M --max-scansize=0 --max-scantime=0 --max-recursion=40 arm64-binary --debug --gen-json
[...]
// Seems to correctly classify the file as "executable"
LibClamAV debug: ELF: File type: Executable
LibClamAV debug: ELF: Machine type: Unknown (0xb7)
[...]
// The overall file finishes without a finding
LibClamAV debug: Descriptor[3]: Continuing after file scan resulted with: No viruses detected
[...]
// Within the binary, ClamAV seems to detect a RAR-SFX signature?
LibClamAV debug: Matched signature for file type ZIP-SFX at 19076080
LibClamAV debug: Matched signature for file type RAR-SFX at 19076520
LibClamAV debug: Matched signature for file type RAR-SFX at 19076520
LibClamAV debug: Matched signature for file type HTML data at 20077476
LibClamAV debug: Matched signature for file type HTML data
LibClamAV debug: Matched signature for file type HTML data
[...]
unrar_open: Opened archive: /var/folders/ln/vqtgf6r50jj7llpd28yrv60d082fsw/T//20240730_210831-scantemp.fe25d32168/clamav-d4aed806711af70417a1e533c7ea5fc1.tmp
unrar_peek_file_header: Name:
unrar_peek_file_header: Directory?: 0
unrar_peek_file_header: Target Dir: 0
unrar_peek_file_header: RAR Version: 3
unrar_peek_file_header: Packed Size: 8719941959316996884
unrar_peek_file_header: Unpacked Size: 9080236526577124131
// Seems to detect a RAR entry of an insane size
LibClamAV debug: RAR: Next file is too large (9080236526577124131 bytes); it would exceed max scansize. Skipping to next file.
[...]
/Volumes/git/sandbox/clamav-debug/arm64-binary: Heuristics.Limits.Exceeded.MaxFileSize FOUND
----------- SCAN SUMMARY -----------
Scanned files: 1
Infected files: 1
Data scanned: 29.86 MB
Data read: 28.04 MB (ratio 1.06:1)
My interpretation is that ClamAV wrongly identifies the binary as a RAR archive and then reads bogus size metadata from it.
Unfortunately, I can't share the actual binary, but I'm happy to dig up more debug information if that helps!
Did some more digging and think this has the same cause as https://github.com/Cisco-Talos/clamav/issues/1143#issuecomment-1894595948.
This is also a compiled Go binary, and it contains the RAR header bytes because the Go standard library defines them as a string here: https://github.com/golang/go/blob/b44f6378233ada888f0dc79e0ac56def4673d9ed/src/net/http/sniff.go#L183-L190
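To see why a non-archive can trigger the RAR-SFX match, the sketch below searches a byte buffer for the two RAR marker blocks, Rar!\x1A\x07\x00 (RAR 1.5 to 4.x) and Rar!\x1A\x07\x01\x00 (RAR 5), which appear to be the byte strings the linked sniff.go lines define. The helper name and the simulated data are hypothetical, and this is not how ClamAV's file-type matcher works; it only illustrates that the marker can sit anywhere in a Go binary's read-only data.

```python
# Hypothetical helper: list every offset where a RAR marker block appears.
# Marker values follow RARLAB's RAR format notes; this is not ClamAV code.

RAR4_MARKER = b"Rar!\x1a\x07\x00"      # RAR 1.5 to 4.x
RAR5_MARKER = b"Rar!\x1a\x07\x01\x00"  # RAR 5.0 and later


def find_rar_markers(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, version) pairs for each RAR marker found in data."""
    hits = []
    for marker, version in ((RAR4_MARKER, "rar4"), (RAR5_MARKER, "rar5")):
        start = data.find(marker)
        while start != -1:
            hits.append((start, version))
            start = data.find(marker, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Simulated read-only data of a Go binary: the sniffing table stores
    # the markers as ordinary string literals next to MIME type strings.
    blob = (
        b"\x00" * 16
        + RAR4_MARKER + b"application/x-rar-compressed"
        + RAR5_MARKER + b"application/x-rar-compressed"
    )
    print(find_rar_markers(blob))
```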
Hex of the scanned file at the referenced offset:
ClamAV then assumes that this is the beginning of a RAR archive and tries to read the PACK_SIZE and UNP_SIZE header fields to get the size of the archived file. However, since this is not actually a RAR archive, those locations contain effectively random bytes, which leads ClamAV to assume it is dealing with a file of roughly 9 EB (9080236526577124131 bytes).
unrar_peek_file_header: Name:
unrar_peek_file_header: Directory?: 0
unrar_peek_file_header: Target Dir: 0
unrar_peek_file_header: RAR Version: 3
unrar_peek_file_header: Packed Size: 8719941959316996884
unrar_peek_file_header: Unpacked Size: 9080236526577124131
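The sizes above could be sanity-checked cheaply. The sketch below assumes the RAR 1.5 to 4.x file header layout from RARLAB's technote (little-endian 32-bit PACK_SIZE and UNP_SIZE, extended by HIGH_PACK_SIZE and HIGH_UNP_SIZE when the 0x0100 flag is set). It parses the two sizes and rejects an entry whose packed data could not fit in the rest of the file. The function names are hypothetical; this is not ClamAV's unrar code.

```python
# Hypothetical sanity check for RAR 1.5-4.x file headers (not ClamAV code).
# Layout assumed from RARLAB's technote: HEAD_CRC(2) HEAD_TYPE(1) HEAD_FLAGS(2)
# HEAD_SIZE(2) PACK_SIZE(4) UNP_SIZE(4) ... ATTR(4), then HIGH_PACK_SIZE(4)
# and HIGH_UNP_SIZE(4) at offset 32 when the large-file flag is set.
import struct

FILE_HEAD = 0x74    # HEAD_TYPE of a file header
LHD_LARGE = 0x0100  # flag: 64-bit sizes via HIGH_PACK_SIZE / HIGH_UNP_SIZE


def read_sizes(header: bytes) -> tuple[int, int]:
    """Return (packed, unpacked) sizes from a file header starting at header[0]."""
    head_type, head_flags = struct.unpack_from("<BH", header, 2)
    if head_type != FILE_HEAD:
        raise ValueError(f"not a file header: HEAD_TYPE=0x{head_type:02x}")
    pack_low, unp_low = struct.unpack_from("<II", header, 7)
    pack_high = unp_high = 0
    if head_flags & LHD_LARGE:
        pack_high, unp_high = struct.unpack_from("<II", header, 32)
    return (pack_high << 32) | pack_low, (unp_high << 32) | unp_low


def is_plausible(packed: int, data_offset: int, file_size: int) -> bool:
    """A real entry's packed data must fit between its offset and end of file."""
    return packed <= file_size - data_offset


if __name__ == "__main__":
    # Packed size from the debug output; the offset is the RAR-SFX match
    # location and the file size approximates the ~28 MB binary.
    packed = 8719941959316996884
    print(is_plausible(packed, data_offset=19076520, file_size=28 * 1024 * 1024))
```

A packed size larger than the bytes remaining in the file is a strong hint that the "header" is really unrelated data, so a check like this could skip the entry instead of raising a size-limit alert.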
Can we improve the RAR archive detection here? I'm not sure which checks are already in place, but perhaps we could check for the expected HEAD_TYPE bytes, or even validate the HEAD_CRC?
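The proposed check could look roughly like the sketch below. It assumes the RAR 1.5 to 4.x layout from RARLAB's technote: the 7-byte marker must be followed by an archive header whose HEAD_TYPE is 0x73 and whose HEAD_CRC equals the low 16 bits of a CRC32 over the header bytes from HEAD_TYPE onward. The function name is hypothetical, this is not how ClamAV currently detects RAR, and RAR 5 archives (which use a different header format) are not handled.

```python
# Hypothetical stricter RAR 1.5-4.x detection (not ClamAV's implementation).
# Assumes RARLAB's layout: marker block, then an archive header
# HEAD_CRC(2) HEAD_TYPE(1)=0x73 HEAD_FLAGS(2) HEAD_SIZE(2) ...
import struct
import zlib

RAR4_MARKER = b"Rar!\x1a\x07\x00"
MAIN_HEAD = 0x73  # HEAD_TYPE of the archive header


def looks_like_rar4(data: bytes, offset: int) -> bool:
    """Accept a marker only if a valid archive header follows it."""
    if data[offset:offset + len(RAR4_MARKER)] != RAR4_MARKER:
        return False
    start = offset + len(RAR4_MARKER)
    if len(data) < start + 7:
        return False  # too short for HEAD_CRC, HEAD_TYPE, HEAD_FLAGS, HEAD_SIZE
    head_crc, head_type, _flags, head_size = struct.unpack_from("<HBHH", data, start)
    if head_type != MAIN_HEAD or head_size < 7 or len(data) < start + head_size:
        return False
    # HEAD_CRC is the low 16 bits of CRC32 over HEAD_TYPE through the header end.
    computed = zlib.crc32(data[start + 2:start + head_size]) & 0xFFFF
    return computed == head_crc


if __name__ == "__main__":
    # A string literal from a Go binary: marker followed by unrelated text.
    print(looks_like_rar4(b"Rar!\x1a\x07\x00application/x-rar-compressed", 0))
```

Because a 16-bit CRC over unrelated bytes matches only about once in 65536 tries, requiring it (together with HEAD_TYPE) should filter out most string-literal hits while still accepting genuine archives and SFX payloads found at an offset.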
Currently, any Go binary containing net/http/sniff (or other static references to the RAR header) will likely hit a false positive (FP) here.
Hi!
I recently received a Heuristics.Limits.Exceeded.MaxScanSize alert from clamd for a file that is much smaller than my MaxScanSize limit (4294967295 bytes). The file is a 670 MB (more precisely, 636631040 bytes) text file. It has a .txt extension but actually contains a large bash script.
clamscan --debug (full output). I can scan files larger than 670 MB; only this one file is special. I saw other issues where the file matched signatures, but my case looks different (no matching at all).
Unfortunately, I cannot share the file. Any ideas what could cause this?
clamconf output: