Cisco-Talos / clamav

ClamAV - Documentation is here: https://docs.clamav.net
https://www.clamav.net/
GNU General Public License v2.0

Scanning files bigger than 2GB #344

Open kwon-kihong opened 2 years ago

kwon-kihong commented 2 years ago

Describe the bug

Any file larger than 2GB is always skipped during the scan; the summary reports:

Data scanned: 0.00 MB

ClamAV Version

0.103.3

How to reproduce the problem

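# dd creates a sparse 2GB file of zeros; clamscan then scans it with the limits raised: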
dd if=/dev/zero of=2G.test bs=1 count=0 seek=2G
clamscan -v -a --stdout -d /app/clamav_defs ./2G.test --max-filesize 4000M --max-scansize 4000M

LibClamAV Warning: **************************************************
LibClamAV Warning: ***  The virus database is older than 7 days!  ***
LibClamAV Warning: ***   Please update it as soon as possible.    ***
LibClamAV Warning: **************************************************
Scanning /app/2G.test
/app/2G.test: OK

----------- SCAN SUMMARY -----------
Known viruses: 8543862
Engine version: 0.103.3
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 0.00 MB
Data read: 2048.00 MB (ratio 0.00:1)
Time: 22.835 sec (0 m 22 s)
Start Date: 2021:10:28 03:16:06
End Date:   2021:10:28 03:16:29
micahsnyder commented 2 years ago

This is a limitation documented in the clamd sample config: https://github.com/Cisco-Talos/clamav/blob/main/etc/clamd.conf.sample#L536 and in the clamd.conf man page: https://github.com/Cisco-Talos/clamav/blob/main/docs/man/clamd.conf.5.in#L552

We should probably mention this limitation in the online docs as well: https://docs.clamav.net/

We most recently discussed this issue in the mailing list, here: https://lists.clamav.net/pipermail/clamav-users/2021-April/011018.html

This is probably a key point:

There’s a lot of technical work to be done to safely raise the 2GB file size limitation, as large files of various file types have never been tested. A large TAR, for example, may well work fine while a large ZIP might crash the program. We really have no idea. Basically it’s going to take a bunch of testing when someone goes to work on this.

If you want to know if the scan limits have been exceeded, you can set AlertExceedsMax yes in clamd.conf, and watch for alerts that start with "Heuristics.Limits.Exceeded".
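For example, the standalone scanner exposes the same option as a command-line flag. A minimal sketch, reusing the test file from the original report (the limit values here are illustrative, not recommendations):

# Alert on limit violations instead of reporting an over-limit file as OK
clamscan --max-filesize=2000M --max-scansize=2000M --alert-exceeds-max=yes ./2G.test

With the alert enabled, a file that exceeds the limits is reported with a Heuristics.Limits.Exceeded detection rather than coming back as OK.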

jhawkins1 commented 2 years ago

We are using ClamAV and have a requirement for handling files larger than 4GB (MPEGs, ISOs, ZIPs, etc.). We are looking at potentially making the required changes, and doing the associated testing, in ClamAV. Assuming the project contribution guidelines are followed, would the project accept patches to support this new functionality and include them in the mainline? Is anyone already working on this whom we could assist or collaborate with?

sviscapi commented 2 years ago

Dear all,

@jhawkins1 : that would really be appreciated :) We also have a use case involving large files (up to 50GB) for long term preservation:

https://www.programmevitam.fr/pages/english/pres_english/ https://github.com/ProgrammeVitam/vitam

So far I've been told ClamAV won't currently handle files larger than 4GB. Is that correct?

Best regards,

Samuel, for Conseil Départemental de l'Hérault https://herault.fr/

micahsnyder commented 2 years ago

No one is working on this right now.

If you take on this task, you'll have to verify that the parsers for all supported file types can handle files larger than 2GB. Part of that will be making sure that size fields and offsets use size_t or uint64_t instead of poorly defined types like int, unsigned, unsigned int, off_t, or long. If you convert any variable from a signed type, like int, long, or off_t, to an unsigned type like size_t or uint64_t, then you will have to make sure that bounds-checking for that variable is still safe. It's a lot of code and a lot of work beyond simply removing the hardcoded 2GB limit and increasing the max size allowed by the option parser.

But if your team has time to work on this, I would be happy to review your work when you have a pull request ready.

@sviscapi Currently ClamAV won't handle files larger than 2GB.

jhawkins1 commented 2 years ago

@micahsnyder, thanks, I will give you an update after we review and discuss internally.

@sviscapi @micahsnyder, to achieve 4GB file scanning you have to be smart about how you do the scanning, otherwise you run into the sizing and processing nuances described in the thread @micahsnyder referenced. For smaller files you can use the clamdscan forwarder command, which scans via the clamd daemon. For larger files up to 4GB, use the clamscan command-line scanner. In our scanning script we perform a size check on the target file: if the uncompressed size is under 1GB we use clamdscan, otherwise if the size is under 4GB we use clamscan with max-filesize and max-scansize set to 4000M. There are nuances of the various file types you may need to consider with this, but so far it has worked for the file types we handle. Here is an example clamscan command line for files up to 4GB:

/usr/bin/clamscan --max-filesize=4000M --max-scansize=4000M --detect-pua=yes --heuristic-scan-precedence=yes --alert-exceeds-max=yes --alert-encrypted=yes --pcre-max-filesize=4000M --max-scantime=0 --quiet --log targetfile.scanlog targetfile
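A minimal shell sketch of the size-based dispatch described above (the thresholds, the reduced flag set, and the log path are illustrative assumptions, not the original script):

#!/bin/sh
# Choose a scanner based on the size of the file to be scanned.
target="$1"

# File size in bytes (GNU stat; BSD/macOS would use: stat -f %z).
size=$(stat -c %s "$target")

# 1 GiB and 4 GiB expressed in bytes.
one_gib=1073741824
four_gib=4294967296

if [ "$size" -lt "$one_gib" ]; then
    # Under 1GB: scan through the clamd daemon.
    clamdscan "$target"
elif [ "$size" -lt "$four_gib" ]; then
    # 1GB to 4GB: use the standalone scanner with raised limits.
    clamscan --max-filesize=4000M --max-scansize=4000M \
        --alert-exceeds-max=yes --quiet \
        --log "$target.scanlog" "$target"
else
    # Beyond 4GB is not covered by this approach.
    echo "$target is larger than 4GB; skipping scan" >&2
    exit 2
fi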

dro-w commented 1 year ago

I briefly had a look at this issue and, as @micahsnyder said, it is a non-trivial change to a significant amount of code. @jhawkins1 , did your team (or anyone else for that matter) get a chance to take a look at this?