archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: Understand the impact of TEMPDIR settings in Clam AV #1384

Open ross-spencer opened 6 years ago

ross-spencer commented 6 years ago

The manual entry for clamscan discusses a temporary directory. Reports from IISH suggest that scanning is affected if there isn't enough room available in this area.

The setting is described as:

       --tempdir=DIRECTORY
              Create temporary files in DIRECTORY. Directory must be writable for the '' user or unprivileged user running clamscan.

We need to have a look at how this is used inside Archivematica and provide guidance on how to work with the setting.

ross-spencer commented 6 years ago

There are several pieces of documentation about the TEMPDIR settings.

From clamscan:

   --tempdir=DIRECTORY
          Create temporary files in DIRECTORY. Directory must be writable for the '' user or unprivileged user running clamscan.

   --leave-temps
          Do not remove temporary files.

From clamd.conf:

   TemporaryDirectory STRING
          This option allows you to change the default temporary directory.
          Default: system specific (usually /tmp or /var/tmp).

   LeaveTemporaryFiles BOOL
          Do not remove temporary files (for debugging purpose).
          Default: no
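To check which temporary directory a given clamd setup would end up using, the setting can be read straight out of the configuration text. The following is only a rough sketch: the helper names are made up for illustration, the sample configuration and its path are invented, and falling back to Python's own default temp directory is just an approximation of the "system specific" default described above.

```python
import tempfile


def temporary_directory(conf_text):
    """Return the first TemporaryDirectory value found in clamd.conf text, or None."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "TemporaryDirectory":
            return parts[1].strip()
    return None


def effective_temp_dir(conf_text):
    """Configured directory if set, otherwise Python's default temp dir.

    Assumption: tempfile.gettempdir() only approximates clamd's
    "system specific (usually /tmp or /var/tmp)" default.
    """
    return temporary_directory(conf_text) or tempfile.gettempdir()


# Invented sample configuration, not taken from a real deployment.
sample = """\
# Example only
LeaveTemporaryFiles no
TemporaryDirectory /var/lib/clamav-tmp
"""

print(effective_temp_dir(sample))
```

On a real system the text would come from the clamd.conf file itself (on Debian and Ubuntu packages this is usually /etc/clamav/clamd.conf, but that location is an assumption here).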

How are they used:

There seems to be only one comment from the creators of ClamAV about how they are used: https://lists.gt.net/clamav/users/33466#33466

From: Tomasz Kojm tkojm@clamav.net, Kevin Lin klin@sourcefire.com

Yes, that's right - clamscan dumps the input from stdin into a temporary file and then scans that file (by passing it to libclamav). I see your point now. Currently --tempdir and --leave-temps are only respected by libclamav which is the major player and holds 99% of all file operations. If you want this issue to be fixed please open a bug report at our website.

Limiting Disk Use Previously:

Disk use used to be limited with a flag called --max-space={x}mb, but this setting no longer appears in the command options above (its removal seems to be undocumented):

https://serverfault.com/a/134472

What have I tried:

My hypothesis right now is that limiting --max-scansize would be equivalent to --max-space but at the cost of reducing the amount of a file scanned for viruses.

That being said, the --leave-temps option does not consistently work (i.e. I have not made it work at all yet without some sort of error). I will need to do more work to understand more about this.
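To make experiments like these repeatable, the clamscan invocation can be assembled by a small helper. This is a minimal sketch only: the function name, the scratch directory, the transfer path and the 2000M limit are all hypothetical, and it only builds the command, it does not say anything about how Archivematica itself calls ClamAV.

```python
import shlex


def build_clamscan_command(target, tempdir=None, leave_temps=False,
                           max_scansize=None, recursive=True):
    """Build a clamscan argument list from the options discussed above."""
    cmd = ["clamscan"]
    if recursive:
        cmd.append("--recursive")
    if tempdir:
        # The directory must be writable by the user running clamscan.
        cmd.append(f"--tempdir={tempdir}")
    if leave_temps:
        cmd.append("--leave-temps")
    if max_scansize:
        cmd.append(f"--max-scansize={max_scansize}")
    cmd.append(target)
    return cmd


# Hypothetical paths and limit, for illustration only.
cmd = build_clamscan_command(
    "/path/to/transfer",
    tempdir="/mnt/scratch/clamav",
    leave_temps=True,
    max_scansize="2000M",
)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd) on a machine where clamscan is installed; comparing the contents of the scratch directory before and after a run would be one way to test the --leave-temps behaviour.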

Documenting in Archivematica:

We need to understand what needs to be documented inside the AM docs. There is a little bit about the specific parts of ClamAV that we have tried to control so far: https://www.archivematica.org/en/docs/archivematica-1.7/admin-manual/installation-setup/customization/antivirus-admin/

We could certainly add a note that additional configuration can be done by consulting the ClamAV manual pages.

If we can get an understanding of the effects of the TEMPDIR and how to limit its usage, we can also be sure to document the storage requirements for any user. I think this is a good way forward, but more work needs to be done.

ross-spencer commented 6 years ago

Hi @lwo, I have placed some initial comments here: https://github.com/artefactual/archivematica/issues/1008#issuecomment-399974594. For the particular issue we discussed over the phone, I wanted to clarify: in which directory is the antivirus storing the temporary files you are seeing? Further, are there any additional issues surrounding those, e.g. are they not being deleted?

lwo commented 6 years ago

It does so in /tmp on our Ubuntu 16.04 LTS 64-bit environment.

This led to an out-of-storage-space error during the anti-virus phase. Hence we increased this to 100GB and have had no issues since... but of course changing the TemporaryDirectory to an alternative location is much more convenient. If that works out, the ability to set this location via the ansible playbook and have it documented seems a good way to go.

After a scan operation the temporary file is removed.

ross-spencer commented 5 years ago

IISH are monitoring the temporary directory at present; when it becomes close to full, the services are shut down so that it can be de-cluttered. Transfers that fail need to be transferred again.
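A check along these lines could be scripted rather than done by hand. The sketch below is not what IISH actually run; the function name and the 90% threshold are assumptions chosen for illustration, and it inspects Python's default temp directory rather than whatever directory ClamAV is configured to use.

```python
import shutil
import tempfile


def temp_dir_status(path, warn_fraction=0.9):
    """Report how full the filesystem holding `path` is.

    warn_fraction is a hypothetical threshold: at or above it the
    directory is flagged as nearly full.
    """
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total if usage.total else 0.0
    return {
        "path": path,
        "free_bytes": usage.free,
        "used_fraction": used_fraction,
        "nearly_full": used_fraction >= warn_fraction,
    }


status = temp_dir_status(tempfile.gettempdir())
print(status)
```

Run periodically (for example from cron), a flag like nearly_full could be used to hold back new transfers before the anti-virus phase runs out of space, instead of re-running transfers after they fail.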

Affects: