alan-turing-institute / data-safe-haven

https://data-safe-haven.readthedocs.io
BSD 3-Clause "New" or "Revised" License

clamav anti-virus is memory intensive #1724

Open edchapman88 opened 4 months ago

edchapman88 commented 4 months ago

:white_check_mark: Checklist

:computer: System information

:no_entry_sign: Describe the problem

During the Dec 2023 DSG we found that participants were having out-of-memory issues with some VMs. The clamav anti-virus software was using a significant amount of memory (around 25%) on the standard D2 v3 VMs with 8 GB memory.

@JimMadge commented that 8GB memory is probably not enough for several users (even for light usage), which we have taken on board.

Raising this as an issue to check that this memory usage by clamav is as expected.

:steam_locomotive: Workarounds or solutions

Use machines with more memory!

JimMadge commented 4 months ago

Would be helpful to figure out which process is causing the problem.

I think there should be two: the clamonacc on-access daemon, and the clamscan periodic, full-system scan.
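To narrow down which process is holding the memory, something like the following could be run on an affected VM. This is only a rough sketch, assuming a Linux guest with `/proc` mounted; the helper names `parse_status` and `rss_by_name` are made up for illustration and are not part of data-safe-haven. It sums resident memory (`VmRSS`) per process name, so clamd, clamonacc and clamscan would show up as separate rows if they are running.

```python
from collections import defaultdict
from pathlib import Path


def parse_status(text):
    """Return (name, rss_kib) from the text of a /proc/<pid>/status file."""
    name, rss_kib = "", 0
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            # Line looks like "VmRSS:   2049876 kB"; kernel threads have no VmRSS line.
            rss_kib = int(line.split()[1])
    return name, rss_kib


def rss_by_name(proc_root="/proc"):
    """Sum resident memory (KiB) per process name across all running processes."""
    totals = defaultdict(int)
    for status in Path(proc_root).glob("[0-9]*/status"):
        try:
            name, rss_kib = parse_status(status.read_text())
        except OSError:
            continue  # process exited while we were reading it
        totals[name] += rss_kib
    return dict(totals)


if __name__ == "__main__":
    # Print the ten process names using the most resident memory.
    top = sorted(rss_by_name().items(), key=lambda kv: kv[1], reverse=True)[:10]
    for name, rss_kib in top:
        print(f"{name:<20} {rss_kib / 1024:8.0f} MiB")
```

Note that the kernel truncates the `Name:` field to 15 characters, and that summing RSS across processes counts shared pages more than once, so the totals are an upper bound rather than an exact figure.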

craddm commented 4 months ago

It must be related to clamdscan, as clamonacc would not have been running, as per #1722

JimMadge commented 4 months ago

@craddm Can you reproduce this?

If that is true (not 100% certain but that seems likely), clamscan is set to run daily at 01:00 (timer).

Is it running for a very long time? Might want to tighten up the frequency (is daily a DSPT requirement?) and the scan locations (currently /).

craddm commented 4 months ago

During our chat with TRESA yesterday, I seem to remember seeing the actual error reproduced at some point. Could somebody provide a screenshot? @helendduncan @cptanalatriste @dsj976

helendduncan commented 4 months ago

I think we saw the proxy error, I don't remember the memory one @craddm ?

craddm commented 4 months ago
> Screenshot 2024-02-01 at 12 32 44
>
> Is this what you meant @craddm

No, that's the issue with the squid proxy. I seem to remember there being another one that came up briefly.

> I think we saw the proxy error, I don't remember the memory one @craddm ?

I might be misremembering, but I thought at some point someone was logged in to an SRD and seeing some error that mentioned clamav, before moving on to the proxy issue. But again, I might be misremembering. Fundamentally I need to see somewhere the problem is happening, as it doesn't seem to be happening on any SRDs I have up.

craddm commented 4 months ago

> Is it running for a very long time? Might want to tighten up the frequency (is daily a DSPT requirement?) and the scan locations (currently /).

The latest DSPT guidance says "Antivirus/anti-malware is kept continually up to date.", but doesn't specify a time period beyond "continually".

craddm commented 4 months ago

Again, I might have totally misremembered any sort of clamav error showing up yesterday, so maybe you can disregard that.

So, I've seen some suggestions that 1-2 GB for clamav is not that unusual:

https://github.com/Cisco-Talos/clamav/issues/565

Hah, actually scratch that about not seeing it that high on any of my currently running SRDs -

image

edchapman88 commented 4 months ago

That looks familiar, I think it was a clamav-daemon process that I originally spotted causing our problems.

jemrobinson commented 2 months ago

@craddm : Is this fixed in 4.2.0?

craddm commented 2 months ago

No. I've had another look with the SRDs from the current release, and can see memory usage consistent with the report above: approx. 2 GB.

I think the memory usage of clamav-daemon/clamd is not avoidable. It keeps the virus signature database in memory at all times, so the memory usage just reflects the size of its database. The on-access scanner requires clamd to be running to work, and on-access scanning is a DSPT requirement.

This is separate from the daily clamscan run, which doesn't require clamd to be running.

Basically, I don't think there's anything we can do about it.

JimMadge commented 2 months ago

If it is a problem or not necessary, we could stop the daemon.

Otherwise, maybe we should add some advice about suitable VM sizes noting that the memory overhead of essential processes will be ~2-3GB.
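For the sizing advice, the arithmetic is simple enough to sketch. The snippet below uses only the figures from this thread (an 8 GB VM, roughly 25% used by clamav, and ~2-3 GB total overhead). The user count of 4 is a made-up example, not a measured or recommended number, and the function name is hypothetical.

```python
def memory_per_user_gb(vm_memory_gb, overhead_gb, n_users):
    """Memory left for each user once the fixed overhead is subtracted."""
    if n_users < 1:
        raise ValueError("n_users must be at least 1")
    return (vm_memory_gb - overhead_gb) / n_users


# clamav share reported in this issue: about 25% of an 8 GB VM.
clamav_gb = 8 * 0.25
print(clamav_gb)  # 2.0

# Hypothetical: 4 concurrent users on an 8 GB VM, with 2 GB and 3 GB overhead.
print(memory_per_user_gb(8, 2, 4))  # 1.5
print(memory_per_user_gb(8, 3, 4))  # 1.25
```

Under those assumptions each user gets well under 2 GB on the 8 GB size, which fits the earlier comment that 8 GB is probably not enough for several users.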