Cisco-Talos / clamav-docker

Dockerfiles for the ClamAV project

Docker container does not exit when clamd exits #45

Open bnutzer opened 6 months ago

bnutzer commented 6 months ago

Hi,

after switching to ClamAV 1.3, we have seen "disappearing" clamds -- for a reason yet to be determined. However, we are having a hard time troubleshooting that issue, as a dying clamd does not cause the container to exit immediately. The container just becomes unhealthy.

The container will start up to three "daemons" (freshclam, clamd, milter), and it is obviously not a trivial question which of them is central and which ones are not. Since we do not use milter, I regard a running clamav container with a working clamd as "valid" even if the freshclam process has gone away.

Due to this, I'd prefer something along these lines (yep, this patch misses the case where no daemons are started at all):

--- clamav/1.3/alpine/scripts/docker-entrypoint.sh
+++ clamav/1.3/alpine/scripts/docker-entrypoint.sh
@@ -61,6 +61,7 @@ else
                        unlink "/tmp/clamd.sock"
                fi
                clamd --foreground &
+               clamdpid=$!
                while [ ! -S "/run/clamav/clamd.sock" ] && [ ! -S "/tmp/clamd.sock" ]; do
                        if [ "${_timeout:=0}" -gt "${CLAMD_STARTUP_TIMEOUT:=1800}" ]; then
                                echo
@@ -80,7 +81,7 @@ else
        fi

        # Wait forever (or until canceled)
-       exec tail -f "/dev/null"
+       wait $clamdpid
 fi

 exit 0

(Unfortunately, the "wait -n" bashism appears to be broken in busybox, and thus in alpine; otherwise, collecting the (up to) three PIDs and running "wait -n" on all of them would probably be fine. I have created an issue in the busybox bug tracker, but it will probably take some time until a possible fix reaches the alpine image.)
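For illustration, a minimal POSIX-sh sketch of a polling loop that avoids "wait -n" entirely; the $pids variable and the 5-second interval are assumptions of mine, not taken from the current entrypoint:

        # Assumption: $pids was filled with pids="$pids $!" after each daemon
        # that was launched in the background (clamd, freshclam, milter).
        while true; do
                for pid in $pids; do
                        # kill -0 delivers no signal; it only checks the PID exists.
                        if ! kill -0 "$pid" 2>/dev/null; then
                                echo "daemon with PID $pid has exited; stopping container"
                                exit 1
                        fi
                done
                sleep 5
        done

This exits the container as soon as any collected daemon disappears, at the cost of a polling delay instead of the instant wake-up "wait -n" would give.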

I'd be happy to provide a pull request on request. In that case: should the wait statement in the debian edition wait for all three PIDs, or should the alpine and debian versions of the script stay as close as possible? For a possible "no daemon" situation, I'd suggest using "sleep infinity" instead of the crude "tail" call.
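(On that last point, one caveat: "sleep infinity" is a GNU coreutils extension, and whether busybox sleep accepts it depends on build options, so a portable sketch of the "no daemon" branch could be a plain loop:)

        # Hypothetical "no daemon" fallback; busybox sleep may not accept
        # "infinity", so loop on a finite sleep instead.
        while true; do
                sleep 3600
        done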

Newspaperman57 commented 4 months ago

+1 on this. We've lost the clamd process a couple of times while it was reloading the database: the kernel killed it after it ran out of memory, but the container seemingly kept running. Clients trying to connect then got "broken pipe" errors, while the log simply showed:

Sun May 12 11:21:14 2024 -> SelfCheck: Database status OK.
Sun May 12 11:31:15 2024 -> SelfCheck: Database status OK.
Sun May 12 11:41:17 2024 -> SelfCheck: Database status OK.
Sun May 12 11:51:18 2024 -> SelfCheck: Database status OK.
Sun May 12 12:01:19 2024 -> SelfCheck: Database status OK.
Sun May 12 12:11:20 2024 -> SelfCheck: Database status OK.
Sun May 12 12:21:22 2024 -> SelfCheck: Database status OK.
Received signal: wake up
ClamAV update process started at Sun May 12 12:23:33 2024
daily database available for update (local version: 27272, remote version: 27273)
Testing database: '/var/lib/clamav/tmp.4e593738ca/clamav-1886123323e735d67f816af8a3bdb7c7.tmp-daily.cld' ...
Database test passed.
daily.cld updated (version: 27273, sigs: 2061131, f-level: 90, builder: raynman)
main.cvd database is up-to-date (version: 62, sigs: 6647427, f-level: 90, builder: sigmgr)
bytecode.cld database is up-to-date (version: 335, sigs: 86, f-level: 90, builder: raynman)
Clamd successfully notified about the update.
Sun May 12 12:23:43 2024 -> Reading databases from /var/lib/clamav

This results in downtime and requires manual intervention to fix.

We're using the clamav-debian image on a ppc64le-based architecture.
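As a side note, Docker restart policies only trigger when the container's main process exits, which is exactly why the entrypoint fix above matters; a sketch (image tag and memory limit are illustrative, not our exact setup):

        # With the entrypoint fix applied, the container exits when clamd
        # dies and Docker restarts it; without the fix, the container never
        # "fails", so this policy never fires.
        docker run -d \
                --name clamav \
                --restart unless-stopped \
                --memory 4g \
                clamav/clamav-debian:latest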

vienleidl commented 2 months ago

> We've lost the clamd process a couple of times while it was reloading the database: the kernel killed it after it ran out of memory, but the container seemingly kept running. Clients trying to connect then got "broken pipe" errors.

I had the same issue (https://github.com/Cisco-Talos/clamav/issues/1282), then I increased the container's RAM from 2 GiB to 4 GiB.
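As a general, hedged note for the OOM-during-reload case: by default clamd keeps both the old and the new signature database in memory while reloading, and clamd.conf offers an option to trade availability for peak memory:

        # clamd.conf: reload the database in place instead of side by side.
        # Scanning blocks during the reload, but peak memory use drops.
        ConcurrentDatabaseReload no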