benzino77 / clamav-rest-api

ClamAV REST API. Scan files using simple POST request.
MIT License

Allow application retry before application failure #23

Closed galbantow closed 2 years ago

galbantow commented 2 years ago

The problem: The API has a design flaw that causes issues when it runs in an Azure web app instance under Docker Compose.

  1. The Azure web application comes online.
  2. Azure/Docker attempts to bring both the ClamAV scanner and API containers online.
  3. The API container tries to connect to the ClamAV scanner and fails with an error because the port/service is not yet online.
  4. Azure sees that the container has gone offline and kills the entire web app, including both containers.
  5. When the containers go offline, the Docker Compose restart rule kicks in and attempts to bring them back online, which restarts the process from step 1 (an infinite loop).

The Solution: I have implemented a retry loop that allows the application to attempt to restart up to X times without notifying Azure that there is a problem. In addition, I have created a new variable, STARTUP_RETRY, that specifies how many times the API container will retry before failing.
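The bounded-retry idea can be sketched in shell as follows. This is only an illustration of the pattern, not the code from this PR (which lives in the Node application): STARTUP_RETRY is the variable named above, while RETRY_DELAY, its defaults, and the retry_until_ready helper are hypothetical names chosen for the example.

```shell
#!/bin/bash
# Illustrative sketch only: retry a readiness probe a bounded number of times.
# STARTUP_RETRY comes from the PR description; RETRY_DELAY and
# retry_until_ready are hypothetical and not part of clamav-rest-api.

STARTUP_RETRY="${STARTUP_RETRY:-5}"   # maximum number of attempts (assumed default)
RETRY_DELAY="${RETRY_DELAY:-2}"       # seconds to wait between attempts (assumed default)

# Run the given command until it succeeds or STARTUP_RETRY attempts are used up.
# Returns 0 on the first success, 1 if every attempt failed.
retry_until_ready() {
  local attempt=1
  while true; do
    "$@" && return 0
    if [ "$attempt" -ge "$STARTUP_RETRY" ]; then
      return 1
    fi
    echo "attempt ${attempt}/${STARTUP_RETRY} failed, retrying in ${RETRY_DELAY}s" >&2
    sleep "$RETRY_DELAY"
    attempt=$((attempt + 1))
  done
}
```

A caller would pass whatever probe suits its setup, for example `retry_until_ready some_probe_command && npm start`, where some_probe_command is a placeholder for a check that exits 0 once clamd accepts connections. The process exits with a failure only after the last attempt, so the orchestrator is not notified during the warm-up period.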

REPRODUCE: This issue can be easily reproduced in a local development environment using the following Docker Compose file. Note that the app crashes and restarts numerous times while ClamAV is getting ready.

version: '3.3'
services:
  clamav-rest:
    image: benzino77/clamav-rest-api
    depends_on:
      - clamav-server
    links:
      - clamav-server
    environment:
      APP_PORT: 8080
      APP_FORM_KEY: file
      CLAMD_IP: clamav-server
    restart: always
  clamav-server:
    image: clamav/clamav

BEFORE image

AFTER image

benzino77 commented 2 years ago

I will take a look at this when I finish my winter vacation.

galbantow commented 2 years ago

> I will take a look at this when I finish my winter vacation.

Enjoy your vacation :)

benzino77 commented 2 years ago

This is a solid workaround for the problem, but I don't think it is a Docker-style approach... if you know what I mean ;) How about this: inside the Dockerfile we can add an additional package called wait-for-it. This is a tool recommended by Docker that helps in situations like the one you described. Additionally, we have to change the Dockerfile and replace ENTRYPOINT ["npm", "start"] with CMD ["npm", "start"].

That way we keep full backward compatibility (one can start the CRA container as before), but CMD can be overridden by command in the docker-compose.yaml file:

version: '3.8'
services:
  clamd:
    image: clamav/clamav:0.104
    restart: unless-stopped
    networks:
      - clam-net
  api:
    image: clamav-rest-api
    restart: unless-stopped
    command: ['/usr/bin/wait-for-it', '-h', 'clamd', '-p', '3310', '-s', '-t', '30', '--', 'npm', 'start']
    environment:
      - NODE_ENV=production
      - CLAMD_IP=clamd
      - APP_FORM_KEY=FILES
      - APP_PORT=3000
    ports:
      - '8080:3000'
    networks:
      - clam-net
networks:
  clam-net:

As the Microsoft docs state, command is a supported option, while depends_on is ignored in the docker-compose.yaml file.

I haven't tested it on Azure but starting this stack locally gives me this output:

❯ docker-compose up
[+] Running 3/3
 ⠿ Network temp_clam-net   Created                                                                                                                     0.0s
 ⠿ Container temp-clamd-1  Created                                                                                                                     0.1s
 ⠿ Container temp-api-1    Created                                                                                                                     0.1s
Attaching to temp-api-1, temp-clamd-1
temp-api-1    | wait-for-it: waiting 30 seconds for clamd:3310
temp-clamd-1  | Starting ClamAV
Socket for clamd not found yet, retrying (12/1800) ...Sat Feb 19 18:54:39 2022 -> Limits: Global time limit set to 120000 milliseconds.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> Limits: Global size limit set to 104857600 bytes.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> Limits: File size limit set to 26214400 bytes.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> Limits: Recursion level limit set to 17.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> Limits: Files limit set to 10000.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> Limits: MaxEmbeddedPE limit set to 10485760 bytes.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> Limits: MaxHTMLNormalize limit set to 10485760 bytes.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> Limits: MaxHTMLNoTags limit set to 2097152 bytes.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> Limits: MaxScriptNormalize limit set to 5242880 bytes.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> Limits: MaxZipTypeRcg limit set to 1048576 bytes.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> Limits: MaxPartitions limit set to 50.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> Limits: MaxIconsPE limit set to 100.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> Limits: MaxRecHWP3 limit set to 16.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> Limits: PCREMatchLimit limit set to 100000.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> Limits: PCRERecMatchLimit limit set to 2000.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> Limits: PCREMaxFileSize limit set to 26214400.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> Archive support enabled.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> AlertExceedsMax heuristic detection disabled.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> Heuristic alerts enabled.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> Portable Executable support enabled.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> ELF support enabled.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> Mail files support enabled.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> OLE2 support enabled.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> PDF support enabled.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> SWF support enabled.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> HTML support enabled.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> XMLDOCS support enabled.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> HWP3 support enabled.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> Self checking every 600 seconds.
temp-clamd-1  | Sat Feb 19 18:54:39 2022 -> Set stacksize to 1048576
temp-clamd-1  | socket found, clamd started.
temp-clamd-1  | Starting Freshclamd
temp-clamd-1  | ClamAV update process started at Sat Feb 19 18:54:39 2022
temp-clamd-1  | daily database available for update (local version: 26454, remote version: 26458)
temp-api-1    | wait-for-it: clamd:3310 is available after 14 seconds
temp-api-1    |
temp-api-1    | > clamav-rest-api@1.0.10 start /clamav-rest-api
temp-api-1    | > node src/app.js
temp-api-1    |
temp-api-1    | Server started on PORT: 3000
temp-clamd-1  | Testing database: '/var/lib/clamav/tmp.f7b4f0cf1f/clamav-1393d39a8ada227a57df7f3344fef810.tmp-daily.cld' ...
temp-clamd-1  | Database test passed.
temp-clamd-1  | daily.cld updated (version: 26458, sigs: 1974077, f-level: 90, builder: raynman)
temp-clamd-1  | main.cvd database is up-to-date (version: 62, sigs: 6647427, f-level: 90, builder: sigmgr)
temp-clamd-1  | bytecode.cvd database is up-to-date (version: 333, sigs: 92, f-level: 63, builder: awillia2)
temp-clamd-1  | Clamd successfully notified about the update.
temp-clamd-1  | Sat Feb 19 18:54:45 2022 -> Reading databases from /var/lib/clamav
temp-clamd-1  | Sat Feb 19 18:54:57 2022 -> Database correctly reloaded (8606148 signatures)
temp-clamd-1  | Sat Feb 19 18:54:57 2022 -> Activating the newly loaded database...
temp-api-1    | ::ffff:172.20.0.1 - - [19/Feb/2022:18:55:04 +0000] "GET /api/v1/version HTTP/1.1" 200 85 "-" "PostmanRuntime/7.29.0"
temp-clamd-1  | Sat Feb 19 18:55:08 2022 -> instream(172.20.0.3@49792): Win.Test.EICAR_HDB-1 FOUND
temp-api-1    | ::ffff:172.20.0.1 - - [19/Feb/2022:18:55:09 +0000] "POST /api/v1/scan HTTP/1.1" 200 229 "-" "PostmanRuntime/7.29.0"
temp-clamd-1  | Sat Feb 19 19:05:09 2022 -> SelfCheck: Database status OK.

Of course, the examples have to be updated to show this solution for anyone facing the same problem.

PS. I'm a bit of a purist when it comes to applying "quick fixes" or "workarounds"... I've seen too many of them ;)

galbantow commented 2 years ago

No problem, thanks for taking the time to investigate and for suggesting a workaround. Feel free to close this PR if you feel it's not quite appropriate. Cheers, Gavin

benzino77 commented 2 years ago

Gavin, could you please check the temporary new Docker image, benzino77/clamav-rest-api:development, which I've just pushed to the repository, and tell me whether it solves your issue on Azure?

The docker-compose.yml you use should be similar to this one:

version: '3.8'
services:
  clamd:
    image: clamav/clamav:0.104
    restart: unless-stopped
    networks:
      - clam-net
  api:
    image: benzino77/clamav-rest-api:development
    restart: unless-stopped
    # depends_on is ignored in some situations (have a look at the discussion in this PR: https://github.com/benzino77/clamav-rest-api/pull/23)
    # to handle that case, a wait-for-it script is available inside the CRA docker image
    # so to wait for clamd to be available, one can overwrite the CMD with the wait-for-it script
    # the following line checks whether clamav is available on host clamd and port 3310, with the timeout set to 60 seconds
    command: ['/usr/bin/wait-for-it', '-h', 'clamd', '-p', '3310', '-s', '-t', '60', '--', 'npm', 'start']
    # depends_on:
    #  - clamd
    environment:
      - NODE_ENV=production
      - CLAMD_IP=clamd
      - APP_FORM_KEY=FILES
      - APP_PORT=3000
    ports:
      - '8080:3000'
    networks:
      - clam-net
networks:
  clam-net:

galbantow commented 2 years ago

Just tested it and it all looks good to me, Cheers!

benzino77 commented 2 years ago

Great! I will release a new version this evening (UTC). Thanks for your engagement!

benzino77 commented 2 years ago

I've just pushed a new release, 1.0.11, so it can be used as benzino77/clamav-rest-api (latest) or benzino77/clamav-rest-api:1.0.11 in the docker-compose.yml file.

ahmedzak7 commented 2 years ago

Can you please help me? I get the same error when I deploy ClamAV on a GKE cluster, where it connects through a Squid proxy. This is the Dockerfile:

FROM node:14.15.4-buster-slim
# Set versions
ENV CLOUD_SDK_VERSION=309.0.0
# Install base packages 
ENV PATH $PATH:/usr/local/gcloud/google-cloud-sdk/bin
RUN apt-get update && \
    apt-get install -y build-essential clamav-daemon clamav-freshclam curl python3 sudo && \
    rm -rf /var/lib/apt/lists/* && \
    mkdir -p /usr/local/gcloud && \
    curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-${CLOUD_SDK_VERSION}-linux-x86_64.tar.gz && \
    tar -C /usr/local/gcloud -xvf google-cloud-sdk-${CLOUD_SDK_VERSION}-linux-x86_64.tar.gz && \
    rm google-cloud-sdk-${CLOUD_SDK_VERSION}-linux-x86_64.tar.gz && \
    ln -s /lib /lib64 && \
    gcloud config set core/disable_usage_reporting true && \
    gcloud config set component_manager/disable_update_check true && \
    mkdir -p /home/node/app && \
    chown -R node:node /home/node/app && \
    chmod 777 /var/log/clamav/freshclam.log && \
    chmod 777 /var/lib/clamav && \
    echo "TCPSocket 3310" >> /etc/clamav/clamd.conf && \
    echo "TCPAddr 127.0.0.1" >> /etc/clamav/clamd.conf && \
    echo "User node" >> /etc/clamav/clamd.conf && \
    echo "DatabaseOwner node" >> /etc/clamav/freshclam.conf && \
    echo "HTTPProxyServer squid-proxy.neds.local" >> /etc/clamav/freshclam.conf && \
    echo "HTTPProxyPort 3128"  >> /etc/clamav/freshclam.conf && \
    echo "node ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers.d/node
# Bring in app code
WORKDIR /home/node/app
COPY --chown=node:node . .
# Set up app
RUN npm config set python $(which python3) && \
    npm install
# Run the rest as the node user
USER 1000
CMD ["/bin/bash", "bootstrap.sh"]

And this is the bootstrap.sh:

#!/bin/bash
sudo service clamav-freshclam stop && \
sudo freshclam && \
sudo service clamav-freshclam start && \
sudo service clamav-daemon force-reload && \
npm start

The error: Error: connect ECONNREFUSED 127.0.0.1:3310 at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1158:16)

I would really appreciate your help. Thanks in advance.

benzino77 commented 2 years ago

I think that changing your bootstrap.sh script to something similar to:

#!/bin/bash
sudo service clamav-freshclam stop && \
sudo freshclam && \
sudo service clamav-freshclam start && \
sudo service clamav-daemon force-reload && \
/usr/bin/wait-for-it -h localhost -p 3310 -s -t 60 -- npm start

should do the trick. I assume that clamd is running on localhost.
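Note that the Dockerfile posted above does not list wait-for-it among its installed packages, so /usr/bin/wait-for-it may not exist in that image. If it cannot be installed, a rough stand-in can be written with bash's built-in /dev/tcp redirection. The sketch below is an assumption-laden illustration, not part of this project or of wait-for-it itself: the wait_for_port function name and its argument order are hypothetical, and it requires bash rather than plain sh.

```shell
#!/bin/bash
# Hypothetical helper (not part of clamav-rest-api or wait-for-it):
#   wait_for_port HOST PORT [TIMEOUT_SECONDS]
# Polls once per second until a TCP connection to HOST:PORT succeeds
# or the timeout (default 60 seconds) expires.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}"
  # SECONDS is a bash built-in counting seconds since the shell started.
  local deadline=$((SECONDS + timeout))
  while [ "$SECONDS" -lt "$deadline" ]; do
    # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connection.
    # The subshell keeps the descriptor from leaking; it closes on exit.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

With this function defined near the top of bootstrap.sh, the last line of the script could become `wait_for_port localhost 3310 60 && npm start`, mirroring the host, port, and timeout used in the wait-for-it call above.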