k8s setup communication problem - ECONNREFUSED #25

Status: Open. Kubes275 opened this issue 2 years ago.

Kubes275 commented 2 years ago

Hi, I have a problem running the REST API in a k8s cluster based on your examples. Sometimes I get a response like this:

{ "success": false, "data": { "error": { "errno": "ECONNREFUSED", "code": "ECONNREFUSED", "syscall": "connect", "address": "", "port": 3310 } } }

Generally, about 50% of calls fail. With the docker-compose setup it works perfectly fine. Is this a bug, or could you give me some advice on how to avoid it?

Thank you, Jakub
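To put a number on "about 50% of calls fail", a small poller can call the CRA endpoint repeatedly and report the fraction of failed calls. This is a minimal sketch, not part of clamav-rest-api: the helper name `measure_failure_rate` and the `localhost:3000` address (for example reached through `kubectl port-forward`) are assumptions, while the `/api/v1/version` path and port 3000 come from the manifests later in this thread.

```python
import http.client
import urllib.request


def measure_failure_rate(url: str, attempts: int = 20, timeout: float = 1.0) -> float:
    """Call `url` `attempts` times and return the fraction of calls that failed.

    Hypothetical troubleshooting helper. A call counts as failed if the
    connection is refused, times out, is dropped, or the server answers with
    an HTTP error status (urlopen raises HTTPError, a subclass of OSError,
    for 4xx/5xx responses).
    """
    failures = 0
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                resp.read()  # drain the body so the call is fully completed
        except (OSError, http.client.HTTPException):
            # Covers ECONNREFUSED, timeouts, resets and HTTP error statuses.
            failures += 1
    return failures / attempts
```

For example, `measure_failure_rate("http://localhost:3000/api/v1/version")` run once against the Service and once against a single pod may help show whether the failures follow the pod or the network path in front of it.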

benzino77 commented 2 years ago

It's probably because clamavd-service is not yet ready to respond to requests. Have a look at the discussion on PR #23. Try defining the command for the pod in the CRA deployment:

        - name: clamav-rest-api
          # Run this image
          image: benzino77/clamav-rest-api
          command: ['/usr/bin/wait-for-it', '-h', '<replace-with-clamavd-service-name>', '-p', '3310', '-s', '-t', '60', '--', 'npm', 'start']
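The check that wait-for-it performs here amounts to retrying a TCP connection to the clamd Service on port 3310 until it succeeds or the timeout (`-t 60`) expires; with `-s` (strict), the command after `--` only runs if the check succeeded. The Python function below is a hypothetical sketch of that idea, not a replacement for wait-for-it; the host name `clamavd-service` is the default Service name mentioned later in this thread.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Retry a TCP connection to host:port until it succeeds or `timeout` elapses.

    Hypothetical helper approximating `wait-for-it -h HOST -p PORT -t TIMEOUT`.
    Returns True as soon as the port accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            with socket.create_connection((host, port), timeout=min(interval, remaining)):
                return True
        except OSError:
            # Connection refused, name not resolvable yet, or connect timeout:
            # the service is not ready, so wait a bit and try again.
            time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
```

In a container entrypoint one might call `wait_for_port("clamavd-service", 3310, timeout=60)` and exit non-zero on `False`. Like wait-for-it, this only gates startup; it says nothing about failures once clamd is already up.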

Are you using NetworkPolicies in your cluster? Could you please provide your ConfigMap definition? How many replicas of clamavd and CRA do you start in your deployment?

Kubes275 commented 2 years ago

Thank you for your reply,

I'm using almost the default settings from your repository. clamavd-service is running. I start one replica each of clamavd and CRA. No NetworkPolicies. Here is my ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cra-configmap
data:
  # property-like keys; each key maps to a simple value
  node-env: 'development'
  clamd-ip: 'clamavd-svc'
  app-form-key: 'FILES'

Almost everything is default. Only custom labels, matchLabels and namespace were added.

benzino77 commented 2 years ago

Try the changed command in the CRA deployment. If you have only one replica of clamavd and get 50% failed calls, that is very strange behavior. I would suspect that:

  1. there is something wrong with clamavd itself
  2. there is something wrong with network in your setup
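One way to tell these two cases apart is to talk to clamd directly, bypassing CRA. clamd answers the `PING` command with `PONG`; the sketch below sends it in the NUL-terminated `zPING` form to port 3310 and classifies the outcome. `clamd_ping` and its status strings are hypothetical, and running it from a pod in the same namespace against `clamavd-service` is an assumption about how you would use it.

```python
import socket


def clamd_ping(host: str, port: int = 3310, timeout: float = 2.0) -> str:
    """Send clamd's PING command and classify the result.

    Hypothetical helper. Returns one of:
      'ok'         clamd replied PONG
      'refused'    ECONNREFUSED, as in the error reported in this issue
      'timeout'    no connection or reply within `timeout` seconds
      'unresolved' the host name (e.g. the Service name) did not resolve
      'bad-reply'  something answered, but not with PONG
      'error'      any other socket error
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"zPING\0")  # 'z' prefix: NUL-delimited command and reply
            reply = b""
            while not reply.endswith(b"\0"):
                chunk = sock.recv(64)
                if not chunk:  # server closed the connection
                    break
                reply += chunk
    except socket.gaierror:
        return "unresolved"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"
    return "ok" if reply.rstrip(b"\0") == b"PONG" else "bad-reply"
```

If repeated calls return `'ok'` while CRA calls still fail, that would point away from clamd itself; intermittent `'refused'` would point at clamd or the network path to it.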

Do you use the clamav/clamav:0.104.1 image as the clamavd instance?

benzino77 commented 2 years ago

I have read (but I can't find it again) that version 0.104 of clamavd has a blocking problem when the database is updated: it does not accept any connections during the update. Try 0.104.2, wait for 0.105 to be available on Docker Hub, or use another ClamAV Docker image based on version 0.103.

Kubes275 commented 2 years ago

We switched our images to version 0.105 of ClamAV, and the problem is still the same. It looks the same in our local cluster and in the remote k8s cluster where this app is deployed. We could not try version 0.103 because it is no longer available on Docker Hub. Do you have any clues about what might be wrong with the network setup, or anything else?

Thank you very much

benzino77 commented 2 years ago

Have you tried the configuration with the changed command in the CRA deployment? Are you deploying CRA and clamavd in the same namespace?

If you are using the manifests provided with CRA, the service assigned to clamavd is called clamavd-service, not clamavd-svc as your ConfigMap shows.

Kubes275 commented 2 years ago

The manifests were changed, so the service name is OK (if it weren't, I assume no request would succeed); it isn't even clamavd-svc these days :) CRA and clamavd are in the same namespace. No, I did not test the changed command. If I understand it correctly, that command just delays the start of CRA until ClamAV is accessible. Once ClamAV is up and running, there should be no reason for 50% of requests to fail. Or am I missing something?

benzino77 commented 2 years ago

Yes, the changed command in the CRA deployment should force CRA to wait until clamavd is accessible. Here is the full description of the wait-for-it command: link

Kubes275 commented 2 years ago

Thank you for the link, I'll check it. But the main issue is why about 50% of calls to the /version endpoint fail when clamavd is up and running. Do you have any ideas? I'm using the /version endpoint as a liveness probe in our k8s cluster, and those failed requests are causing the pod to restart. Thank you very much.

benzino77 commented 2 years ago

You are right. You should ask someone from your DevOps team to help you troubleshoot this problem. On my side, I have performed some tests: one on my local minikube installation (Kubernetes v1.23.3) and the other on a remote "full-blown" cluster (v1.22.4). In both environments I had no problems at all. I tested the setup with two different clamavd versions (clamav/clamav:0.104.1 and clamav/clamav:stable). I also tested it with a livenessProbe defined for CRA:

apiVersion: apps/v1
kind: Deployment
metadata:
  # Unique key of the Deployment instance
  name: cra-deployment
  labels:
    app: cra
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cra
  template:
    metadata:
      labels:
        # Apply this label to pods and default
        # the Deployment label selector to this value
        app: cra
    spec:
      containers:
        - name: clamav-rest-api
          # Run this image
          image: benzino77/clamav-rest-api
          livenessProbe:
            initialDelaySeconds: 5
            periodSeconds: 2
            timeoutSeconds: 1
            successThreshold: 1
            failureThreshold: 1
            httpGet:
              scheme: HTTP
              path: /api/v1/version
              port: 3000
          env:
            - name: NODE_ENV
              valueFrom:
                configMapKeyRef:
                  name: cra-configmap
                  key: node-env
            - name: CLAMD_IP
              valueFrom:
                configMapKeyRef:
                  name: cra-configmap
                  key: clamd-ip
            - name: APP_FORM_KEY
              valueFrom:
                configMapKeyRef:
                  name: cra-configmap
                  key: app-form-key
          ports:
            - containerPort: 3000
              protocol: TCP
              name: cra-port
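In this probe, `failureThreshold: 1` means a single failed check (including one that takes longer than `timeoutSeconds: 1`) is enough for the kubelet to restart the container, so if roughly half of the calls to /api/v1/version fail, frequent restarts are expected. The function below is a hypothetical, simplified model of that rule: it ignores `initialDelaySeconds`, timing and restart back-off, and only counts how many restarts a given sequence of probe results would trigger.

```python
from typing import Sequence


def probe_restarts(results: Sequence[bool], failure_threshold: int) -> int:
    """Count restarts in a simplified liveness-probe model.

    `results` holds probe outcomes in order (True = success). A restart is
    counted whenever `failure_threshold` consecutive probes fail; the
    failure counter then starts over, as it would for a fresh container.
    """
    restarts = 0
    consecutive_failures = 0
    for ok in results:
        if ok:
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                restarts += 1
                consecutive_failures = 0
    return restarts
```

With strictly alternating results, `failure_threshold=1` produces a restart for every failure, while `failure_threshold=3` produces none. A larger threshold would only hide isolated failures; it would not remove their cause.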

To summarize: unfortunately, I cannot reproduce your problem. What comes to mind is that you may have other components involved in your setup: some kind of service mesh (e.g. Istio), something specific to your CNI network configuration (like Calico GlobalNetworkPolicies), etc.

Kubes275 commented 2 years ago

OK, interesting. Thank you for your time, help and quick responses!

mom-douellet commented 1 year ago

I ran into the same issue while playing with a k8s deployment. It seems to be memory-related on the ClamAV side: clamd does not start properly, even though nothing is shown in the logs.

What seems to have solved my issue was giving ClamAV more memory (I allowed 2048m) and setting this clamd.conf parameter to reduce ClamAV's memory footprint:

ConcurrentDatabaseReload no

Hope this helps someone.

Dunae commented 5 months ago

I have the same issue: ClamAV stops working after a day.

Dunae commented 5 months ago

curl -s -XPOST -F FILES=@sample-virus.txt
{"success":false,"data":{"error":"connect ECONNREFUSED"}}
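The error replies in this thread report the same condition in two shapes: in the first post `data.error` is an object with a `code` field, while here it is the string `"connect ECONNREFUSED"`. If you log or alert on these replies, a parser that accepts both shapes can help separate clamd connection failures from other errors. This is a sketch: `connection_error_code` is a hypothetical helper, and taking the last upper-case token starting with `E` from the string form is an assumption about the message format.

```python
import json
from typing import Optional


def connection_error_code(body: str) -> Optional[str]:
    """Extract an error code such as 'ECONNREFUSED' from a CRA JSON error reply.

    Handles both shapes seen in this issue:
      {"success": false, "data": {"error": {"code": "ECONNREFUSED", ...}}}
      {"success": false, "data": {"error": "connect ECONNREFUSED"}}
    Returns None for successful replies or unrecognised shapes.
    """
    payload = json.loads(body)
    if not isinstance(payload, dict) or payload.get("success"):
        return None
    data = payload.get("data")
    error = data.get("error") if isinstance(data, dict) else None
    if isinstance(error, dict):
        return error.get("code")
    if isinstance(error, str):
        # Assumption: the code is the last all-caps token starting with 'E'.
        for token in reversed(error.split()):
            if token.startswith("E") and token.isupper():
                return token
    return None
```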

benzino77 commented 5 months ago

Check your logs to see why clamavd stops working after a day. A comment above points out that assigning more memory to the clamavd deployment solves the problem.

Dunae commented 5 months ago

ClamAV is not stopping, and there is nothing in the logs.