elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

ERROR: get status request failed:failed to get audit status reply: no reply received #33258

Open mdnfiras opened 1 year ago

mdnfiras commented 1 year ago

We are testing Auditbeat and have set it up as a DaemonSet in our GKE cluster. A few pods randomly print this error line:

ERROR: get status request failed:failed to get audit status reply: no reply received

As far as I know, this has nothing to do with the nodes. We can restart the pods as often as we want: new pods on the same nodes sometimes run without printing the error, and sometimes they print it even though previous pods on the same nodes were working fine.
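One way to check the "not node-related" observation is to record, per pod, which node it ran on and whether it logged the error, then list the nodes that produced both outcomes. A minimal Python sketch; the function name and the sample node and pod names are hypothetical placeholders, not data from our cluster:

```python
from collections import defaultdict


def nodes_with_mixed_results(observations):
    """Return nodes where at least one pod logged the error and at least one did not.

    observations: iterable of (node_name, pod_name, saw_error) tuples.
    """
    outcomes_by_node = defaultdict(set)
    for node, _pod, saw_error in observations:
        outcomes_by_node[node].add(bool(saw_error))
    # A node with both True and False outcomes cannot be the sole cause.
    return sorted(node for node, outcomes in outcomes_by_node.items()
                  if outcomes == {True, False})


# Hypothetical observations, for illustration only.
observations = [
    ("node-a", "auditbeat-1", True),
    ("node-a", "auditbeat-2", False),
    ("node-b", "auditbeat-3", False),
]
print(nodes_with_mixed_results(observations))
```

If every affected node shows up in that list, the node itself is unlikely to be the deciding factor.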

Configuration:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: auditbeat
  namespace: monitoring
  labels:
    app: auditbeat
spec:
  selector:
    matchLabels:
      app: auditbeat
  template:
    metadata:
      labels:
        app: auditbeat
    spec:
      tolerations:
      - effect: NoExecute
        operator: Exists
      - effect: NoSchedule
        operator: Exists
      serviceAccountName: auditbeat
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      hostPID: true  # Required by auditd module
      dnsPolicy: ClusterFirstWithHostNet
      # initContainer systemctl to unregister systemd-journald as audit process, otherwise we
      # will see the error "failed to set audit PID. An audit process is already running (PID n)"
      initContainers:
      - name: systemctl
        image: centos
        command:
        - /bin/sh
        - "-c"
        - |
          set -e
          systemctl stop systemd-journald-audit.socket
          systemctl mask systemd-journald-audit.socket
          systemctl restart systemd-journald
          set +e
          systemctl status systemd-journald-audit.socket
          systemctl status systemd-journald
        env:
          - name: SYSTEMD_IGNORE_CHROOT
            value: "1"
        securityContext:
          runAsUser: 0
          capabilities:
            add:
              - 'SYS_ADMIN'
        volumeMounts:
          - name: run
            mountPath: /run
      containers:
      - name: auditbeat
        image: docker.elastic.co/beats/auditbeat:7.16.1
        args:
        - "-c"
        - /etc/auditbeat.yml
        - "-e"
        env:
        - name: ELASTICSEARCH_HOST
          valueFrom: ***
        - name: ELASTICSEARCH_PORT
          valueFrom: ***
        - name: ELASTICSEARCH_USERNAME
          valueFrom: ***
        - name: ELASTICSEARCH_PASSWORD
          valueFrom: ***
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: TINI_SUBREAPER
          value: "1"
        securityContext:
          runAsUser: 0
          capabilities:
            add:
              # Capabilities needed for auditd module
              - 'AUDIT_READ'
              - 'AUDIT_WRITE'
              - 'AUDIT_CONTROL'
        volumeMounts:
        - name: config
          mountPath: /etc/auditbeat.yml
          readOnly: true
          subPath: auditbeat.yml
        - name: modules
          mountPath: /usr/share/auditbeat/modules.d
          readOnly: true
        - name: data
          mountPath: /usr/share/auditbeat/data
        - name: bin
          mountPath: /hostfs/bin
          readOnly: true
        - name: sbin
          mountPath: /hostfs/sbin
          readOnly: true
        - name: usrbin
          mountPath: /hostfs/usr/bin
          readOnly: true
        - name: usrsbin
          mountPath: /hostfs/usr/sbin
          readOnly: true
        - name: etc
          mountPath: /hostfs/etc
          readOnly: true
        # Directory with root filesystems of containers executed with containerd, this can be
        # different with other runtimes. This volume is needed to monitor the file integrity
        # of files in containers.
        - name: run-containerd
          mountPath: /run/containerd
          readOnly: true
      volumes:
      - name: bin
        hostPath:
          path: /bin
      - name: usrbin
        hostPath:
          path: /usr/bin
      - name: sbin
        hostPath:
          path: /sbin
      - name: usrsbin
        hostPath:
          path: /usr/sbin
      - name: etc
        hostPath:
          path: /etc
      - name: config
        configMap:
          defaultMode: 0640
          name: auditbeat-config
      - name: modules
        configMap:
          defaultMode: 0640
          name: auditbeat-daemonset-modules
      - name: data
        hostPath:
          # When auditbeat runs as non-root user, this directory needs to be writable by group (g+w).
          path: /var/lib/auditbeat-data
          type: DirectoryOrCreate
      - name: run-containerd
        hostPath:
          path: /run/containerd
          type: DirectoryOrCreate
      # the run volume for the initContainer systemctl
      - name: run
        hostPath:
          path: /run

Startup logs:

2022-10-05T14:40:16.784Z    INFO    instance/beat.go:686    Home path: [/usr/share/auditbeat] Config path: [/usr/share/auditbeat] Data path: [/usr/share/auditbeat/data] Logs path: [/usr/share/auditbeat/logs] Hostfs Path: [/]
2022-10-05T14:40:16.784Z    INFO    instance/beat.go:694    Beat ID: 65f16e32-9784-4c27-a7bd-b318839d4c59
2022-10-05T14:40:16.787Z    WARN    [add_cloud_metadata]    add_cloud_metadata/provider_aws_ec2.go:95   error when check request status for getting IMDSv2 token: http request status 405. No token in the metadata request will be used.
2022-10-05T14:40:16.978Z    INFO    [seccomp]   seccomp/seccomp.go:124  Syscall filter successfully installed
2022-10-05T14:40:16.978Z    INFO    [beat]  instance/beat.go:1040   Beat info   {"system_info": {"beat": {"path": {"config": "/usr/share/auditbeat", "data": "/usr/share/auditbeat/data", "home": "/usr/share/auditbeat", "logs": "/usr/share/auditbeat/logs"}, "type": "auditbeat", "uuid": "65f16e32-9784-4c27-a7bd-b318839d4c59"}}}
2022-10-05T14:40:16.978Z    INFO    [beat]  instance/beat.go:1049   Build info  {"system_info": {"build": {"commit": "d420ccdaf201e32a524632b5da729522e50257ae", "libbeat": "7.16.3", "time": "2022-01-07T00:30:56.000Z", "version": "7.16.3"}}}
2022-10-05T14:40:16.978Z    INFO    [beat]  instance/beat.go:1052   Go runtime info {"system_info": {"go": {"os":"linux","arch":"amd64","max_procs":8,"version":"go1.17.5"}}}
2022-10-05T14:40:17.079Z    INFO    [beat]  instance/beat.go:1056   Host info   {"system_info": {"host": {"architecture":"x86_64","boot_time":"2022-10-05T04:52:39Z","containerized":false,"name":"***","ip":[***],"os":{"type":"linux","family":"redhat","platform":"centos","name":"CentOS Linux","version":"7 (Core)","major":7,"minor":9,"patch":2009,"codename":"Core"},"timezone":"***","timezone_offset_sec":0,"id":"***"}}}
2022-10-05T14:40:17.079Z    INFO    [beat]  instance/beat.go:1085   Process info    {"system_info": {"process": ***}}
2022-10-05T14:40:17.178Z    INFO    instance/beat.go:328    Setup Beat: auditbeat; Version: 7.16.3
2022-10-05T14:40:17.178Z    INFO    [index-management]  idxmgmt/std.go:184  Set output.elasticsearch.index to 'auditbeat-7.16.3' as ILM is enabled.
2022-10-05T14:40:17.178Z    INFO    [esclientleg]   eslegclient/connection.go:102   elasticsearch url: http://elasticsearch.monitoring:***
2022-10-05T14:40:17.178Z    INFO    [publisher] pipeline/module.go:113  Beat name: ***
2022-10-05T14:40:17.179Z    INFO    [monitoring]    log/log.go:142  Starting metrics logging every 30s
2022-10-05T14:40:17.179Z    INFO    instance/beat.go:492    auditbeat start running.
2022-10-05T14:40:17.179Z    INFO    [add_cloud_metadata]    add_cloud_metadata/add_cloud_metadata.go:105    add_cloud_metadata: hosting provider type detected as gcp, metadata={***}
2022-10-05T14:40:17.180Z    INFO    [auditd]    auditd/audit_linux.go:107   auditd module is running as euid=0 on kernel=5.4.188+
2022-10-05T14:40:17.277Z    INFO    [auditd]    auditd/audit_linux.go:134   socket_type=unicast will be used.
2022-10-05T14:40:17.277Z    INFO    cfgfile/reload.go:164   Config reloader started
2022-10-05T14:40:17.277Z    INFO    add_kubernetes_metadata/kubernetes.go:72    add_kubernetes_metadata: kubernetes env detected, with version: v1.21.14-gke.2700
2022-10-05T14:40:17.277Z    INFO    [kubernetes]    kubernetes/util.go:122  kubernetes: Using node *** provided in the config   {"libbeat.processor": "add_kubernetes_metadata"}
2022-10-05T14:40:17.278Z    INFO    [auditd]    auditd/audit_linux.go:107   auditd module is running as euid=0 on kernel=5.4.188+
2022-10-05T14:40:17.278Z    INFO    [auditd]    auditd/audit_linux.go:134   socket_type=unicast will be used.
2022-10-05T14:40:17.279Z    INFO    cfgfile/reload.go:224   Loading of config files completed.
2022-10-05T14:40:17.330Z    INFO    [auditd]    auditd/audit_linux.go:279   Deleted 4 pre-existing audit rules.
2022-10-05T14:40:17.330Z    INFO    [auditd]    auditd/audit_linux.go:298   Successfully added 4 of 4 audit rules.
2022-10-05T14:40:17.330Z    INFO    [auditd]    auditd/audit_linux.go:322   audit status from kernel at start   {"audit_status": {"Mask":0,"Enabled":1,"Failure":0,"PID":0,"RateLimit":0,"BacklogLimit":8192,"Lost":0,"Backlog":0,"FeatureBitmap":127,"BacklogWaitTime":0}}
2022-10-05T14:40:17.330Z    INFO    [auditd]    auditd/audit_linux.go:346   Setting kernel backlog wait time to prevent backpressure propagating to the kernel.
2022-10-05T14:40:17.378Z    INFO    [publisher_pipeline_output] pipeline/output.go:143  Connecting to backoff(elasticsearch(http://elasticsearch.monitoring:***))
2022-10-05T14:40:17.379Z    INFO    [publisher] pipeline/retry.go:219   retryer: send unwait signal to consumer
2022-10-05T14:40:17.379Z    INFO    [publisher] pipeline/retry.go:223     done
2022-10-05T14:40:17.585Z    INFO    [esclientleg]   eslegclient/connection.go:282   Attempting to connect to Elasticsearch version 7.16.1
2022-10-05T14:40:17.877Z    INFO    [esclientleg]   eslegclient/connection.go:282   Attempting to connect to Elasticsearch version 7.16.1
2022-10-05T14:40:17.879Z    INFO    [file_integrity]    file_integrity/eventreader_fsnotify.go:99   Started fsnotify watcher    {"file_path": ["/hostfs/bin", "/hostfs/etc", "/hostfs/sbin", "/hostfs/usr/bin", "/hostfs/usr/sbin"], "recursive": true}
2022-10-05T14:40:17.972Z    INFO    [index-management]  idxmgmt/std.go:261  Auto ILM enable success.
2022-10-05T14:40:17.977Z    INFO    [index-management.ilm]  ilm/std.go:170  ILM policy auditbeat exists already.
2022-10-05T14:40:17.977Z    INFO    [index-management]  idxmgmt/std.go:397  Set setup.template.name to '{auditbeat-7.16.3 {now/d}-000001}' as ILM is enabled.
2022-10-05T14:40:17.977Z    INFO    [index-management]  idxmgmt/std.go:402  Set setup.template.pattern to 'auditbeat-7.16.3-*' as ILM is enabled.
2022-10-05T14:40:17.977Z    INFO    [index-management]  idxmgmt/std.go:436  Set settings.index.lifecycle.rollover_alias in template to {auditbeat-7.16.3 {now/d}-000001} as ILM is enabled.
2022-10-05T14:40:17.977Z    INFO    [index-management]  idxmgmt/std.go:440  Set settings.index.lifecycle.name in template to {auditbeat {"policy":{"phases":{"hot":{"actions":{"rollover":{"max_age":"30d","max_size":"50gb"}}}}}}} as ILM is enabled.
2022-10-05T14:40:17.984Z    INFO    template/load.go:110    Template "auditbeat-7.16.3" already exists and will not be overwritten.
2022-10-05T14:40:17.984Z    INFO    [index-management]  idxmgmt/std.go:297  Loaded index template.
2022-10-05T14:40:17.987Z    INFO    [index-management.ilm]  ilm/std.go:126  Index Alias auditbeat-7.16.3 exists already.
2022-10-05T14:40:18.077Z    INFO    [publisher_pipeline_output] pipeline/output.go:151  Connection to backoff(elasticsearch(http://elasticsearch.monitoring:***)) established
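
To compare healthy and failing pods, the kernel audit status reported at startup can be extracted from the log line. A minimal Python sketch, assuming the log format shown above; `parse_audit_status` is an illustrative name, not part of Auditbeat:

```python
import json
import re

# Captures the JSON payload that follows the startup status message.
STATUS_RE = re.compile(r"audit status from kernel at start\s+(\{.*\})\s*$")


def parse_audit_status(line):
    """Return the audit_status dict from an Auditbeat startup log line, or None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    return json.loads(match.group(1)).get("audit_status")


# The status line from the startup logs above.
line = ('2022-10-05T14:40:17.330Z    INFO    [auditd]    auditd/audit_linux.go:322   '
        'audit status from kernel at start   {"audit_status": {"Mask":0,"Enabled":1,'
        '"Failure":0,"PID":0,"RateLimit":0,"BacklogLimit":8192,"Lost":0,"Backlog":0,'
        '"FeatureBitmap":127,"BacklogWaitTime":0}}')

status = parse_audit_status(line)
print(status["PID"], status["BacklogLimit"], status["Lost"])  # prints: 0 8192 0
```

In the kernel's audit status, PID is the process registered as the audit daemon, so a value of 0 here suggests that no process was registered at the moment the status was read.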

Runtime logs: many lines like these, with the error line repeating at random intervals of 2 to 10 minutes:

...
2022-10-05T15:05:47.193Z    INFO    [monitoring]    log/log.go:184  Non-zero metrics in the last 30s    {"monitoring": ***}
...
2022-10-05T15:07:48.223Z    ERROR   [auditd]    auditd/audit_linux.go:204   get status request failed:failed to get audit status reply: no reply received
...
2022-10-05T15:08:17.377Z    INFO    [monitoring]    log/log.go:184  Non-zero metrics in the last 30s    {"monitoring": ***}
...
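
The interval between errors can be measured from the pod logs rather than estimated by eye. A minimal Python sketch, assuming the timestamp format shown above; the function name is illustrative, and the second error line in the sample is made up purely to demonstrate the calculation:

```python
from datetime import datetime

ERROR_TEXT = "failed to get audit status reply: no reply received"


def error_gaps_seconds(log_lines):
    """Return the seconds elapsed between consecutive occurrences of the error."""
    stamps = []
    for line in log_lines:
        if ERROR_TEXT in line:
            # The first whitespace-separated field is the timestamp,
            # e.g. 2022-10-05T15:07:48.223Z
            stamp = line.split(None, 1)[0]
            stamps.append(datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ"))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


sample = [
    # Real line from the runtime logs above.
    "2022-10-05T15:07:48.223Z    ERROR   [auditd]    auditd/audit_linux.go:204   "
    "get status request failed:failed to get audit status reply: no reply received",
    # Hypothetical second occurrence, for illustration only.
    "2022-10-05T15:12:03.223Z    ERROR   [auditd]    auditd/audit_linux.go:204   "
    "get status request failed:failed to get audit status reply: no reply received",
]
print(error_gaps_seconds(sample))  # prints: [255.0]
```

Feeding it the full output of one pod's logs would show whether the gaps really fall between 2 and 10 minutes or cluster around specific values.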


elasticmachine commented 1 year ago

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

botelastic[bot] commented 10 months ago

Hi! We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:. Thank you for your contribution!

elasticmachine commented 7 months ago

Pinging @elastic/sec-linux-platform (Team:Security-Linux Platform)