elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

Filestream metrics might not work correctly in Kubernetes #37925

Open belimawr opened 7 months ago

belimawr commented 7 months ago

There are some situations in which the Filestream input will have its metrics collection disabled in Kubernetes.

This is somewhat common when running Filebeat with autodiscover on Kubernetes: the autodiscover code might stop and start the same input, for the same files. This has no effect on data collection; however, due to a bug (https://github.com/elastic/beats/issues/31767) in how we keep track of IDs and start/stop Filestream inputs, we never clean the ID registry, which leads to two issues:

  1. The log message

     `filestream input with ID 'xyz' already exists, this will lead to data duplication, please use a different ID. Metrics collection has been disabled on this input.` is logged even though there is only a single input with ID `xyz` running.

  2. Metrics collection is disabled (a side effect of setting `metricsID = ""`).

While the erroneous log message is annoying, disabling metrics collection is a bigger issue that requires attention.
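The failure mode can be sketched as follows. This is a hypothetical, simplified illustration (not the actual code in `input-logfile/manager.go`), assuming an ID registry that is populated on `Create` but never cleaned up on `Stop`:

```go
package main

import "fmt"

// Hypothetical, simplified sketch of the bug described above (not the real
// Beats code): the manager records every input ID on Create but never
// removes it on Stop, so restarting the same input looks like a duplicate
// and metrics collection gets disabled.
type inputManager struct {
	ids map[string]struct{}
}

// Create returns the metrics ID for the input; an empty string means
// metrics collection is disabled.
func (m *inputManager) Create(id string) string {
	if _, exists := m.ids[id]; exists {
		// This is where the "already exists ... Metrics collection has
		// been disabled" message would be logged, even though the first
		// instance of the input was already stopped.
		return "" // metricsID = "" -> metrics disabled
	}
	m.ids[id] = struct{}{}
	return id
}

// Stop should remove the registry entry but does not -- the bug.
func (m *inputManager) Stop(id string) {
	// missing: delete(m.ids, id)
	_ = id
}

// simulateRestart mimics autodiscover stopping and restarting the same
// input for the same files, as happens on Kubernetes.
func simulateRestart() (first, second string) {
	m := &inputManager{ids: map[string]struct{}{}}
	const id = "filestream-kubernetes-pod-abc" // hypothetical container ID
	first = m.Create(id)
	m.Stop(id)
	second = m.Create(id) // same input, restarted
	return first, second
}

func main() {
	first, second := simulateRestart()
	fmt.Printf("first metricsID=%q, second metricsID=%q\n", first, second)
}
```

Under this sketch, the fix would be for `Stop` to delete the ID from the registry so a restart of the same input registers cleanly instead of being treated as a duplicate.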

How to reproduce

  1. Create a Kubernetes cluster
  2. Deploy Filebeat using the filebeat-deployment.yml below
  3. Check the logs for a log message like:

```json
{"log.level":"error","@timestamp":"2024-02-08T12:09:37.932Z","log.logger":"input","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream/internal/input-logfile.(*InputManager).Create","file.name":"input-logfile/manager.go","file.line":183},"message":"filestream input with ID 'filestream-kubernetes-pod-95a3dadcb6d36ee0542391c016d3e4a3e638b110600078600b55961de5682908' already exists, this will lead to data duplication, please use a different ID. Metrics collection has been disabled on this input.","service.name":"filebeat","ecs.version":"1.6.0"}
```
filebeat-deployment.yml

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: kube-system
  labels:
    k8s-app: filebeat
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat
  labels:
    k8s-app: filebeat
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources:
      - namespaces
      - pods
      - nodes
    verbs:
      - get
      - watch
      - list
  - apiGroups: ["apps"]
    resources:
      - replicasets
    verbs: ["get", "list", "watch"]
  - apiGroups: ["batch"]
    resources:
      - jobs
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: filebeat
  # should be the namespace where filebeat is running
  namespace: kube-system
  labels:
    k8s-app: filebeat
rules:
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs: ["get", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: filebeat-kubeadm-config
  namespace: kube-system
  labels:
    k8s-app: filebeat
rules:
  - apiGroups: [""]
    resources:
      - configmaps
    resourceNames:
      - kubeadm-config
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: filebeat
  namespace: kube-system
subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: kube-system
roleRef:
  kind: Role
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: filebeat-kubeadm-config
  namespace: kube-system
subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: kube-system
roleRef:
  kind: Role
  name: filebeat-kubeadm-config
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: kube-system
  labels:
    k8s-app: filebeat
data:
  filebeat.yml: |-
    filebeat.autodiscover:
      providers:
        - type: kubernetes
          node: ${NODE_NAME}
          hints.enabled: true
          hints.default_config:
            type: filestream
            prospector.scanner.symlinks: true
            id: filestream-kubernetes-pod-${data.kubernetes.container.id}
            take_over: true
            paths:
              - /var/log/containers/*${data.kubernetes.container.id}.log
            parsers:
              - container: ~

    processors:
      - add_host_metadata:

    http:
      enabled: true

    output.elasticsearch:
      hosts: ["https://my-cluster.elastic-cloud.com:443"] # add real credentials
      port: 443
      protocol: "https"
      username: "elastic"
      password: "changeme"
      allow_older_versions: true
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: kube-system
  labels:
    k8s-app: filebeat
spec:
  selector:
    matchLabels:
      k8s-app: filebeat
  template:
    metadata:
      labels:
        k8s-app: filebeat
    spec:
      serviceAccountName: filebeat
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: filebeat
          image: docker.elastic.co/beats/filebeat:8.12.1
          args: ["-c", "/etc/filebeat.yml", "-e"]
          env:
            - name: ELASTICSEARCH_HOST
              value: elasticsearch
            - name: ELASTICSEARCH_PORT
              value: "9200"
            - name: ELASTICSEARCH_USERNAME
              value: elastic
            - name: ELASTICSEARCH_PASSWORD
              value: changeme
            - name: ELASTIC_CLOUD_ID
              value:
            - name: ELASTIC_CLOUD_AUTH
              value:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          securityContext:
            runAsUser: 0
            # If using Red Hat OpenShift uncomment this:
            #privileged: true
          resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 100Mi
          volumeMounts:
            - name: config
              mountPath: /etc/filebeat.yml
              readOnly: true
              subPath: filebeat.yml
            - name: data
              mountPath: /usr/share/filebeat/data
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: config
          configMap:
            defaultMode: 0640
            name: filebeat-config
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: varlog
          hostPath:
            path: /var/log
        # data folder stores a registry of read status for all files, so we
        # don't send everything again on a Filebeat pod restart
        - name: data
          hostPath:
            # When filebeat runs as non-root user, this directory needs to be writable by group (g+w).
            path: /var/lib/filebeat-data
            type: DirectoryOrCreate
---
```

I haven't looked into this enough to fully assess how the metrics are working (or not), but when I SSH into the container and run `curl localhost:5066/inputs/ | jq` I get valid output with metrics that are changing over time.

```json
{
  "bytes_processed_total": 125774,
  "events_processed_total": 716,
  "files_active": 1,
  "files_closed_total": 0,
  "files_opened_total": 1,
  "id": "filestream-kubernetes-pod-95a3dadcb6d36ee0542391c016d3e4a3e638b110600078600b55961de5682908",
  "input": "filestream",
  "messages_read_total": 716,
  "processing_errors_total": 0,
  "processing_time": {
    "histogram": {
      "count": 716,
      "max": 2003408353,
      "mean": 1174213257.6648045,
      "median": 1001358532.5,
      "min": 1051078,
      "p75": 1733609597,
      "p95": 1916199464.6,
      "p99": 1982980687.14,
      "p999": 2003408353,
      "stddev": 556202480.3060538
    }
  }
}
```

However, I haven't managed to assess how correct they are. I'll update this issue once I have more information.

elasticmachine commented 7 months ago

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

elasticmachine commented 3 months ago

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)