elastic / cloud-on-k8s

Elastic Cloud on Kubernetes

Beats: ineffective resource declaration in podTemplate admitted #5512

Open nyarly opened 2 years ago

nyarly commented 2 years ago

Related to #4800

Bug Report

What did you do?

I added a resources block to our filebeats Beat definition.

What did you expect to see?

Those resources to be reflected in the filebeat pods, or the resource declaration to be rejected at deploy time.

What did you see instead? Under which circumstances?

The filebeat DaemonSet and Pods still have the default resource constraints, whose very low CPU limit means they are constantly throttled.

(We're seeing dropped and duplicated documents on Elasticsearch, and our best hypothesis is that the filebeats aren't keeping up)

Environment

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.6", GitCommit:"f59f5c2fda36e4036b49ec027e556a15456108f0", GitTreeState:"archive", BuildDate:"1980-01-01T00:00:00Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.9-gke.300", GitCommit:"d94fd1808d7f2b616f611c58f9975f41d48a4662", GitTreeState:"clean", BuildDate:"2022-01-24T09:29:20Z", GoVersion:"go1.16.12b7", Compiler:"gc", Platform:"linux/amd64"}
pebrc commented 2 years ago

Can you share the YAML manifest with the resource limits/requests you are using that do not work?

nyarly commented 2 years ago

Sure:

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  annotations:
    association.k8s.elastic.co/es-conf: '{"authSecretName":"filebeat-beat-user","authSecretKey":"management-filebeat-beat-user","caCertProvided":true,"caSecretName":"filebeat-beat-es-ca","url":"https://elasticsearch-es-http.management.svc:9200","version":"7.16.2"}'
    association.k8s.elastic.co/kb-conf: '{"authSecretName":"filebeat-beat-kb-user","authSecretKey":"management-filebeat-beat-kb-user","caCertProvided":true,"caSecretName":"filebeat-beat-kibana-ca","url":"https://elasticsearch-kb-http.management.svc:5601","version":"7.16.2"}'
    common.k8s.elastic.co/controller-version: 1.9.1
  name: filebeat
  namespace: management
spec:
  config:
    filebeat:
      autodiscover:
        providers:
        - hints:
            default_config:
              paths:
              - /var/log/containers/*${data.kubernetes.container.id}.log
              type: container
            enabled: true
          node: ${NODE_NAME}
          type: kubernetes
      processors:
      - add_cloud_metadata: {}
      - add_host_metadata: {}
    output:
      elasticsearch:
        pipelines:
        - pipeline: tier0-core-pipeline
          when.equals:
            kubernetes.labels.app: tier0-core
        - pipeline: frontend-pipeline
          when.equals:
            kubernetes.labels.app_kubernetes_io/instance: frontend
    setup:
      kibana:
        path: /logs
  daemonSet:
    podTemplate:
      metadata:
        creationTimestamp: null
      spec:
        automountServiceAccountToken: true
        containers:
        - env:
          - name: NODE_NAME
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName
          name: filebeat
          securityContext:
            runAsUser: 0
          volumeMounts:
          - mountPath: /var/log/containers
            name: varlogcontainers
          - mountPath: /var/log/pods
            name: varlogpods
          - mountPath: /var/lib/docker/containers
            name: varlibdockercontainers
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true
        resources:
          limits:
            cpu: "1"
            memory: 400Mi
          requests:
            cpu: 100m
            memory: 200Mi
        serviceAccountName: filebeat
        terminationGracePeriodSeconds: 30
        tolerations:
        - effect: NoSchedule
          key: nvidia.com/gpu
          operator: Equal
          value: present
        volumes:
        - hostPath:
            path: /var/log/containers
          name: varlogcontainers
        - hostPath:
            path: /var/log/pods
          name: varlogpods
        - hostPath:
            path: /var/lib/docker/containers
          name: varlibdockercontainers
    updateStrategy: {}
  elasticsearchRef:
    name: elasticsearch
  image: docker.elastic.co/beats/filebeat:7.16.2
  kibanaRef:
    name: elasticsearch
  type: filebeat
  version: 7.16.2
status:
  availableNodes: 17
  elasticsearchAssociationStatus: Established
  expectedNodes: 17
  health: green
  kibanaAssociationStatus: Established
  version: 7.16.2

The resulting Pods look like this:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2022-03-14T11:31:21Z"
  generateName: filebeat-beat-filebeat-
  labels:
    beat.k8s.elastic.co/config-checksum: 9f1342504e7b9200f24ddf9671d655ebac7c89793d1f5b949534554e
    beat.k8s.elastic.co/name: filebeat
    beat.k8s.elastic.co/version: 7.16.2
    common.k8s.elastic.co/type: beat
    controller-revision-hash: 7f597f9ffd
    pod-template-generation: "3"
  name: filebeat-beat-filebeat-tf4pj
  namespace: management
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: filebeat-beat-filebeat
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - gke-t0-prod-e2-highmem-16-1-5bd03ac8-5ev5
  automountServiceAccountToken: true
  containers:
  - args:
    - -e
    - -c
    - /etc/beat.yml
    env:
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    image: docker.elastic.co/beats/filebeat:7.16.2
    imagePullPolicy: IfNotPresent
    name: filebeat
    resources:
      limits:
        cpu: 100m
        memory: 200Mi
      requests:
        cpu: 100m
        memory: 200Mi
    securityContext:
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /usr/share/filebeat/data
      name: beat-data
    - mountPath: /etc/beat.yml
      name: config
      readOnly: true
      subPath: beat.yml
    - mountPath: /mnt/elastic-internal/elasticsearch-certs
      name: elasticsearch-certs
      readOnly: true
    - mountPath: /mnt/elastic-internal/kibana-certs
      name: kibana-certs
      readOnly: true
    - mountPath: /var/lib/docker/containers
      name: varlibdockercontainers
    - mountPath: /var/log/containers
      name: varlogcontainers
    - mountPath: /var/log/pods
      name: varlogpods
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-pcgvr
      readOnly: true
  dnsPolicy: ClusterFirstWithHostNet
  enableServiceLinks: true
  hostNetwork: true
  nodeName: gke-t0-prod-e2-highmem-16-1-5bd03ac8-5ev5
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: filebeat
  serviceAccountName: filebeat
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: nvidia.com/gpu
    operator: Equal
    value: present
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/network-unavailable
    operator: Exists
  volumes:
  - hostPath:
      path: /var/lib/management/filebeat/filebeat-data
      type: DirectoryOrCreate
    name: beat-data
  - name: config
    secret:
      defaultMode: 292
      optional: false
      secretName: filebeat-beat-filebeat-config
  - name: elasticsearch-certs
    secret:
      defaultMode: 420
      optional: false
      secretName: filebeat-beat-es-ca
  - name: kibana-certs
    secret:
      defaultMode: 420
      optional: false
      secretName: filebeat-beat-kibana-ca
  - hostPath:
      path: /var/lib/docker/containers
      type: ""
    name: varlibdockercontainers
  - hostPath:
      path: /var/log/containers
      type: ""
    name: varlogcontainers
  - hostPath:
      path: /var/log/pods
      type: ""
    name: varlogpods
  - name: kube-api-access-pcgvr
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
nyarly commented 2 years ago

Re-reviewing, I see the constraints are on the pod spec and not on the container. If I'd tried to create a Pod this way directly, the Kubernetes API would have rejected it.

pebrc commented 2 years ago

Yes, the constraints have to be set at the container level. It is documented here: https://www.elastic.co/guide/en/cloud-on-k8s/master/k8s-managing-compute-resources.html#k8s-compute-resources-beats-agent
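
For example, a sketch of the same manifest with the resources block moved into the filebeat container entry (trimmed to the relevant fields, values taken from the manifest above):

spec:
  daemonSet:
    podTemplate:
      spec:
        containers:
        - name: filebeat
          # resources must be declared per container, not on the pod spec
          resources:
            requests:
              cpu: 100m
              memory: 200Mi
            limits:
              cpu: "1"
              memory: 400Mi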

pebrc commented 2 years ago

Actually, we can take this as an opportunity to see whether we can improve validation of the podTemplate for Beats, in a similar way to what we do for Elasticsearch: dry-run the user-provided podTemplate and create an event/log entry if it is invalid.
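
To illustrate the kind of feedback being proposed (a rough manual equivalent, not the ECK implementation; the file name is hypothetical and the exact behavior depends on the kubectl version), you can wrap the podTemplate's pod spec in a bare Pod manifest and let kubectl validate it without creating anything:

# pod-from-podtemplate.yaml is a hypothetical Pod manifest whose spec is
# copied from the Beat's daemonSet.podTemplate.spec above.
# kubectl's schema validation reports the pod-level "resources" entry as an
# unknown field in the Pod spec instead of silently accepting it.
kubectl create --dry-run=client --validate=true -f pod-from-podtemplate.yaml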