elastic / cloud-on-k8s

Elastic Cloud on Kubernetes

Enable ECK Fleet Server to run as non-root #7303

Closed GeorgeGkinis closed 10 months ago

GeorgeGkinis commented 1 year ago

Proposal

Enable ECK Fleet Server to run as non-root

Agents can now run as non-root when the installed integrations do not need root. For the APM Server and Fleet Server, we do not need persistence, right?

For Kubernetes logs, we do need persistence and root access to the logs. We are allowed to run DaemonSets as root, because DaemonSets are managed by another team.

Fleet Server and APM Server are managed by a team that is not allowed to run containers as root.

Since Elasticsearch, Kibana and Agents can run as non-root, it would be great if the full set of ECK products could run as non-root. This should include the on-prem package registry as well.

According to the documentation, Fleet needs root only for the CAs: "The root user is required to persist state in a hostPath volume and to trust the Elasticsearch CA in Fleet mode. See Storing local state in host path volume for options to not run the Agent container as root."

GeorgeGkinis commented 1 year ago

BTW, it seems that the documentation is misleading. We can run neither an APM Server nor a simple agent as non-root.

ECK version: 2.10
Agent version: 8.11.0

Logs:

{"log.level":"info","@timestamp":"2023-11-09T12:48:54.812Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":479},"message":"Starting enrollment to URL: https://fleet-server-agent-http.obs-dev-elastic-stack.svc:8220/","ecs.version":"1.6.0"}
Error: fail to enroll: remove /usr/share/elastic-agent/state/data/state.enc: permission denied
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.11/fleet-troubleshooting.html
Error: enrollment failed: exit status 1
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.11/fleet-troubleshooting.html

Process finished with exit code 0

agent.yml:

---
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata: 
  name: elastic
spec:
  version: {{ .Env.ELASTIC_VERSION }}
  image: {{ .Env.ELASTIC_AGENT_IMAGE }}
  kibanaRef:
    name: kibana
  fleetServerRef: 
    name: fleet-server
  policyID: eck-agent
  mode: fleet
  deployment:
    replicas: 1
    podTemplate:
      spec:
        securityContext:
          fsGroup: 1000
        volumes:
          - name: agent-data
            emptyDir: {}
        containers:
          - name: agent
            image: {{ .Env.ELASTIC_AGENT_IMAGE }}
            resources:
              requests:
                memory: 250Mi
                cpu: 30m
              limits:
                memory: 500Mi
                cpu: 50m
---
apiVersion: v1
kind: Service
metadata:
  name: elastic-agent-http
spec:
  selector:
    agent.k8s.elastic.co/name: elastic
  ports:
  - protocol: TCP
    port: 8200
    targetPort: 8200

Pod description:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    agent.k8s.elastic.co/config-hash: "680346112"
    cni.projectcalico.org/containerID: b300e3ebb30f475e4cd74f51aae09e1016185a6c78caca3900cb3968a218cb25
    cni.projectcalico.org/podIP: REDACTED
    cni.projectcalico.org/podIPs: REDACTED
    kubernetes.io/psp: unrestricted-psp
  creationTimestamp: "2023-11-09T13:08:27Z"
  generateName: elastic-agent-7d646998d6-
  labels:
    agent.k8s.elastic.co/name: elastic
    agent.k8s.elastic.co/version: 8.11.0
    common.k8s.elastic.co/type: agent
    pod-template-hash: 7d646998d6
  name: elastic-agent-7d646998d6-7fvgx
  namespace: obs-dev-elastic-stack
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: elastic-agent-7d646998d6
    uid: f3fb18d2-8ea0-4cb1-b91c-dc4874476fde
  resourceVersion: "116011552"
  uid: 0abd8323-feed-4b25-94ed-ef0e4268558e
spec:
  automountServiceAccountToken: false
  containers:
  - env:
    - name: FLEET_CA
      value: /mnt/elastic-internal/fleetserver-association/obs-dev-elastic-stack/fleet-server/certs/ca.crt
    - name: FLEET_ENROLL
      value: "true"
    - name: FLEET_ENROLLMENT_TOKEN
      valueFrom:
        secretKeyRef:
          key: FLEET_ENROLLMENT_TOKEN
          name: elastic-agent-envvars
          optional: false
    - name: FLEET_URL
      value: https://fleet-server-agent-http.obs-dev-elastic-stack.svc:8220
    - name: CONFIG_PATH
      value: /usr/share/elastic-agent
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    image: docker.elastic.co/beats/elastic-agent-complete:8.11.0
    imagePullPolicy: IfNotPresent
    name: agent
    resources:
      limits:
        cpu: 50m
        memory: 500Mi
      requests:
        cpu: 30m
        memory: 250Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /usr/share/elastic-agent/state
      name: agent-data
    - mountPath: /etc/agent.yml
      name: config
      readOnly: true
      subPath: agent.yml
    - mountPath: /mnt/elastic-internal/elasticsearch-association/obs-dev-elastic-stack/elasticsearch/certs
      name: elasticsearch-certs
      readOnly: true
    - mountPath: /mnt/elastic-internal/fleetserver-association/obs-dev-elastic-stack/fleet-server/certs
      name: fleetserver-certs-1
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: regcred
  nodeName: cps2-sdsnpo-a-wo4
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - hostPath:
      path: /var/lib/elastic-agent/obs-dev-elastic-stack/elastic/state
      type: DirectoryOrCreate
    name: agent-data
  - name: config
    secret:
      defaultMode: 288
      optional: false
      secretName: elastic-agent-config
  - name: elasticsearch-certs
    secret:
      defaultMode: 420
      optional: false
      secretName: fleet-server-agent-es-obs-dev-elastic-stack-elasticsearch-ca
  - name: fleetserver-certs-1
    secret:
      defaultMode: 420
      optional: false
      secretName: elastic-agent-fleetserver-ca
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-11-09T13:08:27Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-11-09T13:12:05Z"
    message: 'containers with unready status: [agent]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-11-09T13:12:05Z"
    message: 'containers with unready status: [agent]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-11-09T13:08:27Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://7b311c91c7d052f77ce97cfc74a3fd6f15702bd84d4b0208ef8385344a46ba5b
    image: docker.elastic.co/beats/elastic-agent-complete:8.11.0
    imageID: docker-pullable://docker.elastic.co/beats/elastic-agent-complete@sha256:fbbd71c3731a91027c23e10531beb99ef191a15f9bf0a9eb0df42d3233201453
    lastState:
      terminated:
        containerID: docker://7b311c91c7d052f77ce97cfc74a3fd6f15702bd84d4b0208ef8385344a46ba5b
        exitCode: 1
        finishedAt: "2023-11-09T13:12:04Z"
        reason: Error
        startedAt: "2023-11-09T13:11:59Z"
    name: agent
    ready: false
    restartCount: 5
    started: false
    state:
      waiting:
        message: back-off 2m40s restarting failed container=agent pod=elastic-agent-7d646998d6-7fvgx_obs-dev-elastic-stack(0abd8323-feed-4b25-94ed-ef0e4268558e)
        reason: CrashLoopBackOff
  hostIP: 10.7.255.116
  phase: Running
  podIP: 10.42.11.193
  podIPs:
  - ip: 10.42.11.193
  qosClass: Burstable
  startTime: "2023-11-09T13:08:27Z"

barkbay commented 12 months ago

I applied the following manifest and it is working as expected:

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: fleet-server-quickstart
  namespace: obs-dev-elastic-stack
spec:
  version: 8.11.1
  kibanaRef:
    name: kibana-quickstart
  elasticsearchRefs:
    - name: elasticsearch-quickstart
  mode: fleet
  fleetServerEnabled: true
  policyID: eck-fleet-server
  deployment:
    replicas: 1
    podTemplate:
      spec:
        securityContext:
          fsGroup: 1000
        volumes:
          - name: agent-data
            emptyDir: {}
        serviceAccountName: elastic-agent
        automountServiceAccountToken: true

> k exec pod/fleet-server-quickstart-agent-7c7f8d7754-tcnxv -- id
uid=1000(elastic-agent) gid=1000(elastic-agent) groups=1000(elastic-agent),0(root)

(full manifest here)

Pod description:

 volumes:
 - hostPath:
     path: /var/lib/elastic-agent/obs-dev-elastic-stack/elastic/state
     type: DirectoryOrCreate
   name: agent-data

The operator is not supposed to create this hostPath volume if a volume named agent-data is already defined in the manifest. Could you double-check that you applied the correct manifest, and that there is nothing in the operator logs that would explain why the Pod is not reconciled?
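
One way to double-check which manifest is actually in effect is to look at the volume source behind agent-data in the running Pod. The following is only a minimal sketch: the helper name `volume_source` is hypothetical, and the two dictionaries are trimmed-down stand-ins for the Pod specs quoted in this thread, not real API output.

```python
from typing import Optional


def volume_source(pod: dict, volume_name: str = "agent-data") -> Optional[str]:
    """Return the source type (e.g. 'emptyDir' or 'hostPath') of the named volume.

    Hypothetical helper for illustration; `pod` is a Pod object as a plain dict,
    for example the parsed output of `kubectl get pod <name> -o json`.
    """
    for volume in pod.get("spec", {}).get("volumes", []):
        if volume.get("name") == volume_name:
            # In a Pod spec, the key next to 'name' identifies the volume source.
            sources = [key for key in volume if key != "name"]
            return sources[0] if sources else None
    return None


# Trimmed stand-in for the Pod description reported in this issue.
reported_pod = {
    "spec": {
        "volumes": [
            {
                "hostPath": {
                    "path": "/var/lib/elastic-agent/obs-dev-elastic-stack/elastic/state",
                    "type": "DirectoryOrCreate",
                },
                "name": "agent-data",
            }
        ]
    }
}

# Trimmed stand-in for what the podTemplate override in the manifest asks for.
expected_pod = {"spec": {"volumes": [{"name": "agent-data", "emptyDir": {}}]}}

print(volume_source(reported_pod))  # hostPath
print(volume_source(expected_pod))  # emptyDir
```

If the running Pod still reports hostPath while the applied manifest declares emptyDir, the podTemplate override has not reached the Pod, which is consistent with the manifest not being applied or the Pod not being reconciled.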

barkbay commented 10 months ago

Closing due to inactivity, feel free to reopen if needed.