Closed GeorgeGkinis closed 10 months ago
BTW, the documentation seems misleading: we can run neither an APM Server nor a simple agent as non-root.
ECK version: 2.10
Agent version: 8.11.0
Logs:
{"log.level":"info","@timestamp":"2023-11-09T12:48:54.812Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":479},"message":"Starting enrollment to URL: https://fleet-server-agent-http.obs-dev-elastic-stack.svc:8220/","ecs.version":"1.6.0"}
Error: fail to enroll: remove /usr/share/elastic-agent/state/data/state.enc: permission denied
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.11/fleet-troubleshooting.html
Error: enrollment failed: exit status 1
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.11/fleet-troubleshooting.html
Process finished with exit code 0
agent.yml:
---
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic
spec:
  version: {{ .Env.ELASTIC_VERSION }}
  image: {{ .Env.ELASTIC_AGENT_IMAGE }}
  kibanaRef:
    name: kibana
  fleetServerRef:
    name: fleet-server
  policyID: eck-agent
  mode: fleet
  deployment:
    replicas: 1
    podTemplate:
      spec:
        securityContext:
          fsGroup: 1000
        volumes:
        - name: agent-data
          emptyDir: {}
        containers:
        - name: agent
          image: {{ .Env.ELASTIC_AGENT_IMAGE }}
          resources:
            requests:
              memory: 250Mi
              cpu: 30m
            limits:
              memory: 500Mi
              cpu: 50m
---
apiVersion: v1
kind: Service
metadata:
  name: elastic-agent-http
spec:
  selector:
    agent.k8s.elastic.co/name: elastic
  ports:
  - protocol: TCP
    port: 8200
    targetPort: 8200
Pod description:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    agent.k8s.elastic.co/config-hash: "680346112"
    cni.projectcalico.org/containerID: b300e3ebb30f475e4cd74f51aae09e1016185a6c78caca3900cb3968a218cb25
    cni.projectcalico.org/podIP: REDACTED
    cni.projectcalico.org/podIPs: REDACTED
    kubernetes.io/psp: unrestricted-psp
  creationTimestamp: "2023-11-09T13:08:27Z"
  generateName: elastic-agent-7d646998d6-
  labels:
    agent.k8s.elastic.co/name: elastic
    agent.k8s.elastic.co/version: 8.11.0
    common.k8s.elastic.co/type: agent
    pod-template-hash: 7d646998d6
  name: elastic-agent-7d646998d6-7fvgx
  namespace: obs-dev-elastic-stack
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: elastic-agent-7d646998d6
    uid: f3fb18d2-8ea0-4cb1-b91c-dc4874476fde
  resourceVersion: "116011552"
  uid: 0abd8323-feed-4b25-94ed-ef0e4268558e
spec:
  automountServiceAccountToken: false
  containers:
  - env:
    - name: FLEET_CA
      value: /mnt/elastic-internal/fleetserver-association/obs-dev-elastic-stack/fleet-server/certs/ca.crt
    - name: FLEET_ENROLL
      value: "true"
    - name: FLEET_ENROLLMENT_TOKEN
      valueFrom:
        secretKeyRef:
          key: FLEET_ENROLLMENT_TOKEN
          name: elastic-agent-envvars
          optional: false
    - name: FLEET_URL
      value: https://fleet-server-agent-http.obs-dev-elastic-stack.svc:8220
    - name: CONFIG_PATH
      value: /usr/share/elastic-agent
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    image: docker.elastic.co/beats/elastic-agent-complete:8.11.0
    imagePullPolicy: IfNotPresent
    name: agent
    resources:
      limits:
        cpu: 50m
        memory: 500Mi
      requests:
        cpu: 30m
        memory: 250Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /usr/share/elastic-agent/state
      name: agent-data
    - mountPath: /etc/agent.yml
      name: config
      readOnly: true
      subPath: agent.yml
    - mountPath: /mnt/elastic-internal/elasticsearch-association/obs-dev-elastic-stack/elasticsearch/certs
      name: elasticsearch-certs
      readOnly: true
    - mountPath: /mnt/elastic-internal/fleetserver-association/obs-dev-elastic-stack/fleet-server/certs
      name: fleetserver-certs-1
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: regcred
  nodeName: cps2-sdsnpo-a-wo4
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - hostPath:
      path: /var/lib/elastic-agent/obs-dev-elastic-stack/elastic/state
      type: DirectoryOrCreate
    name: agent-data
  - name: config
    secret:
      defaultMode: 288
      optional: false
      secretName: elastic-agent-config
  - name: elasticsearch-certs
    secret:
      defaultMode: 420
      optional: false
      secretName: fleet-server-agent-es-obs-dev-elastic-stack-elasticsearch-ca
  - name: fleetserver-certs-1
    secret:
      defaultMode: 420
      optional: false
      secretName: elastic-agent-fleetserver-ca
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-11-09T13:08:27Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-11-09T13:12:05Z"
    message: 'containers with unready status: [agent]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-11-09T13:12:05Z"
    message: 'containers with unready status: [agent]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-11-09T13:08:27Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://7b311c91c7d052f77ce97cfc74a3fd6f15702bd84d4b0208ef8385344a46ba5b
    image: docker.elastic.co/beats/elastic-agent-complete:8.11.0
    imageID: docker-pullable://docker.elastic.co/beats/elastic-agent-complete@sha256:fbbd71c3731a91027c23e10531beb99ef191a15f9bf0a9eb0df42d3233201453
    lastState:
      terminated:
        containerID: docker://7b311c91c7d052f77ce97cfc74a3fd6f15702bd84d4b0208ef8385344a46ba5b
        exitCode: 1
        finishedAt: "2023-11-09T13:12:04Z"
        reason: Error
        startedAt: "2023-11-09T13:11:59Z"
    name: agent
    ready: false
    restartCount: 5
    started: false
    state:
      waiting:
        message: back-off 2m40s restarting failed container=agent pod=elastic-agent-7d646998d6-7fvgx_obs-dev-elastic-stack(0abd8323-feed-4b25-94ed-ef0e4268558e)
        reason: CrashLoopBackOff
  hostIP: 10.7.255.116
  phase: Running
  podIP: 10.42.11.193
  podIPs:
  - ip: 10.42.11.193
  qosClass: Burstable
  startTime: "2023-11-09T13:08:27Z"
I applied the following manifest and it is working as expected:
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: fleet-server-quickstart
  namespace: obs-dev-elastic-stack
spec:
  version: 8.11.1
  kibanaRef:
    name: kibana-quickstart
  elasticsearchRefs:
  - name: elasticsearch-quickstart
  mode: fleet
  fleetServerEnabled: true
  policyID: eck-fleet-server
  deployment:
    replicas: 1
    podTemplate:
      spec:
        securityContext:
          fsGroup: 1000
        volumes:
        - name: agent-data
          emptyDir: {}
        serviceAccountName: elastic-agent
        automountServiceAccountToken: true
> k exec pod/fleet-server-quickstart-agent-7c7f8d7754-tcnxv -- id
uid=1000(elastic-agent) gid=1000(elastic-agent) groups=1000(elastic-agent),0(root)
(full manifest here)
Pod description:
volumes:
- hostPath:
    path: /var/lib/elastic-agent/obs-dev-elastic-stack/elastic/state
    type: DirectoryOrCreate
  name: agent-data
This volume is not supposed to be created by the operator if it is already defined in the manifest. Could you double-check that you applied the correct manifest, and that nothing in the operator logs explains why the Pod is not reconciled?
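To make that check repeatable, here is a minimal sketch that reports which volume source backs `agent-data` in a Pod. It assumes the Pod has been dumped as JSON (for example with `kubectl get pod <name> -o json`); the helper name `volume_source` and the trimmed sample Pod are hypothetical and only illustrate the idea.

```python
import json
from typing import Optional


def volume_source(pod: dict, volume_name: str = "agent-data") -> Optional[str]:
    """Return the source type (e.g. 'hostPath', 'emptyDir') of a named Pod volume.

    Hypothetical helper: `pod` is the parsed JSON of a Pod object.
    Returns None if the volume is not present or has no source key.
    """
    for volume in pod.get("spec", {}).get("volumes", []):
        if volume.get("name") == volume_name:
            # In a Pod spec, the key next to 'name' identifies the volume source.
            sources = [key for key in volume if key != "name"]
            return sources[0] if sources else None
    return None


# Trimmed stand-in for the failing Pod reported in this issue.
sample_pod = json.loads("""
{
  "spec": {
    "volumes": [
      {
        "name": "agent-data",
        "hostPath": {
          "path": "/var/lib/elastic-agent/obs-dev-elastic-stack/elastic/state",
          "type": "DirectoryOrCreate"
        }
      },
      {"name": "config", "secret": {"secretName": "elastic-agent-config"}}
    ]
  }
}
""")

source = volume_source(sample_pod)
if source == "emptyDir":
    print("agent-data is an emptyDir: the podTemplate override was applied")
else:
    print(f"agent-data is backed by {source!r}: the podTemplate override was not applied")
```

If this reports `hostPath` while the applied manifest declares `emptyDir: {}`, the running Pod does not reflect the manifest, which points at the applied resource or the operator reconciliation rather than at the agent image.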
Closing due to inactivity, feel free to reopen if needed.
Proposal
Enable ECK Fleet Server to run as non-root
The agents can now run as non-root when the installed integrations do not need root. For the APM Server and Fleet we do not need persistence, right?
In the case of K8s logs we do need persistence and root access to the logs. We are allowed to run daemonsets as root, because daemonsets are managed by another team.
Fleet Server and APM Server are managed by a team that is not allowed to run as root.
Since Elasticsearch, Kibana and Agents can run non-root it would be great if the full set of ECK products can run non-root. This should include on-prem package registry as well.
According to the documentation, root for Fleet is only needed for CAs: "The root user is required to persist state in a hostPath volume and to trust the Elasticsearch CA in Fleet mode. See Storing local state in host path volume for options to not run the Agent container as root."