BenB196 opened this issue 2 years ago
If you don't set these up with persistence, then every time they are recreated, their state is lost and they start over.
ECK uses a hostPath volume to give Agent a location where its runtime state persists across container restarts.
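For reference, the operator-injected volume looks roughly like the sketch below. The path pattern is an assumption based on ECK's defaults (a per-Agent directory under /var/lib on the node), not copied from a live cluster:

# Sketch of the default ECK-managed Agent state volume; path pattern assumed.
volumes:
- name: agent-data
  hostPath:
    # The operator fills in the namespace and Agent name.
    path: /var/lib/elastic-agent/<namespace>/<agent-name>/state
    type: DirectoryOrCreate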
Are you saying it doesn't work?
Providing an example of persistence would not only show how to do it, but also call out the fact that this can be an "issue".
What kind of example of persistence are you thinking of? Could you elaborate more on how the current configuration may be a "problem"?
Hi @thbkrkr, sure, the general use case is that I have an Elastic Agent which stores the state of something, but has the possibility of moving to a different host.
I think today the hostPath approach only really considers a DaemonSet deployment, not a Deployment.
Example:
If I have a single Elastic Agent Pod as part of a Deployment, set up with a policy that includes an integration like Okta, it will store the last query against Okta in the Agent's state. If I then drain/shut down the underlying host (either for updating or removal from the cluster), the Agent will move to a new node in the Kubernetes cluster, at which point it will start from scratch because the previous state file is lost.
This can be solved by adding a persistent volume that supports changing hosts (AWS EBS/EFS, NFS, etc.), rather than using hostPath.
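For example, a StorageClass backed by network-attached storage produces volumes that can follow the Pod to another node; a minimal sketch using the AWS EBS CSI driver (the class name is a placeholder, and any comparable provisioner works):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: agent-state
provisioner: ebs.csi.aws.com      # example; Longhorn, EFS, or NFS provisioners work too
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain             # keep Agent state even if the claim is deleted

Note that EBS volumes can only re-attach to nodes in the same availability zone; zone-spanning storage like EFS or NFS avoids that constraint.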
This request is more to have this use case documented (and pointed out) and provide an example for creating a PV/PVC from within the Elastic Agent spec.
Indeed, hostPath does not work when an Agent Pod is re-scheduled on another k8s node. Makes sense, thanks for the explanation.
Updated 2022-08-08 to provide a working example.
After messing around with this, I think this example would work with some cleanup/generalization.
What I did:
Create a PV/PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
  finalizers:
  - kubernetes.io/pvc-protection
  name: test-persistence
  namespace: elastic-prod
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: longhorn
  volumeMode: Filesystem
  volumeName: pvc-1d0b8c21-e279-411f-ba73-41cfa203e87c
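The annotations, finalizers, and volumeName above were populated by the cluster after binding; a cleaned-up PVC for creating this from scratch would be just:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-persistence
  namespace: elastic-prod
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: longhorn   # any class whose volumes can move between nodes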
Create Elastic Agent Deployment:
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: test-persistence
  namespace: elastic-prod
spec:
  deployment:
    podTemplate:
      metadata:
        creationTimestamp: null
      spec:
        automountServiceAccountToken: true
        containers:
        - env:
          - name: FLEET_ENROLLMENT_TOKEN
            value: <snipped>
          - name: NODE_IP
            valueFrom:
              fieldRef:
                fieldPath: status.hostIP
          - name: POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
          name: agent
          resources:
            limits:
              cpu: 400m
              memory: 2Gi
            requests:
              cpu: 400m
              memory: 2Gi
          volumeMounts:
          - mountPath: /usr/share/elastic-agent/state
            name: agent-data
        securityContext:
          runAsUser: 0
        serviceAccountName: elastic-agent
        volumes:
        - name: agent-data
          persistentVolumeClaim:
            claimName: test-persistence
    replicas: 1
    strategy:
      type: Recreate
  fleetServerRef:
    name: fleet-server
  http:
    service:
      metadata: {}
      spec: {}
    tls:
      certificate: {}
  kibanaRef:
    name: kibana-prod
    namespace: kibana-prod
  mode: fleet
  version: 8.3.2
Note that strategy: Recreate is deliberate: with a ReadWriteOnce claim, a rolling update would deadlock because the replacement Pod cannot attach the volume while the old Pod still holds it. Look at the generated Deployment and confirm that both the volume and the volumeMount are there:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-persistence-agent
  namespace: elastic-prod
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 0
  selector:
    matchLabels:
      agent.k8s.elastic.co/name: test-persistence
      common.k8s.elastic.co/type: agent
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        agent.k8s.elastic.co/config-hash: "2850047686"
      creationTimestamp: null
      labels:
        agent.k8s.elastic.co/name: test-persistence
        agent.k8s.elastic.co/version: 8.3.2
        common.k8s.elastic.co/type: agent
    spec:
      automountServiceAccountToken: true
      containers:
      - command:
        - /usr/bin/env
        - bash
        - -c
        - |
          #!/usr/bin/env bash
          set -e
          if [[ -f /mnt/elastic-internal/elasticsearch-association/elastic-prod/es-prod/certs/ca.crt ]]; then
            if [[ -f /usr/bin/update-ca-trust ]]; then
              cp /mnt/elastic-internal/elasticsearch-association/elastic-prod/es-prod/certs/ca.crt /etc/pki/ca-trust/source/anchors/
              /usr/bin/update-ca-trust
            elif [[ -f /usr/sbin/update-ca-certificates ]]; then
              cp /mnt/elastic-internal/elasticsearch-association/elastic-prod/es-prod/certs/ca.crt /usr/local/share/ca-certificates/
              /usr/sbin/update-ca-certificates
            fi
          fi
          /usr/bin/tini -- /usr/local/bin/docker-entrypoint -e
        env:
        - name: FLEET_ENROLLMENT_TOKEN
          value: <snipped>
        - name: NODE_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: FLEET_CA
          value: /mnt/elastic-internal/fleetserver-association/elastic-prod/fleet-server/certs/ca.crt
        - name: FLEET_ENROLL
          value: "true"
        - name: FLEET_URL
          value: https://fleet-server-agent-http.elastic-prod.svc:8220
        - name: KIBANA_FLEET_CA
          value: /mnt/elastic-internal/kibana-association/kibana-prod/kibana-prod/certs/ca.crt
        - name: KIBANA_FLEET_HOST
          value: https://kibana-prod-kb-http.kibana-prod.svc:5601
        - name: KIBANA_FLEET_PASSWORD
          valueFrom:
            secretKeyRef:
              key: KIBANA_FLEET_PASSWORD
              name: test-persistence-agent-envvars
              optional: false
        - name: KIBANA_FLEET_SETUP
          value: "true"
        - name: KIBANA_FLEET_USERNAME
          valueFrom:
            secretKeyRef:
              key: KIBANA_FLEET_USERNAME
              name: test-persistence-agent-envvars
              optional: false
        - name: CONFIG_PATH
          value: /usr/share/elastic-agent
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: docker.elastic.co/beats/elastic-agent:8.3.2
        imagePullPolicy: IfNotPresent
        name: agent
        resources:
          limits:
            cpu: 400m
            memory: 2Gi
          requests:
            cpu: 400m
            memory: 2Gi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/elastic-agent/state
          name: agent-data
        - mountPath: /etc/agent.yml
          name: config
          readOnly: true
          subPath: agent.yml
        - mountPath: /mnt/elastic-internal/elasticsearch-association/elastic-prod/es-prod/certs
          name: elasticsearch-certs
          readOnly: true
        - mountPath: /mnt/elastic-internal/fleetserver-association/elastic-prod/fleet-server/certs
          name: fleetserver-certs-1
          readOnly: true
        - mountPath: /mnt/elastic-internal/kibana-association/kibana-prod/kibana-prod/certs
          name: kibana-certs-0
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsUser: 0
      serviceAccount: elastic-agent
      serviceAccountName: elastic-agent
      terminationGracePeriodSeconds: 30
      volumes:
      - name: agent-data
        persistentVolumeClaim:
          claimName: test-persistence
      - name: config
        secret:
          defaultMode: 288
          optional: false
          secretName: test-persistence-agent-config
      - name: elasticsearch-certs
        secret:
          defaultMode: 420
          optional: false
          secretName: fleet-server-agent-es-elastic-prod-es-prod-ca
      - name: fleetserver-certs-1
        secret:
          defaultMode: 420
          optional: false
          secretName: test-persistence-agent-fleetserver-ca
      - name: kibana-certs-0
        secret:
          defaultMode: 420
          optional: false
          secretName: test-persistence-agent-kibana-ca
(adding to the use case)
I'm deploying an Agent in Fleet Server mode. I only want one instance of it, hence a Deployment. I want to avoid losing state when it gets rescheduled on another node. Additionally, the K8s nodes run with relatively small disks, and I'd prefer to avoid anything with unknown sizing using the node's own disk, to avoid unexpected disk-full states.
It would be great if ECK natively supported a persistent-storage deployment mode for Elastic Agent when you are pulling from external data sources (SaaS provider APIs, etc.). The workaround above of creating your own PVC/PV works, but it is a bit annoying to have to manage it differently from all of the other resources in ECK.
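To make that concrete, a hypothetical shape for native support, borrowing the volumeClaimTemplates convention ECK already uses for Elasticsearch (to be clear, this field does not exist on the Agent resource today; it is purely a sketch):

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: fleet-server
spec:
  mode: fleet
  deployment:
    replicas: 1
  # Hypothetical field, not part of the current Agent API.
  volumeClaimTemplates:
  - metadata:
      name: agent-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi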
Proposal
Provide an example of setting up persistent storage for Elastic Agent/Fleet/Beats in the docs.
Use case. Why is this important?
Elastic Agents/Beats have a concept of a registry, which keeps track of state within the different modules. If you don't set these up with persistence, then every time they are recreated, their state is lost and they start over. Providing an example of persistence would not only show how to do it, but also call out the fact that this can be an "issue".
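Distilled from the workaround above, the documented example could be as small as this podTemplate override (a minimal sketch reusing the names from the earlier manifests, not official docs):

spec:
  deployment:
    replicas: 1
    strategy:
      type: Recreate   # lets the old Pod release a ReadWriteOnce volume before the new one starts
    podTemplate:
      spec:
        containers:
        - name: agent
          volumeMounts:
          - name: agent-data                           # replaces ECK's default hostPath mount
            mountPath: /usr/share/elastic-agent/state  # where Agent keeps its registry/state
        volumes:
        - name: agent-data
          persistentVolumeClaim:
            claimName: test-persistence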