antifragileer opened this issue 8 years ago
I've tracked this down. The error from above,

    Error response from daemon: mkdir /mnt/ephemeral: read-only file system

is caused by a volume mount on the fluentd daemonset. In fact there were two volume mounts that I needed to remove; the name of each suggests they're not needed on GCE anyhow. The workaround until we get a proper fix is to edit the daemonset:
kubectl edit ds fluentd
delete lines:
https://github.com/fabric8io/fabric8-devops/blob/master/fluentd/src/main/fabric8/daemonset.yml#L28-L33 and https://github.com/fabric8io/fabric8-devops/blob/master/fluentd/src/main/fabric8/daemonset.yml#L47-L52
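The same cleanup can also be scripted rather than done in an interactive `kubectl edit`. A minimal sketch in Python (the `strip_unwanted` helper and the tiny stand-in manifest are hypothetical, not part of fabric8; the volume names come from the linked daemonset.yml, and in practice you would feed in the output of `kubectl get ds fluentd -o json`):

```python
import json

# Volume names to drop, taken from the linked daemonset.yml.
UNWANTED = {"awsdocker", "minikubedocker"}

def strip_unwanted(ds):
    """Remove the AWS/minikube hostPath volumes and their mounts in place."""
    pod_spec = ds["spec"]["template"]["spec"]
    pod_spec["volumes"] = [
        v for v in pod_spec["volumes"] if v["name"] not in UNWANTED
    ]
    for container in pod_spec["containers"]:
        container["volumeMounts"] = [
            m for m in container["volumeMounts"] if m["name"] not in UNWANTED
        ]
    return ds

# Tiny stand-in manifest for illustration; a real run would parse
# `kubectl get ds fluentd -o json` with json.load instead.
ds = {"spec": {"template": {"spec": {
    "containers": [{"name": "fluentd", "volumeMounts": [
        {"mountPath": "/var/log", "name": "varlog"},
        {"mountPath": "/mnt/ephemeral/docker/containers",
         "name": "awsdocker", "readOnly": True},
    ]}],
    "volumes": [
        {"hostPath": {"path": "/var/log"}, "name": "varlog"},
        {"hostPath": {"path": "/mnt/ephemeral/docker/containers"},
         "name": "awsdocker"},
    ],
}}}}

strip_unwanted(ds)
print([v["name"] for v in ds["spec"]["template"]["spec"]["volumes"]])
# → ['varlog']
```

The stripped manifest could then be applied back with `kubectl replace -f -`.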
@jimmidyson do you know of a proper fix?
So I finally got to this. I installed management into an environment and ran the daemon set edit. But neither of those two line ranges is present in the fluentd daemon set installed into that namespace.
kubectl -n dev-testing edit ds fluentd
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  annotations:
    fabric8.io/iconUrl: https://cdn.rawgit.com/fabric8io/fabric8-devops/master/fluentd/src/main/fabric8/icon.png
  creationTimestamp: 2016-11-21T20:26:42Z
  generation: 2
  labels:
    group: io.fabric8.devops.apps
    project: fluentd
    provider: fabric8
    version: 2.2.297
  name: fluentd
  namespace: dev-testing
  resourceVersion: "851511"
  selfLink: /apis/extensions/v1beta1/namespaces/dev-testing/daemonsets/fluentd
  uid: d041686b-b028-11e6-a600-42010af0012a
spec:
  selector:
    matchLabels:
      group: io.fabric8.devops.apps
      project: fluentd
      provider: fabric8
      version: 2.2.297
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
      creationTimestamp: null
      labels:
        group: io.fabric8.devops.apps
        project: fluentd
        provider: fabric8
        version: 2.2.297
    spec:
      containers:
      - env:
        - name: ELASTICSEARCH_HOST
          value: elasticsearch
        - name: ELASTICSEARCH_PORT
          value: "9200"
        image: fabric8/fluentd-kubernetes:v1.19
        imagePullPolicy: IfNotPresent
        name: fluentd
        ports:
        - containerPort: 24231
          name: scrape
          protocol: TCP
        resources:
          limits:
            cpu: 100m
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        volumeMounts:
        - mountPath: /var/log
          name: varlog
        - mountPath: /var/lib/docker/containers
          name: defaultdocker
          readOnly: true
        - mountPath: /mnt/ephemeral/docker/containers
          name: awsdocker
          readOnly: true
        - mountPath: /mnt/sda1/var/lib/docker/containers
          name: minikubedocker
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      securityContext: {}
      serviceAccount: fluentd
      serviceAccountName: fluentd
      terminationGracePeriodSeconds: 30
      volumes:
      - hostPath:
          path: /var/log
        name: varlog
      - hostPath:
          path: /var/lib/docker/containers
        name: defaultdocker
      - hostPath:
          path: /mnt/ephemeral/docker/containers
        name: awsdocker
      - hostPath:
          path: /mnt/sda1/var/lib/docker/containers
        name: minikubedocker
status:
  currentNumberScheduled: 2
  desiredNumberScheduled: 2
  numberMisscheduled: 0
And describing it...
kubectl -n dev-testing describe ds fluentd
Name:           fluentd
Image(s):       fabric8/fluentd-kubernetes:v1.19
Selector:       group=io.fabric8.devops.apps,project=fluentd,provider=fabric8,version=2.2.297
Node-Selector:  <none>
Labels:         group=io.fabric8.devops.apps
                project=fluentd
                provider=fabric8
                version=2.2.297
Desired Number of Nodes Scheduled: 2
Current Number of Nodes Scheduled: 2
Number of Nodes Misscheduled: 0
Pods Status:    2 Running / 0 Waiting / 0 Succeeded / 0 Failed
Events:
  FirstSeen   LastSeen   Count   From           SubobjectPath   Type     Reason             Message
  ---------   --------   -----   ----           -------------   ------   ------             -------
  6m          6m         1       {daemon-set }                  Normal   SuccessfulCreate   Created pod: fluentd-wknvj
  6m          6m         1       {daemon-set }                  Normal   SuccessfulCreate   Created pod: fluentd-eskok
And the pods...
kubectl -n dev-testing get pods
NAME                                        READY   STATUS              RESTARTS   AGE
elasticsearch-2415127616-hv5m8              2/2     Running             0          11m
fluentd-eskok                               0/1     RunContainerError   7          11m
fluentd-wknvj                               0/1     CrashLoopBackOff    7          11m
grafana-3902895550-4zgi7                    1/1     Running             0          3h
kibana-3264104781-5raun                     2/2     Running             0          2h
message-broker-1045034239-k0wim             1/1     Running             0          3h
message-gateway-474760680-yct21             1/1     Running             0          3h
ms-dev-404638559-1bf51                      1/1     Running             0          3h
node-exporter-0gyqr                         1/1     Running             0          3h
node-exporter-r52t7                         1/1     Running             0          3h
prometheus-999244325-tegq9                  2/2     Running             0          3h
prometheus-blackbox-expo-1820759746-4xf1t   1/1     Running             0          3h
zookeeper-3695684073-h7c1v                  1/1     Running             0          3h
So are these the offending lines?
- mountPath: /mnt/ephemeral/docker/containers
  name: awsdocker
  readOnly: true
- mountPath: /mnt/sda1/var/lib/docker/containers
  name: minikubedocker
  readOnly: true
And...
volumes:
- hostPath:
    path: /var/log
  name: varlog
- hostPath:
    path: /var/lib/docker/containers
  name: defaultdocker
- hostPath:
    path: /mnt/ephemeral/docker/containers
  name: awsdocker
- hostPath:
    path: /mnt/sda1/var/lib/docker/containers
  name: minikubedocker
The Google kube-system namespace fluentd pods have this specified for the volumeMounts:
volumeMounts:
- mountPath: /var/log
  name: varlog
- mountPath: /var/lib/docker/containers
  name: varlibdockercontainers
  readOnly: true
- mountPath: /var/log/journal
  name: journaldir
- mountPath: /host/lib
  name: libsystemddir
And also...
volumes:
- hostPath:
    path: /var/log
  name: varlog
- hostPath:
    path: /var/lib/docker/containers
  name: varlibdockercontainers
- hostPath:
    path: /var/log/journal
  name: journaldir
- hostPath:
    path: /usr/lib64
  name: libsystemddir
OK, so I fixed the issue. I edited the daemon set as recommended, but I needed to change the volumes and volume mounts as follows:
volumeMounts:
- mountPath: /var/log
  name: varlog
- mountPath: /var/lib/docker/containers
  name: defaultdocker
  readOnly: true
- mountPath: /var/log/journal
  name: journaldir
- mountPath: /host/lib
  name: libsystemddir
And...
volumes:
- hostPath:
    path: /var/log
  name: varlog
- hostPath:
    path: /var/lib/docker/containers
  name: defaultdocker
- hostPath:
    path: /var/log/journal
  name: journaldir
- hostPath:
    path: /usr/lib64
  name: libsystemddir
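One thing worth double-checking after an edit like this: every entry in `volumeMounts` must reference a volume declared under `volumes` by name, or the pods will fail to start. A minimal sketch of that sanity check, with plain Python dicts standing in for the corrected spec above:

```python
# Corrected mounts and volumes, mirroring the daemon set edit above.
volume_mounts = [
    {"mountPath": "/var/log", "name": "varlog"},
    {"mountPath": "/var/lib/docker/containers", "name": "defaultdocker",
     "readOnly": True},
    {"mountPath": "/var/log/journal", "name": "journaldir"},
    {"mountPath": "/host/lib", "name": "libsystemddir"},
]
volumes = [
    {"hostPath": {"path": "/var/log"}, "name": "varlog"},
    {"hostPath": {"path": "/var/lib/docker/containers"}, "name": "defaultdocker"},
    {"hostPath": {"path": "/var/log/journal"}, "name": "journaldir"},
    {"hostPath": {"path": "/usr/lib64"}, "name": "libsystemddir"},
]

# Every mount must point at a declared volume name.
declared = {v["name"] for v in volumes}
missing = [m["name"] for m in volume_mounts if m["name"] not in declared]
print(missing)  # → []
```

Note that the mount path and the host path can legitimately differ (as with `/host/lib` mounting `/usr/lib64`); it is only the names that have to line up.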
After changing that in the daemon set and deleting the pods (they are automatically recreated), the pods come up fine.
I deployed the latest fabric8 release. Things are much more stable.
That said, fluentd is still crashing after installing the management components on GKE. Also, for some reason there are two fluentd pods instead of the single one I saw before.
Here is the dump from the fluentd pod(s):
And from the second pod: