carlosedp / cluster-monitoring

Cluster monitoring stack for clusters based on Prometheus Operator
MIT License

Adding PV results in CreateContainerConfigError and CrashLoopBackOff #70

Closed · Henrik-Wo closed this issue 4 years ago

Henrik-Wo commented 4 years ago

If I apply the project to my cluster with PVs, I get the following issue:

% kubectl get pv
NAME         CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                           STORAGECLASS   REASON   AGE
grafana      20Gi       RWO            Retain           Bound    monitoring/grafana-storage                      manual                  2m40s
prometheus   2Gi        RWO            Retain           Bound    monitoring/prometheus-k8s-db-prometheus-k8s-0   manual                  2m28s

% kubectl get pods -n monitoring -o wide
NAME                                   READY   STATUS                       RESTARTS   AGE   IP               NODE             NOMINATED NODE   READINESS GATES
prometheus-operator-6b8868d698-ssp5f   2/2     Running                      0          72s   10.42.2.83       kube-worker-02   <none>           <none>
arm-exporter-qlp2k                     2/2     Running                      0          60s   10.42.1.73       kube-worker-01   <none>           <none>
arm-exporter-cqszm                     2/2     Running                      0          60s   10.42.2.84       kube-worker-02   <none>           <none>
arm-exporter-sszn6                     2/2     Running                      0          60s   10.42.0.81       kube-master-1    <none>           <none>
alertmanager-main-0                    2/2     Running                      0          61s   10.42.2.85       kube-worker-02   <none>           <none>
node-exporter-tkp6t                    2/2     Running                      0          47s   192.168.78.160   kube-worker-01   <none>           <none>
node-exporter-svpg2                    2/2     Running                      0          47s   192.168.78.150   kube-master-1    <none>           <none>
prometheus-adapter-f78c4f4ff-4rtg9     1/1     Running                      0          43s   10.42.1.74       kube-worker-01   <none>           <none>
kube-state-metrics-96bf99844-k6kll     3/3     Running                      0          47s   10.42.2.87       kube-worker-02   <none>           <none>
node-exporter-hx9vh                    2/2     Running                      0          47s   192.168.78.161   kube-worker-02   <none>           <none>
prometheus-k8s-0                       2/3     CreateContainerConfigError   0          34s   10.42.1.75       kube-worker-01   <none>           <none>
grafana-7466bcc7c5-l24jf               0/1     CrashLoopBackOff             2          48s   10.42.2.86       kube-worker-02   <none>           <none>

If I choose to run without a PV (which I would prefer not to do), the project runs smoothly:

% kubectl get pods -n monitoring -o wide
NAME                                   READY   STATUS    RESTARTS   AGE   IP               NODE             NOMINATED NODE   READINESS GATES
prometheus-operator-6b8868d698-5gk7v   2/2     Running   0          53s   10.42.2.88       kube-worker-02   <none>           <none>
arm-exporter-sfs9b                     2/2     Running   0          41s   10.42.2.90       kube-worker-02   <none>           <none>
arm-exporter-zsdwm                     2/2     Running   0          41s   10.42.1.76       kube-worker-01   <none>           <none>
arm-exporter-tj6mj                     2/2     Running   0          41s   10.42.0.82       kube-master-1    <none>           <none>
alertmanager-main-0                    2/2     Running   0          42s   10.42.2.89       kube-worker-02   <none>           <none>
node-exporter-vd2p7                    2/2     Running   0          29s   192.168.78.160   kube-worker-01   <none>           <none>
node-exporter-bmljr                    2/2     Running   0          29s   192.168.78.150   kube-master-1    <none>           <none>
kube-state-metrics-96bf99844-jzpqg     3/3     Running   0          30s   10.42.2.91       kube-worker-02   <none>           <none>
node-exporter-td5c6                    2/2     Running   0          29s   192.168.78.161   kube-worker-02   <none>           <none>
prometheus-adapter-f78c4f4ff-xr49j     1/1     Running   0          23s   10.42.1.77       kube-worker-01   <none>           <none>
grafana-7bcf47fbcb-jhl4x               1/1     Running   0          31s   10.42.2.92       kube-worker-02   <none>           <none>
prometheus-k8s-0                       3/3     Running   0          13s   10.42.0.83       kube-master-1    <none>           <none>

Is there a way to solve this problem? Does my PV need to be configured in a certain way to work with Grafana and Prometheus?
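For context, a manually created PV for Grafana could look roughly like the sketch below (the hostPath path is an assumed example; the size, access mode, reclaim policy, storage class and claim mirror the kubectl get pv output above):

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual
  hostPath:
    path: /mnt/usbRAID/monitoring/grafana   # example path, adjust to your storage backend
  claimRef:
    namespace: monitoring
    name: grafana-storage
EOF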

carlosedp commented 4 years ago

Have you created the PVs manually? Do you have the logs from Prometheus and Grafana?
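For example, something like the following pulls those logs (pod names are taken from the listing above; yours may differ):

kubectl logs -n monitoring prometheus-k8s-0 -c prometheus
kubectl logs -n monitoring grafana-7466bcc7c5-l24jf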

carlosedp commented 4 years ago

Check your directory permissions. Grafana creates files with UID:GID 472:472 and Prometheus with 1000:0. For debugging, do a chmod -R 777 [dir].
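On a hostPath-style backend that usually means something like this on the node that holds the data (the paths here are examples; adjust them to wherever your PVs point):

# Grafana runs as UID:GID 472:472, Prometheus as 1000:0
sudo chown -R 472:472 /mnt/usbRAID/monitoring/grafana
sudo chown -R 1000:0 /mnt/usbRAID/monitoring/prometheus-db
# or, for debugging only, open up the permissions completely
sudo chmod -R 777 /mnt/usbRAID/monitoring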

Henrik-Wo commented 4 years ago
  1. Yes, I created the PVs manually. I'm not sure which logs you mean, but I hope these are the right ones:

    % make deploy
    echo "Deploying stack setup manifests..."
    Deploying stack setup manifests...
    kubectl apply -f ./manifests/setup/
    namespace/monitoring created
    customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
    customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
    customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
    customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
    customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
    customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com created
    clusterrole.rbac.authorization.k8s.io/prometheus-operator created
    clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
    deployment.apps/prometheus-operator created
    service/prometheus-operator created
    serviceaccount/prometheus-operator created
    echo "Will wait 10 seconds to deploy the additional manifests.."
    Will wait 10 seconds to deploy the additional manifests..
    sleep 10
    kubectl apply -f ./manifests/
    alertmanager.monitoring.coreos.com/main created
    secret/alertmanager-main created
    service/alertmanager-main created
    serviceaccount/alertmanager-main created
    servicemonitor.monitoring.coreos.com/alertmanager created
    clusterrole.rbac.authorization.k8s.io/arm-exporter created
    clusterrolebinding.rbac.authorization.k8s.io/arm-exporter created
    daemonset.apps/arm-exporter created
    service/arm-exporter created
    serviceaccount/arm-exporter created
    servicemonitor.monitoring.coreos.com/arm-exporter created
    secret/grafana-config created
    secret/grafana-datasources created
    configmap/grafana-dashboard-apiserver created
    configmap/grafana-dashboard-cluster-total created
    configmap/grafana-dashboard-controller-manager created
    configmap/grafana-dashboard-coredns-dashboard created
    configmap/grafana-dashboard-k8s-resources-cluster created
    configmap/grafana-dashboard-k8s-resources-namespace created
    configmap/grafana-dashboard-k8s-resources-node created
    configmap/grafana-dashboard-k8s-resources-pod created
    configmap/grafana-dashboard-k8s-resources-workload created
    configmap/grafana-dashboard-k8s-resources-workloads-namespace created
    configmap/grafana-dashboard-kubelet created
    configmap/grafana-dashboard-kubernetes-cluster-dashboard created
    configmap/grafana-dashboard-namespace-by-pod created
    configmap/grafana-dashboard-namespace-by-workload created
    configmap/grafana-dashboard-node-cluster-rsrc-use created
    configmap/grafana-dashboard-node-rsrc-use created
    configmap/grafana-dashboard-nodes created
    configmap/grafana-dashboard-persistentvolumesusage created
    configmap/grafana-dashboard-pod-total created
    configmap/grafana-dashboard-prometheus-dashboard created
    configmap/grafana-dashboard-prometheus-remote-write created
    configmap/grafana-dashboard-prometheus created
    configmap/grafana-dashboard-proxy created
    configmap/grafana-dashboard-scheduler created
    configmap/grafana-dashboard-statefulset created
    configmap/grafana-dashboard-traefik-dashboard created
    configmap/grafana-dashboard-workload-total created
    configmap/grafana-dashboards created
    deployment.apps/grafana created
    service/grafana created
    serviceaccount/grafana created
    servicemonitor.monitoring.coreos.com/grafana created
    persistentvolumeclaim/grafana-storage created
    ingress.extensions/alertmanager-main created
    ingress.extensions/grafana created
    ingress.extensions/prometheus-k8s created
    clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
    clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
    deployment.apps/kube-state-metrics created
    service/kube-state-metrics created
    serviceaccount/kube-state-metrics created
    servicemonitor.monitoring.coreos.com/kube-state-metrics created
    clusterrole.rbac.authorization.k8s.io/node-exporter created
    clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
    daemonset.apps/node-exporter created
    service/node-exporter created
    serviceaccount/node-exporter created
    servicemonitor.monitoring.coreos.com/node-exporter created
    apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
    clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
    clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
    clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
    clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
    clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
    configmap/adapter-config created
    deployment.apps/prometheus-adapter created
    rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
    service/prometheus-adapter created
    serviceaccount/prometheus-adapter created
    clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
    clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
    endpoints/kube-controller-manager-prometheus-discovery created
    service/kube-controller-manager-prometheus-discovery created
    service/kube-dns-prometheus-discovery created
    endpoints/kube-scheduler-prometheus-discovery created
    service/kube-scheduler-prometheus-discovery created
    servicemonitor.monitoring.coreos.com/prometheus-operator created
    prometheus.monitoring.coreos.com/k8s created
    rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
    rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
    rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
    rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
    role.rbac.authorization.k8s.io/prometheus-k8s-config created
    role.rbac.authorization.k8s.io/prometheus-k8s created
    role.rbac.authorization.k8s.io/prometheus-k8s created
    role.rbac.authorization.k8s.io/prometheus-k8s created
    prometheusrule.monitoring.coreos.com/prometheus-k8s-rules created
    service/prometheus-k8s created
    serviceaccount/prometheus-k8s created
    servicemonitor.monitoring.coreos.com/prometheus created
    servicemonitor.monitoring.coreos.com/kube-apiserver created
    servicemonitor.monitoring.coreos.com/coredns created
    servicemonitor.monitoring.coreos.com/kube-controller-manager created
    servicemonitor.monitoring.coreos.com/kube-scheduler created
    servicemonitor.monitoring.coreos.com/kubelet created
    servicemonitor.monitoring.coreos.com/traefik created
  2. I found out that Grafana has not created a directory at all, and the permissions for Prometheus look like this:

    $ ls -ld /mnt/usbRAID/monitoring/prometheus-db
    drwxr-xr-x 2 root root 4096 Jun 23 18:35 /mnt/usbRAID/monitoring/prometheus-db

Hope this helps narrow down the problem; I'm still new to k8s.

Henrik-Wo commented 4 years ago

Did a chmod -R 777 [dir] on the prometheus directory.

ls -ld /mnt/usbRAID/monitoring/prometheus-db
drwxrwxrwx 2 root root 4096 Jun 23 18:35 /mnt/usbRAID/monitoring/prometheus-db

Afterwards I ran make deploy again, but the result is still the same.

carlosedp commented 4 years ago

Check your dir permissions, then deploy the stack. If it's already running, restart the pods.
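One quick way to do that is to delete the pods and let their controllers recreate them (pod names are taken from the earlier listing; yours will differ after a redeploy):

kubectl delete pod -n monitoring prometheus-k8s-0
kubectl delete pod -n monitoring grafana-7466bcc7c5-l24jf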

Henrik-Wo commented 4 years ago

OK, I fixed the permissions and Grafana is now running. The PVs also appear to be correctly connected. But the Prometheus pod is still in CrashLoopBackOff.

The log is as follows:

% kubectl logs prometheus-k8s-0 -n monitoring -c prometheus
level=info ts=2020-07-27T16:21:49.449Z caller=main.go:337 msg="Starting Prometheus" version="(version=2.19.1, branch=HEAD, revision=eba3fdcbf0d378b66600281903e3aab515732b39)"
level=info ts=2020-07-27T16:21:49.449Z caller=main.go:338 build_context="(go=go1.14.4, user=root@62700b3d0ef9, date=20200618-17:44:42)"
level=info ts=2020-07-27T16:21:49.449Z caller=main.go:339 host_details="(Linux 4.19.118-v7l+ #1311 SMP Mon Apr 27 14:26:42 BST 2020 armv7l prometheus-k8s-0 (none))"
level=info ts=2020-07-27T16:21:49.449Z caller=main.go:340 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-07-27T16:21:49.449Z caller=main.go:341 vm_limits="(soft=unlimited, hard=unlimited)"
level=error ts=2020-07-27T16:21:49.451Z caller=query_logger.go:87 component=activeQueryTracker msg="Error opening query log file" file=/prometheus/queries.active err="open /prometheus/queries.active: permission denied"
panic: Unable to create mmap-ed active query log

goroutine 1 [running]:
github.com/prometheus/prometheus/promql.NewActiveQueryTracker(0xbeb3c891, 0xb, 0x14, 0x23e0968, 0x4d331a0, 0x23e0968)
    /app/promql/query_logger.go:117 +0x2fc
main.main()
    /app/cmd/prometheus/main.go:368 +0x42dc
% kubectl logs -n monitoring prometheus-k8s-0 -c prometheus-config-reloader
ts=2020-07-28T10:01:43.630061122Z caller=main.go:87 msg="Starting prometheus-config-reloader version ''."
level=error ts=2020-07-28T10:01:43.656100016Z caller=runutil.go:98 msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post \"http://localhost:9090/-/reload\": dial tcp 127.0.0.1:9090: connect: connection refused"
% kubectl logs -n monitoring prometheus-k8s-0 -c rules-configmap-reloader  
2020/07/28 10:01:45 Watching directory: "/etc/prometheus/rules/prometheus-k8s-rulefiles-0"
carlosedp commented 4 years ago

Apparently you have permission errors on your PVs. Check the backend that provides the storage, since this behavior depends on it.

Henrik-Wo commented 4 years ago

OK, I found a workaround that fixes my problem! But I still think something is going wrong during the deployment of the stack.

  1. I created the directories for Prometheus and Grafana in the backend
    $ sudo mkdir <dir>/monitoring/prometheus
    $ sudo mkdir <dir>/monitoring/grafana
  2. Changed the permissions for both directories
    $ sudo chown -R 1000:0 <dir>/monitoring/prometheus/
    $ sudo chown -R 472:472 <dir>/monitoring/grafana/
  3. Made the directories available via PVs and deployed the whole stack.

Grafana is now running but Prometheus is stuck in CrashLoopBackOff.

  4. Went back to the backend and changed the permissions for Prometheus again
    $ sudo chown -R 1000:0 <dir>/monitoring/prometheus/

During the next restart of the Prometheus pod, it picks up the new permissions and starts running.

carlosedp commented 4 years ago

Must be something on the backend. I've tested with some NFS and K3s local storage and it works fine.

Gonna take another look soon.

aneeldadani commented 4 years ago

I ran into the same issue. I updated the filesystem permissions with sudo chmod -R 777 [dir], which allowed the pods to reach a "Running" state. What are the recommended permissions?

carlosedp commented 4 years ago

It's recommended that the cluster have full access to the mount point so it can manage (read, write and execute) its files.

carlosedp commented 4 years ago

Closing this, as the problem is related to backend permissions.