carlosedp / cluster-monitoring

Cluster monitoring stack for clusters based on Prometheus Operator
MIT License

Permission error on persistent volumes with Prometheus and Grafana #63

Closed exArax closed 4 years ago

exArax commented 4 years ago

I configured the master IP in vars.jsonnet and also enabled k3s, the persistent volumes, and the suffix. My k3s master has the IP 192.168.1.2 and the worker 192.168.1.4. Then I ran make vendor, make, and make deploy; all pods are running, but for some reason I cannot access Grafana, Prometheus, or Alertmanager. So I ran kubectl get ingress --all-namespaces and got the result below. Is there anything wrong with the steps I have performed?

NAMESPACE    NAME                CLASS   HOSTS                             ADDRESS       PORTS     AGE
monitoring   alertmanager-main           alertmanager.192.168.1.2.nip.io   192.168.1.4   80, 443
monitoring   grafana                     grafana.192.168.1.2.nip.io        192.168.1.4   80, 443   12s
monitoring   prometheus-k8s              prometheus.192.168.1.2.nip.io     192.168.1.4   80, 443

exArax commented 4 years ago

Sorry, I was wrong: the prometheus-k8s and grafana pods show Pending status in sudo kubectl get pods -n monitoring -o wide.

carlosedp commented 4 years ago

Maybe your PVs were not created. Do you have a storageClass to provide them since you enabled persistence?
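A few standard kubectl checks should show it (the pod name below is the one from this stack):

# List the StorageClasses available in the cluster (K3s ships local-path by default)
kubectl get storageclass
# The PVCs created by the stack should be Bound, not Pending
kubectl -n monitoring get pvc
# The Events section explains why a Pending pod cannot be scheduled
kubectl -n monitoring describe pod prometheus-k8s-0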

exArax commented 4 years ago

I have created two PVs: one for Grafana named grafana-pv-volume with storageClass grafanapv, and one for Prometheus named prometheus-pv-volume with storageClass prometheuspv. If I disable persistence, the status changes to Running, but I still cannot access the Prometheus and Grafana UIs.

enablePersistence: {
  prometheus: true,
  grafana: true,
  prometheusPV: 'prometheus-pv-volume',
  prometheusSizePV: '2Gi',
  grafanaPV: 'grafana-pv-volume',
  grafanaSizePV: '20Gi',
},

Is it normal for the ingress to show the master's IP in the HOSTS column and the worker's IP in the ADDRESS column?

exArax commented 4 years ago

I don't know if I have to change something in base_operator_stack.jsonnet to make the persistent volume claim work. For example, in the Prometheus section:

(if vars.enablePersistence.prometheusPV != '' then pvc.mixin.spec.withVolumeName(vars.enablePersistence.prometheusPV))
// Uncomment below to define a StorageClass name
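I guess the line to uncomment would be something like the following, but I have not verified the exact field or helper name, so this is only my assumption:

// My guess (not checked against the repo) at the commented-out StorageClass line:
// (if vars.enablePersistence.storageClass != '' then pvc.mixin.spec.withStorageClassName(vars.enablePersistence.storageClass))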

Also, I see that the PVC part in the Grafana section is a little bit different and does not have a line like the last one in the example above. I found this ingress problem with the last commit. I apologize for writing so many comments, but I was trying several changes at the same time.

exArax commented 4 years ago

After all, you were right: something in the persistent volume is causing the problem. I disabled persistence, and the Prometheus and Grafana UIs are working fine now. I will try to find what I am doing wrong and let you know. I think the problem is in the claim, because sudo kubectl get pv returns both PVs that I created.

exArax commented 4 years ago

I found my mistake: it was in the persistent volume. I was referring to a storageClassName that didn't exist. So I checked which StorageClasses already exist in the k3s cluster, found local-path, and used it in the PV YAML files. It finally works!!! Sorry for the trouble.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-pv-volume
  labels:
    type: local
spec:
  storageClassName: local-path
  capacity:
    storage: 10Gi
  accessModes:

Gory19 commented 4 years ago

I have a problem with Grafana persistence: it cannot create files and folders. When I look at the pod logs I get the error: GF_PATHS_DATA='/var/lib/grafana' is not writable. You may have issues with file permissions.

carlosedp commented 4 years ago

Can you retest with latest master? Might be related to https://github.com/carlosedp/cluster-monitoring/commit/71e9e55f4ce0445227cb85ce03f058c1a23f52e0

Gory19 commented 4 years ago

Can you retest with latest master? Might be related to 71e9e55

Nothing changed. I had to change the permissions with chmod 777.

carlosedp commented 4 years ago

I've enabled Grafana persistence and pointed it to the local-path StorageClass of a K3s cluster. It created the PV and used it correctly to persist dashboards.

Gory19 commented 4 years ago

I've enabled Grafana persistence and pointed it to the local-path StorageClass of a K3s cluster. It created the PV and used it correctly to persist dashboards.

I did the same thing, but nothing changed. Also, the "grafana" folder on the NFS share only contains one file and two folders: grafana.db, the plugins folder (empty), and the png folder (empty).

carlosedp commented 4 years ago

Can you remove the stack, deleting all resources including the namespace, and do a fresh deploy? I just deployed it like this and it works fine:

❯ k get pv --all-namespaces
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                           STORAGECLASS   REASON   AGE
pvc-34b0239d-38a6-43c9-a8c1-634c773343bf   20Gi       RWO            Delete           Bound    monitoring/grafana-storage                      local-path              41m
pvc-1594480b-59af-4de6-8a65-267cbaddc8b9   2Gi        RWO            Delete           Bound    monitoring/prometheus-k8s-db-prometheus-k8s-0   local-path              40m
❯ k get pvc --all-namespaces
NAMESPACE    NAME                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
monitoring   grafana-storage                      Bound    pvc-34b0239d-38a6-43c9-a8c1-634c773343bf   20Gi       RWO            local-path     41m
monitoring   prometheus-k8s-db-prometheus-k8s-0   Bound    pvc-1594480b-59af-4de6-8a65-267cbaddc8b9   2Gi        RWO            local-path     41m
❯ kpod
NAMESPACE            NAME                                      READY   STATUS      RESTARTS   AGE    IP             NODE       NOMINATED NODE   READINESS GATES
kube-system          coredns-d798c9dd-czfx9                    1/1     Running     2          4d4h   10.42.0.13     odroidn2   <none>           <none>
kube-system          svclb-traefik-l8lmj                       2/2     Running     4          8d     10.42.0.5      odroidn2   <none>           <none>
kube-system          helm-install-traefik-mvtnv                0/1     Completed   2          8d     10.42.0.3      odroidn2   <none>           <none>
kube-system          local-path-provisioner-58fb86bdfd-sg4lx   1/1     Running     2          8d     10.42.0.8      odroidn2   <none>           <none>
kube-system          metrics-server-6d684c7b5-hz6qf            1/1     Running     2          8d     10.42.0.2      odroidn2   <none>           <none>
kube-system          traefik-6787cddb4b-8vz5q                  1/1     Running     2          4d4h   10.42.0.12     odroidn2   <none>           <none>
monitoring           prometheus-operator-67586fc88-qmnf4       2/2     Running     0          41m    10.42.0.16     odroidn2   <none>           <none>
monitoring           prometheus-k8s-0                          3/3     Running     0          41m    10.42.0.24     odroidn2   <none>           <none>
monitoring           arm-exporter-t2r8x                        2/2     Running     0          41m    10.42.0.17     odroidn2   <none>           <none>
monitoring           kube-state-metrics-857f95d994-sts44       3/3     Running     0          41m    10.42.0.20     odroidn2   <none>           <none>
monitoring           alertmanager-main-0                       2/2     Running     0          41m    10.42.0.18     odroidn2   <none>           <none>
monitoring           node-exporter-mx8cf                       2/2     Running     0          41m    192.168.1.15   odroidn2   <none>           <none>
monitoring           prometheus-adapter-9c79c98f7-hjcpg        1/1     Running     0          41m    10.42.0.21     odroidn2   <none>           <none>
monitoring           grafana-64d44f45dd-kzdjv                  1/1     Running     0          41m    10.42.0.22     odroidn2   <none>           <none>
prometheus-example   prometheus-example-app-76946bbf86-nbpb6   1/1     Running     2          5d2h   10.42.0.3      odroidn2   <none>           <none>

carlosedp commented 4 years ago

Check your directory permissions. Grafana creates files with UID:GID 472:472 and Prometheus with 1000:0.
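So if the volumes are backed by host or NFS directories, the ownership of those directories has to match; for example (the paths below are placeholders for whatever backs your PVs):

# Placeholders: replace with the directories that back the Grafana and Prometheus PVs
sudo chown -R 472:472 /path/to/grafana-data      # Grafana runs as UID:GID 472:472
sudo chown -R 1000:0 /path/to/prometheus-data    # Prometheus runs as UID 1000, GID 0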

Gory19 commented 4 years ago

I've already done it this afternoon. These are my PV, am I doing something wrong?

kubectl get pv
NAME              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                           STORAGECLASS   REASON   AGE
grafana           10Gi       RWO            Retain           Bound    monitoring/grafana-storage                      local-path              6h32m
prometheus        10Gi       RWO            Retain           Bound    monitoring/prometheus-k8s-db-prometheus-k8s-0   local-path              6h32m

grafana.persistentvolume.yml
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana
  labels:
    type: local
spec:
  storageClassName: local-path
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/nas/grafana"
---
prometheus.persistentvolume.yml
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus
  labels:
    type: local
spec:
  storageClassName: local-path
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/nas/prometheus"
---

carlosedp commented 4 years ago

If you are using the local-path provisioner from K3s, don't pre-create the PVs. Just enable persistence and set storageClass: 'local-path' in vars.jsonnet, keeping the PV names blank. The stack's PVCs will then request PVs from that StorageClass.
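For example, something like this in the enablePersistence block (assuming storageClass sits there in your version of vars.jsonnet; adjust if it lives elsewhere):

enablePersistence: {
  prometheus: true,
  grafana: true,
  // Let the K3s local-path provisioner create the PVs dynamically
  storageClass: 'local-path',
  // Leave the PV names blank so no pre-created PV is referenced
  prometheusPV: '',
  prometheusSizePV: '2Gi',
  grafanaPV: '',
  grafanaSizePV: '20Gi',
},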

Check 301785d.

Gory19 commented 4 years ago

I'm doing a clean installation. But what do your "grafana" and "prometheus" folders contain?

carlosedp commented 4 years ago

Something like:

❯ tree
.
├── grafana
│   ├── grafana.db
│   ├── plugins
│   └── png
└── prometheus
    └── prometheus-db
        ├── chunks_head
        │   └── 000001
        ├── queries.active
        └── wal
            └── 00000000

Gory19 commented 4 years ago

Ah OK, then everything is right. It works now, thank you friend :D. One last question: how can I assign the Prometheus pods to a specific node? Prometheus consumes a lot of memory and I don't know why it always gets scheduled on the node with the least memory.

carlosedp commented 4 years ago

You need to set node affinity. Check https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/
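A minimal sketch of what that could look like (the hostname value is a placeholder; if I remember correctly, the Prometheus custom resource accepts an affinity field that the operator passes through to the pods):

# Pin the pods to the node named "big-node" (placeholder) via its hostname label
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - big-node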