Closed: carlosedp closed this issue 2 years ago
I'm facing an issue where I can't open the Grafana and Prometheus applications (link: https://grafana.192.168.0.106.nip.io)
$ curl http://prometheus.192.168.0.106.nip.io
curl: (7) Failed to connect to prometheus.192.168.0.106.nip.io port 80: Connection refused
$ curl https://prometheus.192.168.0.106.nip.io
curl: (7) Failed to connect to prometheus.192.168.0.106.nip.io port 443: Connection refused
In the browser I get the same error, "Unable to connect".
I'm using k3s and I configured my master IP address as 192.168.0.106; it's the local IP address of one of my worker nodes.
I managed to successfully deploy all the pods, but I don't know how to connect to the applications.
$ kubectl get ingress -n monitoring
NAME CLASS HOSTS ADDRESS PORTS AGE
alertmanager-main <none> alertmanager.192.168.0.106.nip.io 80, 443 54s
grafana <none> grafana.192.168.0.106.nip.io 80, 443 54s
prometheus-k8s <none> prometheus.192.168.0.106.nip.io 80, 443 53s
$ kubectl get pods -n monitoring
NAME READY STATUS RESTARTS AGE
prometheus-operator-6b8868d698-6xlvg 2/2 Running 0 14m
arm-exporter-wmm6r 2/2 Running 0 14m
arm-exporter-67jpd 2/2 Running 0 14m
node-exporter-fbltt 2/2 Running 0 14m
alertmanager-main-0 2/2 Running 0 14m
arm-exporter-zhd5m 2/2 Running 0 14m
node-exporter-pzz6z 2/2 Running 0 14m
node-exporter-74fwt 2/2 Running 0 14m
grafana-7466bcc7c5-4hvpj 1/1 Running 0 14m
kube-state-metrics-96bf99844-g9ssn 3/3 Running 0 14m
prometheus-adapter-f78c4f4ff-kccbq 1/1 Running 0 14m
prometheus-k8s-0 3/3 Running 0 14m
Do you have any suggestions?
You need to troubleshoot access to your K3s cluster's ingress, which bridges outside HTTP/HTTPS traffic to the pods.
Here is a reference: https://rancher.com/docs/k3s/latest/en/networking/
Have you deployed any application that serves HTTP (like NGINX or Apache) and been able to access it from your computer? Accessing Prometheus, Grafana, and Alertmanager works the same way.
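A quick way to run that check (a sketch; the `hello` name and the nip.io host are placeholders for your own values, and `kubectl create ingress` needs a reasonably recent kubectl):

```shell
# Deploy a throwaway NGINX pod and expose it through the ingress controller
kubectl create deployment hello --image=nginx
kubectl expose deployment hello --port=80
kubectl create ingress hello --rule="hello.192.168.0.106.nip.io/*=hello:80"

# If this curl works, the ingress path is fine and the problem is elsewhere
curl http://hello.192.168.0.106.nip.io
```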
Yes, I created my own blog site in JS, but I didn't use an Ingress; I configured an externalIP on the Service. So I will try to troubleshoot this issue. Thanks for the reply!
I solved this issue. Thanks for the advice; in the end I just installed NGINX and configured it, and after that I was able to access Prometheus and Grafana. Thanks a lot!
Love this project! I am unable to access prometheus.*.nip.io, though I can access both Grafana and Alertmanager. My ingress shows Prometheus and is set up correctly. The one odd thing is that when I look at all my pods in the monitoring namespace, I don't have a prometheus-k8s pod (or something along those lines that I've seen in videos). The pods I do have are prometheus-adapter and prometheus-operator. I re-ran make vendor and deployed again; same thing, and no errors anywhere. Also, prometheus-k8s does have a Service, as I just checked. Does this make any sense? TIA
Is there a way to deploy the Grafana and Prometheus pods to the master node only? Sometimes they get deployed to workers.
@exArax You need to set your master nodes as schedulable. Even then, Kubernetes can still deploy the pods to other nodes. If you need to pin them to a specific set of nodes, you need node affinity.
Doesn't make too much sense since the pods are created by the operator. Re-check your cluster and re-deploy the stack.
I redeployed and all is well, thank you
In the case of Grafana, I have to add the node affinity to the grafana-deployment.yaml that is inside the manifests folder, right?
Hello Carlos,
I have the same issue as YushchenkoAndrew.
I'm a noob at Kubernetes (I built this cluster to learn about it).
The same issue with Alertmanager/Prometheus.
Could you please help me?
Thanks.
Yes, since the jsonnet code doesn't have node affinity for this.
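For illustration, a simpler alternative to affinity is a nodeSelector patched onto the running deployment (a sketch; replace "master" with your actual node name from `kubectl get nodes`):

```shell
# Pin the Grafana deployment to one node by hostname label
kubectl patch deployment grafana -n monitoring --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/hostname":"master"}}}}}'
```

Note that a `make deploy` will overwrite this patch, so the change should eventually live in the manifests or jsonnet.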
You need to make sure your Kubernetes cluster has an Ingress controller and can expose the applications. Check this first with something like an NGINX pod serving a simple Hello World web page.
Hi Carlos,
Very cool project indeed. I am running Kubernetes on Ubuntu 20.04.1 (master) and a few Raspberry Pi 4 nodes with Raspbian on them. I installed Kubernetes with an Ansible playbook and it works fine.
I made all the changes in vars.jsonnet as you suggested. The problem comes after make deploy.
I am getting this error:
root@asus:~/cluster-monitoring# make deploy
echo "Deploying stack setup manifests..."
Deploying stack setup manifests...
kubectl apply -f ./manifests/setup/
The connection to the server localhost:8080 was refused - did you specify the right host or port?
make: *** [Makefile:37: deploy] Error 1
Do you have any suggestions?
This is the configuration:
kubectl config view
apiVersion: v1
clusters:
Thank you in advance!
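For what it's worth, the `localhost:8080 was refused` error usually means kubectl has no kubeconfig for the cluster. On k3s, pointing kubectl at the k3s-generated admin config often fixes it (a sketch; the path is the k3s default):

```shell
# k3s writes its admin kubeconfig here by default
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml

# kubectl should now reach the real API server instead of localhost:8080
kubectl get nodes
```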
Hello Carlos, you're right! Thanks for taking the time to reply to our newbie questions.
Hello, I have some problems with the installation on K3s.
After the deploy operation, not all the services are installed:
Also, I am getting this error from the prometheus-adapter container:
Do you have any idea what I can do? Thank you.
Hello again,
I want to add some authentication and authorization on prometheus.192.168.1.x.nip.io. Is there a way to do something like prometheus.io/docs/guides/tls-encryption on prometheus.192.168.1.x.nip.io?
You need an ingress controller that supports authentication. Look at https://github.com/carlosedp/cluster-monitoring/blob/5ead7542d166a0f9b14ca911884a458b69c31951/base_operator_stack.jsonnet#L168. It works with Traefik but might need a couple of changes.
Sorry, there are so many variables that it's hard to know. Start by deploying a test application, check your node IPs, and so on.
Hi Carlos, I have followed the Cluster Monitoring deployment step by step and it's running successfully. I'm trying to use a Prometheus config generator within the node prometheus.192.168.XXX.XXX.nip.io to generate a Cisco SNMP scrape config, and I am not able to access the node via SSH. How can I access the node to add scrapes/targets to the Prometheus k3s node? I'm a newbie in k3s and look forward to your response. Regards,
Robe
To collect metrics from SNMP you need the snmp_exporter. It's out of the scope of this stack, but take a look at another project I have here: https://github.com/carlosedp/ddwrt-monitoring. It's not on Kubernetes, but I use it for SNMP.
Thank you Carlos
Hello again,
I want to add some authentication on prometheus.192.168.1.x.nip.io. Is there a way to do something like https://prometheus.io/docs/guides/basic-auth/ or https://www.openshift.com/blog/adding-authentication-to-your-kubernetes-web-applications-with-keycloak on prometheus.192.168.1.x.nip.io? I do not know which file I have to edit to add authentication to Prometheus.
As I mentioned before, the stack doesn't have anything built in to provide authentication, but you could change the ingresses to use your ingress controller (Traefik, HAProxy, etc.) to add a layer of authentication.
Another option is similar to the post you linked to, but that would require adding the Keycloak sidecar to every pod.
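As an illustration of the ingress-controller route, basic auth with Traefik 1.x annotations looks roughly like this (a sketch; the secret name and user are made up, `htpasswd` comes from apache2-utils, and Traefik 2 uses middlewares instead of these annotations):

```shell
# Create an htpasswd file and store it as a Secret in the monitoring namespace
htpasswd -c auth myuser
kubectl create secret generic prometheus-basic-auth --from-file=auth -n monitoring

# Tell Traefik 1.x to enforce basic auth on the Prometheus ingress
kubectl annotate ingress prometheus-k8s -n monitoring \
  ingress.kubernetes.io/auth-type=basic \
  ingress.kubernetes.io/auth-secret=prometheus-basic-auth
```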
Firstly, thanks for all the work you put into this @carlosedp 👏🏻. Prometheus seems to be running into an error, panic: mmap: cannot allocate memory. Have you run into this before? Deleting the pod fixes the issue, and I do have memory available. Also, what is the best way to add additional targets? Thanks again.
root@pi-master:/home/pi# kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.5+k3s1", GitCommit:"58ebdb2a2ec5318ca40649eb7bd31679cb679f71", GitTreeState:"clean", BuildDate:"2020-05-06T23:42:31Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/arm"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.5+k3s1", GitCommit:"58ebdb2a2ec5318ca40649eb7bd31679cb679f71", GitTreeState:"clean", BuildDate:"2020-05-06T23:42:31Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/arm"}
root@pi-master:/home/pi#
root@pi-master:/home/pi# cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
root@pi-master:/home/pi#
@carlosedp to change the ingresses, do I have to edit only the ingress-XXXX.yaml files, or are there more files that I have to edit?
Hey @carlosedp I was wondering if you have any interest in seeing loki ("Prometheus, but for logs") added to this tech stack? I was thinking of taking a stab at it this coming Monday
Hey @carlosedp, thanks a lot for this stack; I am using it in a few clusters that I have! One question though: how do I add a new job to Prometheus? I didn't find anything describing the jobs!
I came here with the same question...
The prometheus-config-reloader pod has the directory /etc/prometheus/config, where the prometheus.yaml.gz file is, but I have no idea how to update it to add a new job. I cannot find a ConfigMap related to that file.
@carlosedp any advice? :)
Thanks!
I have already installed Loki via Helm on my k3s cluster and it seems to work. But if you need to modify values separately for Loki and Promtail, I suggest installing them separately as well.
helm repo add loki https://grafana.github.io/loki/charts
helm repo update
helm install -f loki-values.yaml -n monitoring loki loki/loki
helm install -f promtail-values.yaml -n monitoring promtail loki/promtail
Surprisingly, Loki works well on arm64 🙃
Hello @urbaned121, I'm really new to Kubernetes and monitoring :smile:, so I kept digging here and found out that there are two ways of describing a Prometheus scrape (I don't know the correct term here): one is to define jobs and have Prometheus collect the metrics with the values we set; the other, using the prometheus-operator (which this repo uses), is to define a ServiceMonitor that Prometheus will watch and collect the metrics from!
Here is the full documentation on how to do this: https://github.com/prometheus-operator/prometheus-operator. I was using the Helm chart bitnami/mongodb, and in the metrics configuration there is already a field to create a ServiceMonitor, and now it is working perfectly!
Sorry @carlosedp for such noob questions, but we will keep learning! Maybe add this info to the Readme? I don't know if it was too obvious. Thanks again for the stack!
How do you update everything once it's running? I tried adding a new module, and if I run make deploy again I get an error that the PVC is immutable and cannot be changed. I don't want to tear everything down just to add a new module.
Hello,
I have set up basic auth like you suggested, using Traefik. Now I want to add an IP whitelist to ingress-prometheus.yaml so that only the master node of the K3s cluster has access to Prometheus. I found that I can use the annotation traefik.ingress.kubernetes.io/whitelist-source-range: "192.168.1.2", which is the IP of my K3s master node, but I get Forbidden when I try to access Prometheus from the master node. Do you have any idea what I am doing wrong?
@exArax Yes, only the ingress-* files need changing.
@jontg I'm looking into it, and also into Grafana Tempo, but I don't know if that's related to the "monitoring" stack.
@thomazBDRI @urbaned121 to add new jobs, you define a ServiceMonitor pointing to your service. Look in the modules dir, where I have definitions for different collectors, even external ones like the one for a UPS.
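A minimal ServiceMonitor looks roughly like this (a sketch; the app name, namespace, and port name are placeholders, and the labels it needs depend on the Prometheus CR's serviceMonitorSelector):

```shell
kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: my-app        # must match your Service's labels
  namespaceSelector:
    matchNames:
      - default          # namespace where the Service lives
  endpoints:
    - port: metrics      # the named Service port exposing /metrics
      interval: 30s
EOF
```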
@polds if you only enable a module, running make and make deploy again should work, since the other manifests didn't change.
@exArax why would you want a cluster node to block/access Prometheus?
@carlosedp I have developed a REST API which performs queries against Prometheus, and I want only this API to have access to the Prometheus endpoints.
If the application that queries Prometheus is internal to the cluster, you don't need the ingress exposing Prometheus outside the cluster. Have your application call the Prometheus service directly from inside the cluster, like prometheus-k8s.monitoring.svc.cluster.local.
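For example, from any pod in the cluster something like this should work (a sketch; 9090 is Prometheus's default port in this stack):

```shell
# Query Prometheus through its in-cluster Service, no ingress needed
curl 'http://prometheus-k8s.monitoring.svc.cluster.local:9090/api/v1/query?query=up'
```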
@carlosedp Hi, firstly thanks for this great repo. I came across it through @geerlingguy's tutorial on monitoring the Turing Pi cluster. He seemingly had no problems setting it up. I've got everything deployed OK, but apart from temperature there were no stats in the Grafana dashboard. I checked Prometheus and saw state DOWN for all the nodes in my cluster:
I manually opened up TCP/9100 in iptables on one node, and the data started flowing. All well and good, but I was surprised by the need to manually open the ports and wanted to check whether I'd missed something, especially given that I had no need to open ports for the ingress access to Grafana, Prometheus, etc. (@geerlingguy didn't report doing the same thing, but he was using HypriotOS, whereas I'm using Ubuntu 20.04.1 along with a standard set of iptables rules rolled out with Ansible.) I'm new to K8s, but I guess I anticipated the firewall rules being controlled by the cluster?
@carlosedp First of all I'd like to thank you for this great repo. I've managed to get everything working in my k3s cluster without any major issue. I got here thanks to @geerlingguy.
Now that I have it working, I am analyzing the code in depth to try to understand everything it does. Diving into the code, I've seen that ksonnet is used widely and that it has been discontinued. Were you aware of that? Are you planning to replace that library? Do you think it is worth replacing? Do you know of any alternative?
@jjo93sa usually on Kubernetes clusters, we don't set IPTables rules so they don't mess with Kubernetes rules and block required ports.
@dicastro Yes, ksonnet has been discontinued for a while, but the libraries have been maintained by the prometheus-operator team and there are talks to migrate them into that project.
There is also Tanka from Grafana, which also uses jsonnet as its language but doesn't have all the libraries ksonnet has.
Many teams and projects are using it currently, so it's still not a problem.
@carlosedp - Thanks. So people run their K8s clusters without any firewalls on the servers? That’s an interesting paradigm shift, indeed!
Has anyone gotten the CPU temperature panel working with the Raspberry Pis? I feel like I am missing a piece of the data puzzle here, like a shell script that needs to be logging to syslog.
@wargfn I had no problems with the CPU temperature panel on my RPi cluster. In fact, for a while it was the only thing working. Did you modify the vars.jsonnet file to enable the arm_exporter?
{
name: 'armExporter',
enabled: true,
file: import 'modules/arm_exporter.jsonnet',
},
@carlosedp I've run into an issue where the ingresses often change location, and I can no longer load the Grafana page when that happens:
[2020-10-31 09:17:31+2 ✘][~]
[james@tpin1][$ sudo kubectl get ingress -o wide -n monitoring
NAME CLASS HOSTS ADDRESS PORTS AGE
grafana <none> grafana.10.10.50.24.nip.io 10.10.50.23 80, 443 3d14h
alertmanager-main <none> alertmanager.10.10.50.24.nip.io 10.10.50.23 80, 443 3d14h
prometheus-k8s <none> prometheus.10.10.50.24.nip.io 10.10.50.23 80, 443 3d14h
I tried to run the make target to update the ingress suffix, but wasn't quite sure whether that was the right command to fix the problem.
I've run into a problem after a fresh install of cluster-monitoring: some pods do not come online.
$ kubectl get pods -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 0 17m
arm-exporter-2thrp 2/2 Running 0 18m
arm-exporter-5mwtb 2/2 Running 0 18m
arm-exporter-87lqv 2/2 Running 0 18m
arm-exporter-bkfhp 2/2 Running 0 18m
arm-exporter-g4lx7 2/2 Running 0 18m
arm-exporter-l8cqn 0/2 ContainerCreating 0 18m
arm-exporter-qrdsr 2/2 Running 0 18m
arm-exporter-xwk8k 2/2 Running 0 18m
grafana-784d46dcb-6bbsr 0/1 CreateContainerError 1 18m
kube-state-metrics-6cb6df5d4-whhnl 3/3 Running 0 18m
node-exporter-4m82x 2/2 Running 0 18m
node-exporter-6gvnl 2/2 Running 0 18m
node-exporter-g9kcg 2/2 Running 0 18m
node-exporter-q9f4r 2/2 Running 0 18m
node-exporter-qb4tn 1/2 CreateContainerError 0 18m
node-exporter-r7k7m 2/2 Running 0 18m
node-exporter-th9w7 2/2 Running 1 18m
node-exporter-xmzxb 2/2 Running 0 18m
prometheus-adapter-585b57857b-9mzq8 1/1 Running 0 18m
prometheus-k8s-0 2/3 Running 1 14m
prometheus-operator-67755f959-8cm5d 2/2 Running 0 18m
$ kubectl describe pod/arm-exporter-l8cqn -n monitoring
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 17m default-scheduler Successfully assigned monitoring/arm-exporter-l8cqn to node-6
Warning FailedCreatePodSandBox 10m (x12 over 13m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to reserve sandbox name "arm-exporter-l8cqn_monitoring_8fc2fd8c-8ed4-405e-a331-c39316657e7a_0": name "arm-exporter-l8cqn_monitoring_8fc2fd8c-8ed4-405e-a331-c39316657e7a_0" is reserved for "a6a25d65277ae3272f08583eb53ad261ea48197e477d7872f5b2d4d7806de78c"
Warning FailedCreatePodSandBox 6m17s (x2 over 13m) kubelet Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Warning FailedCreatePodSandBox 3m13s (x14 over 6m4s) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to reserve sandbox name "arm-exporter-l8cqn_monitoring_8fc2fd8c-8ed4-405e-a331-c39316657e7a_0": name "arm-exporter-l8cqn_monitoring_8fc2fd8c-8ed4-405e-a331-c39316657e7a_0" is reserved for "1354415904d5f6d8c4344f00bea58ff5d5b0956109954975c5f187ab463ed1db"
$ kubectl describe pod/grafana-784d46dcb-6bbsr -n monitoring
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 19m default-scheduler Successfully assigned monitoring/grafana-784d46dcb-6bbsr to node-7
Warning FailedMount 19m kubelet MountVolume.SetUp failed for volume "grafana-dashboard-pod-total" : failed to sync configmap cache: timed out waiting for the condition
Warning FailedMount 19m kubelet MountVolume.SetUp failed for volume "grafana-dashboard-k8s-resources-node" : failed to sync configmap cache: timed out waiting for the condition
Warning FailedMount 19m kubelet MountVolume.SetUp failed for volume "grafana-dashboard-controller-manager" : failed to sync configmap cache: timed out waiting for the condition
Warning FailedMount 19m kubelet MountVolume.SetUp failed for volume "grafana-dashboard-k8s-resources-workloads-namespace" : failed to sync configmap cache: timed out waiting for the condition
Warning FailedMount 19m kubelet MountVolume.SetUp failed for volume "grafana-dashboard-node-rsrc-use" : failed to sync configmap cache: timed out waiting for the condition
Warning FailedMount 19m kubelet MountVolume.SetUp failed for volume "grafana-dashboard-workload-total" : failed to sync configmap cache: timed out waiting for the condition
Warning FailedMount 19m kubelet MountVolume.SetUp failed for volume "grafana-dashboard-kubernetes-cluster-dashboard" : failed to sync configmap cache: timed out waiting for the condition
Warning FailedMount 19m (x2 over 19m) kubelet MountVolume.SetUp failed for volume "grafana-dashboard-namespace-by-workload" : failed to sync configmap cache: timed out waiting for the condition
Warning FailedMount 19m (x2 over 19m) kubelet MountVolume.SetUp failed for volume "grafana-dashboard-cluster-total" : failed to sync configmap cache: timed out waiting for the condition
Warning FailedMount 19m kubelet MountVolume.SetUp failed for volume "grafana-dashboard-prometheus-remote-write" : failed to sync configmap cache: timed out waiting for the condition
Warning FailedMount 19m (x10 over 19m) kubelet (combined from similar events): MountVolume.SetUp failed for volume "grafana-dashboard-node-cluster-rsrc-use" : failed to sync configmap cache: timed out waiting for the condition
Normal Pulling 19m kubelet Pulling image "grafana/grafana:7.0.3"
Normal Pulled 6m kubelet Successfully pulled image "grafana/grafana:7.0.3" in 13m3.146093344s
Warning Failed 4m kubelet Error: context deadline exceeded
Warning Failed 4m kubelet Error: failed to reserve container name "grafana_grafana-784d46dcb-6bbsr_monitoring_b2e1fd49-0136-45b3-a3fe-a867598e523f_0": name "grafana_grafana-784d46dcb-6bbsr_monitoring_b2e1fd49-0136-45b3-a3fe-a867598e523f_0" is reserved for "7a610571de0c37b102bff0410070a5a2df12b0f774562acbef4bc5d60d9131ff"
Normal Pulled 3m45s (x2 over 4m) kubelet Container image "grafana/grafana:7.0.3" already present on machine
$ kubectl describe pod/node-exporter-qb4tn -n monitoring
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 19m default-scheduler Successfully assigned monitoring/node-exporter-qb4tn to node-6
Warning FailedCreatePodSandBox 15m kubelet Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Warning FailedCreatePodSandBox 13m (x11 over 15m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to reserve sandbox name "node-exporter-qb4tn_monitoring_26f2b354-22bd-49e9-aea6-19d37eaa3a43_0": name "node-exporter-qb4tn_monitoring_26f2b354-22bd-49e9-aea6-19d37eaa3a43_0" is reserved for "11e23c3316e0e94939f141764a044cee45f977415b7c24ce6ea791d8b825cc0c"
Normal Pulled 13m kubelet Container image "prom/node-exporter:v0.18.1" already present on machine
Normal Created 12m kubelet Created container node-exporter
Normal Started 12m kubelet Started container node-exporter
Warning Failed 10m kubelet Error: context deadline exceeded
Warning Failed 9m36s (x4 over 10m) kubelet Error: failed to reserve container name "kube-rbac-proxy_node-exporter-qb4tn_monitoring_26f2b354-22bd-49e9-aea6-19d37eaa3a43_0": name "kube-rbac-proxy_node-exporter-qb4tn_monitoring_26f2b354-22bd-49e9-aea6-19d37eaa3a43_0" is reserved for "72de9646f710919d1e3a5cb0f7505837f499c6b4e76092b3ce9d94be9dcaa15e"
Normal Pulled 51s (x35 over 12m) kubelet Container image "carlosedp/kube-rbac-proxy:v0.5.0" already present on machine
I am running k3s version v1.19.3+k3s2 (f8a4547b) on an 8-node RPi4 cluster with HA (3 masters, 5 workers). Has anyone encountered this issue? Thanks.
Previously I said that I had managed to install everything on my K3s Raspberry Pi cluster, but I've realised that is not true. I am having some issues with kube-scheduler and kube-controller-manager.
Firstly, I've seen that the alerts about kube-scheduler and kube-controller-manager are always firing. Trying to investigate why this was happening, I've seen that the kube-scheduler and kube-controller-manager metrics are not being collected.
I've already read these issues #13, #20 and #56 .
At the beginning, the targets in Prometheus were empty:
After re-applying the manifests indicated in one of the previous issues (prometheus-kubeSchedulerPrometheusDiscoveryEndpoints.yaml and prometheus-kubeControllerManagerPrometheusDiscoveryEndpoints.yaml), the targets appeared in Prometheus, but with status DOWN and a Connection refused error (these two targets are the only ones failing; the rest work without any issue).
After some hours the situation reverts and the targets disappear again.
What else can I do/try/check?
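One thing worth checking (an assumption on my part, not something from this thread): on k3s the embedded kube-scheduler and kube-controller-manager bind their metrics endpoints to 127.0.0.1 by default, which produces exactly this connection-refused symptom. The bind address can be overridden with server flags:

```shell
# Expose the embedded components' metrics on all interfaces (k3s server flags)
k3s server \
  --kube-controller-manager-arg "bind-address=0.0.0.0" \
  --kube-scheduler-arg "bind-address=0.0.0.0"
```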
How can I increase the resource limits of the Grafana deployment? My Grafana containers keep getting killed for consuming too much memory, and I'd like to give them a little more padding.
I tried to do something similar to #84, but it did nothing to the generated manifests.
This is what I tried:
grafana+:: {
local statefulSet = k.apps.v1.statefulSet,
local container = statefulSet.mixin.spec.template.spec.containersType,
local resourceRequirements = container.mixin.resourcesTypes,
spec+:: {
resources: resourceRequirements.New() +
resourceRequirements.withRequests({ cpu: '200m', memory: '120Mi' },) +
resourceRequirements.withLimits({ cpu: '500m', memory: '280Mi' },)
},
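As a workaround until the jsonnet override works, the limits can be bumped directly on the running deployment (a sketch using the values from the snippet above; a later `make deploy` will overwrite it):

```shell
# Raise Grafana's requests/limits in place via kubectl
kubectl -n monitoring set resources deployment grafana \
  --requests=cpu=200m,memory=120Mi \
  --limits=cpu=500m,memory=280Mi
```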
Since I don't have many resources or much time to address all questions regarding deployments, the Issues section is a place to report problems or improvements to the stack.
This issue is a place where you can add a comment in case of a question, which I or any community member can answer on a best-effort basis.
If you deployed the monitoring stack and some targets are not available or showing no metrics in Grafana, make sure you don't have IPTables rules or use a firewall on your nodes before deploying Kubernetes.
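If you do run a host firewall, the exporter ports need to be reachable from the Prometheus pods; for example (a sketch; 9100 is node-exporter's default port and 10.42.0.0/16 is k3s's default pod CIDR, so adjust both to your setup):

```shell
# Allow node-exporter scrapes from the cluster's pod network
iptables -A INPUT -p tcp --dport 9100 -s 10.42.0.0/16 -j ACCEPT
```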
If you don't want to receive further notifications, click "Unsubscribe" in the right bar, right above the participants list.