geerlingguy / pi-cluster

Raspberry Pi Cluster automation

port-forward for grafana hangs indefinitely: Grafana webpage not accessible #15

Open alehanderoo opened 5 months ago

alehanderoo commented 5 months ago

Hi @geerlingguy,

First of all, thank you for open-sourcing this! I’ve learned a lot about Ansible and server configuration over the last few days (and nights)! What a fantastic tool!

Describe the bug

When I run kubectl port-forward service/cluster-monitoring-grafana :80 (both as a regular user and as root), the command never returns and Grafana is never accessible.

root@node1:/home/rock# kubectl port-forward service/cluster-monitoring-grafana :80
Forwarding from 127.0.0.1:46238 -> 3000
Forwarding from [::1]:46238 -> 3000

Opening http://192.168.2.52:46238/ does not return a page.

Troubleshooting

I'm running a self-built cluster. The control plane is on a ROCK Pi 4:

   Static hostname: rockpi4c
  Operating System: Ubuntu 20.04.6 LTS
            Kernel: Linux 4.4.154-112-rockchip-gfdb18c8bab17
      Architecture: arm64

Remaining 4 nodes (Raspberry Pi 4 and Pi 3):

Operating System: Debian GNU/Linux 12 (bookworm)
          Kernel: Linux 6.6.31+rpt-rpi-v8
    Architecture: arm64

Main installation:

root@node1:/home/rock# kubectl get nodes
NAME          STATUS   ROLES                  AGE   VERSION
node5         Ready    <none>                 51m   v1.29.5+k3s1
node3         Ready    <none>                 51m   v1.29.5+k3s1
node4         Ready    <none>                 51m   v1.29.5+k3s1
node1         Ready    control-plane,master   52m   v1.29.5+k3s1
node2         Ready    <none>                 51m   v1.29.5+k3s1

root@node1:/home/rock# kubectl get pods
NAME                                                    READY   STATUS    RESTARTS   AGE
nfs-subdir-external-provisioner-7df9c8b467-256mg        1/1     Running   0          50m
cluster-monitoring-prometheus-node-exporter-mm9gw       1/1     Running   0          49m
cluster-monitoring-prometheus-node-exporter-2xm24       1/1     Running   0          49m
cluster-monitoring-prometheus-node-exporter-lf4hn       1/1     Running   0          49m
cluster-monitoring-prometheus-node-exporter-ggg5z       1/1     Running   0          49m
cluster-monitoring-prometheus-node-exporter-h4kps       1/1     Running   0          49m
cluster-monitoring-kube-state-metrics-df8db86bb-zq4lz   1/1     Running   0          49m
cluster-monitoring-kube-pr-operator-b44c59f5d-8qp84     1/1     Running   0          49m
cluster-monitoring-grafana-5b4dd85976-8cv2m             3/3     Running   0          49m
prometheus-cluster-monitoring-kube-pr-prometheus-0      2/2     Running   0          48m
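
For context: if the chart defaults are unchanged, the Grafana service at this point is a plain ClusterIP service, only reachable from inside the cluster or through a port-forward. A quick check, using the names above:

kubectl get svc cluster-monitoring-grafana -n default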
alehanderoo commented 5 months ago

Got it working already! Posting it here for anyone having the same issue.

Run kubectl edit svc cluster-monitoring-grafana -n default on the control plane node. This opens the service manifest below in a vi editor.

Change the type to NodePort and add a nodePort entry under ports:

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: cluster-monitoring
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2024-06-04T15:41:14Z"
  labels:
    app.kubernetes.io/instance: cluster-monitoring
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: grafana
    app.kubernetes.io/version: 10.4.1
    helm.sh/chart: grafana-7.3.11
  name: cluster-monitoring-grafana
  namespace: default
  resourceVersion: "14694"
  uid: 2b047274-31cf-413b-8dc2-14b8571a8330
spec:
  clusterIP: 10.43.27.129
  clusterIPs:
  - 10.43.27.129
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http-web
    nodePort: 30080 # Optional: specify a port, or leave it to let Kubernetes assign one
    port: 80
    protocol: TCP
    targetPort: 3000
  selector:
    app.kubernetes.io/instance: cluster-monitoring
    app.kubernetes.io/name: grafana
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}

Run kubectl get svc cluster-monitoring-grafana -n default to validate the settings:

NAME                         TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
cluster-monitoring-grafana   NodePort   10.43.27.129   <none>        80:30080/TCP   4h27m
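
For a non-interactive version of the same change, a sketch using kubectl patch (assuming the same service name, namespace, and ports as above; kubectl's default strategic merge patch merges the ports entry by port number):

kubectl patch svc cluster-monitoring-grafana -n default \
  -p '{"spec": {"type": "NodePort", "ports": [{"port": 80, "nodePort": 30080}]}}'

Either way, the edit carries the caveat discussed below: the service object is managed by Helm, so the change can be reverted.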
alehanderoo commented 5 months ago

Does not seem to work after a reboot.

BicycleJohny commented 2 months ago

It is probably because the service is managed by Helm, so manual edits get reverted. I have the same issue and I am trying to convince Helm to do it.

BicycleJohny commented 2 months ago

UPDATE: OK, it wasn't that hard after all. You can either extend this file at tasks[1].values with:

grafana:
  service:
    type: NodePort
    nodePort: 30080

and then uninstall the release with Helm and reinstall it with Ansible. Or you can just uninstall it with Helm and put all the values into a file like values.yaml:

alertmanager:
  enabled: false
grafana:
  service:
    type: NodePort
    nodePort: 30080

And then install it again with helm:

helm install prometheus-stack prometheus-community/kube-prometheus-stack -f values.yaml --kubeconfig /etc/rancher/k3s/k3s.yaml

Helm then creates the Grafana service with type NodePort, accessible on the specified port:

NAME                                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
kubernetes                                  ClusterIP   10.43.0.1       <none>        443/TCP             40h
prometheus-operated                         ClusterIP   None            <none>        9090/TCP            13m
prometheus-stack-grafana                    NodePort    10.43.10.202    <none>        80:30080/TCP        13m
prometheus-stack-kube-prom-operator         ClusterIP   10.43.89.109    <none>        443/TCP             13m
prometheus-stack-kube-prom-prometheus       ClusterIP   10.43.61.6      <none>        9090/TCP,8080/TCP   13m
prometheus-stack-kube-state-metrics         ClusterIP   10.43.165.23    <none>        8080/TCP            13m
prometheus-stack-prometheus-node-exporter   ClusterIP   10.43.155.255   <none>        9100/TCP            13m
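
If you would rather not uninstall first, helm upgrade --install should apply the same values in place (a sketch, assuming the same release name, values file, and kubeconfig as in the command above):

helm upgrade --install prometheus-stack prometheus-community/kube-prometheus-stack \
  -f values.yaml --kubeconfig /etc/rancher/k3s/k3s.yaml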
BicycleJohny commented 2 months ago

Btw @alehanderoo, kubectl port-forward is a temporary thing, and it is expected to block until you terminate it: it creates a temporary forwarding rule and waits until you are finished (Ctrl+C). That is why it looks like it hangs. Note also that by default it binds only to 127.0.0.1 on the machine where it runs, which is why the forward was not reachable from another host at http://192.168.2.52:46238/.
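
For reference, a sketch of making a port-forward reachable from other machines (standard kubectl flags; the fixed local port 8080 is an arbitrary choice):

# Bind on all interfaces with a fixed local port instead of the default 127.0.0.1:
kubectl port-forward --address 0.0.0.0 service/cluster-monitoring-grafana 8080:80
# While this runs (until Ctrl+C), Grafana is reachable at http://<node-ip>:8080/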