VictoriaMetrics / helm-charts

Helm charts for VictoriaMetrics, VictoriaLogs and ecosystem
https://victoriametrics.github.io/helm-charts/
Apache License 2.0

standard dashboards work incompletely on rke2 with cilium #1376

Open didlawowo opened 5 months ago

didlawowo commented 5 months ago

Describe the bug

I'm using the Helm VM k8s stack chart. Grafana comes with dashboards, but some of them are not working correctly.

[screenshots of the affected dashboard panels]

To Reproduce

Just install the k8s stack chart.

Version

latest

Logs

No response

Screenshots

No response

Used command-line flags

No response

Additional information

No response

dmitryk-dk commented 5 months ago

Hi @didlawowo! Which of the dashboards are you using? VictoriaMetrics has its own dashboards, and you can find them here.

dmitryk-dk commented 5 months ago

If you want to use this dashboard, you should check the metrics that are used in it and probably correct them.

didlawowo commented 5 months ago

I'm using the dashboards provided by the k8s VM stack.

dmitryk-dk commented 5 months ago

I'm using the dashboards provided by the k8s VM stack.

In the k8s stack, VictoriaMetrics exposes the dashboards that I shared before. As far as I can see from the domain, you are using Tailscale, so I think you should check which stack you are using.

dmitryk-dk commented 5 months ago

Hi @didlawowo! I reproduced your bug; I need to check how to fix it.

dmitryk-dk commented 5 months ago

Hi @didlawowo! Can you check vmagent in your installation? It will show you where the problem with scrape targets is. Once you fix it, you should see all the information in your dashboards.

[Screenshot of the vmagent targets page, 2024-03-25]
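
For reference, one way to reach vmagent's scrape targets page is to port-forward it locally; the namespace and service name below are placeholders and depend on your release:

# hypothetical namespace/service names, adjust to your installation
kubectl -n monitoring port-forward svc/vmagent-victoria-metrics-k8s-stack 8429:8429
# then open http://localhost:8429/targets to see the state of every scrape target
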
didlawowo commented 5 months ago

Thank you, but could you be more specific? I'm not sure I understand.

didlawowo commented 5 months ago

I'm using the dashboards provided by the k8s VM stack.

In the k8s stack, VictoriaMetrics exposes the dashboards that I shared before. As far as I can see from the domain, you are using Tailscale, so I think you should check which stack you are using.

The Tailscale service is just for exposing it; it has no impact.

dmitryk-dk commented 5 months ago

Thank you, but could you be more specific? I'm not sure I understand.

Hi! We found a bug, and the dashboard should be updated. It happens because some Kubernetes setups may be missing the image or container label: https://github.com/dotdc/grafana-dashboards-kubernetes/issues/18#issuecomment-1218059507

As a small workaround, you can configure your kubelet scrape with the following configuration and check which panels have no data.

kubelet:
  spec:
    # drop high cardinality label and useless metrics for cadvisor and kubelet
    metricRelabelConfigs:
      - action: labeldrop
        regex: (uid)
      - action: labeldrop
        regex: (id|name)
      - action: drop
        source_labels: [__name__]
        regex: (rest_client_request_duration_seconds_bucket|rest_client_request_duration_seconds_sum|rest_client_request_duration_seconds_count)
      - target_label: image
        replacement: placeholder
[Screenshot, 2024-03-27 16:47]
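
Assuming the kubelet snippet above goes into the chart's values file, it could be applied roughly like this (the repo alias "vm", release name "vmks", namespace "monitoring" and the file name are placeholders):

helm repo add vm https://victoriametrics.github.io/helm-charts/
# apply the workaround values on top of the existing installation
helm upgrade vmks vm/victoria-metrics-k8s-stack -n monitoring -f kubelet-workaround-values.yaml
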
dmitryk-dk commented 5 months ago

@Haleygo or @zekker6, can you take a look into the issue, please?

didlawowo commented 5 months ago

Nice answer. I'm not sure how to configure the kubelet in rke2.

https://docs.rke2.io/reference/windows_agent_config?_highlight=kubelet&_highlight=conf#windows-rke2-agent-cli-help

I'll take a look.

AndrewChubatiuk commented 5 months ago

Hey @didlawowo, what Kubernetes version are you on? What args are you passing to the rke2 agent now?

didlawowo commented 5 months ago

I'm using rke2 with these parameters:

write-kubeconfig-mode: "0600"
server: https://192.168.1.200:9345
token: 
tls-san:
  - "192.168.1.200"
# Make an etcd snapshot every 6 hours
etcd-snapshot-schedule-cron: "0 */6 * * *"
# Keep 56 etcd snapshots (equals 2 weeks at 6 per day)
etcd-snapshot-retention: 56
etcd-expose-metrics: true
cni:
  - cilium
disable:
  - rke2-ingress-nginx
  - rke2-canal
  - rke2-kube-proxy
disable-cloud-controller: true
disable-kube-proxy: true

v1.27.12+rke2r1

AndrewChubatiuk commented 4 hours ago

Hey @didlawowo, I finally found time to test the case from this issue locally, as we are not using RKE2 at all. I was able to reproduce the issues with scraping kube-scheduler, kube-controller-manager and etcd metrics. All these services required additional configuration to become scrapable by vmagent:

1. In /etc/rancher/rke2/config.yaml I had to add several values:

   etcd-expose-metrics: true
   kube-scheduler-arg:
     - bind-address=0.0.0.0               # haven't checked how to pass the address from pod metadata instead
   kube-controller-manager-arg:
     - bind-address=0.0.0.0               # haven't checked how to pass the address from pod metadata instead

2. Additional values for the k8s-stack chart:

   kubeControllerManager:
     vmScrape:
       spec:
         endpoints:
           - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
             port: http-metrics
             scheme: https
             tlsConfig:
               caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
               serverName: localhost          # maybe I've misconfigured something, but there was an issue until this value was set
               insecureSkipVerify: true       # haven't tried to pass automatically generated certificates on agent nodes
   kubeEtcd:
     service:
       port: 2381
       targetPort: 2381
     vmScrape:
       spec:
         endpoints:
           - port: http-metrics
             scheme: http
   kubeScheduler:
     vmScrape:
       spec:
         endpoints:
           - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
             tlsConfig:
               caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
               serverName: 127.0.0.1
               insecureSkipVerify: true       # haven't tried to pass automatically generated certificates on agent nodes
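
Assuming the two snippets above, a rough sequence for applying them might look like this (release, namespace and file names are placeholders, adjust to your installation):

# 1. update /etc/rancher/rke2/config.yaml on the server node(s), then restart rke2
sudo systemctl restart rke2-server
# 2. apply the extra chart values ("vmks", "monitoring" and the file name are hypothetical)
helm upgrade vmks vm/victoria-metrics-k8s-stack -n monitoring -f rke2-control-plane-values.yaml
# 3. re-open vmagent's /targets page to confirm the kube-scheduler,
#    kube-controller-manager and etcd targets are now up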