aidapsibr / aks-prometheus-windows-exporter

A set of scripts and deployments to scrape Windows metrics from Windows AKS Node Pools
MIT License

windows-prometheus-sync pod not listening on port 80 #2

Open danfinn opened 2 years ago

danfinn commented 2 years ago

I was able to install the windows-prometheus-sync helm chart from the README; however, it does not appear to be functioning. It's not listening on port 80.

kubectl exec -it windows-prometheus-sync-5df6cc7c67-wwd2m sh
/app # curl localhost
curl: (7) Failed to connect to localhost port 80 after 0 ms: Connection refused

and looking at the logs for the pod you can see that it's not happy:

kubectl logs -f windows-prometheus-sync-5df6cc7c67-wwd2m
dbug: Microsoft.Extensions.Hosting.Internal.Host[1]
      Hosting starting
info: Microsoft.Hosting.Lifetime[0]
      Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
      Hosting environment: Production
info: Microsoft.Hosting.Lifetime[0]
      Content root path: /app
dbug: Microsoft.Extensions.Hosting.Internal.Host[2]
      Hosting started
dbug: WindowsPrometheusSync.SyncService[0]
      Discovered windows nodes:
        akswin3000000
warn: WindowsPrometheusSync.KubernetesClientWrapper[0]
      Unable to access scrape config secret
      Microsoft.Rest.HttpOperationException: Operation returned an invalid status code 'Forbidden'
         at k8s.Kubernetes.ReadNamespacedSecretWithHttpMessagesAsync(String name, String namespaceParameter, String pretty, Dictionary`2 customHeaders, CancellationToken cancellationToken)
         at k8s.KubernetesExtensions.ReadNamespacedSecretAsync(IKubernetes operations, String name, String namespaceParameter, String pretty, CancellationToken cancellationToken)
         at WindowsPrometheusSync.KubernetesClientWrapper.GetPrometheusScrapeConfigSecretAsync(CancellationToken cancellationToken) in /src/WindowsPrometheusSync/IKubernetesClientWrapper.cs:line 98
crit: WindowsPrometheusSync.SyncService[0]
      Unexpected exception in LoopSyncAsync
      System.NullReferenceException: Object reference not set to an instance of an object.
         at WindowsPrometheusSync.SyncService.SyncWithWindowsNodesAsync(CancellationToken cancellationToken) in /src/WindowsPrometheusSync/SyncService.cs:line 122
         at WindowsPrometheusSync.SyncService.LoopSyncAsync(CancellationToken cancellationToken) in /src/WindowsPrometheusSync/SyncService.cs:line 96

I do see that the helm chart created a secret named windows-prometheus-sync-token-z4n6g.

kubectl describe secret windows-prometheus-sync-token-z4n6g
Name:         windows-prometheus-sync-token-z4n6g
Namespace:    monitoring
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: windows-prometheus-sync
              kubernetes.io/service-account.uid: df69ec56-8a99-4db3-b43c-d6b077067dd9

Type:  kubernetes.io/service-account-token

Data
====
ca.crt:     1761 bytes
namespace:  10 bytes
token:      xxxx$tokenxxx

danfinn commented 2 years ago

Still trying to make sense of this and it looks like there is either a missing secret or a naming issue. The role that the helm chart creates is looking for a secret that does not exist:

kubectl describe role windows-prometheus-sync-secret-modifier
Name:         windows-prometheus-sync-secret-modifier
Labels:       app.kubernetes.io/instance=windows-prometheus-sync
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=windows-prometheus-sync
              app.kubernetes.io/version=0.1.0
              helm.sh/chart=windows-prometheus-sync-0.1.0
Annotations:  meta.helm.sh/release-name: windows-prometheus-sync
              meta.helm.sh/release-namespace: monitoring
PolicyRule:
  Resources  Non-Resource URLs  Resource Names                                   Verbs
  ---------  -----------------  --------------                                   -----
  secrets    []                 [kube-prometheus-stack-prometheus-scrape-confg]  [get watch list update patch]

There is no secret in any namespace with the name kube-prometheus-stack-prometheus-scrape-confg (there might also be a typo there: should confg be config?).

This is the secret that the code above is looking for and cannot find:

private const string SecretName = "kube-prometheus-stack-prometheus-scrape-confg";

aidapsibr commented 2 years ago

This should be the additional Prometheus scrape targets secret. I recommend creating a new one and adding it to Prometheus as shown here, or however you run it.

https://github.com/prometheus-operator/prometheus-operator/tree/main/example/additional-scrape-configs

aidapsibr commented 2 years ago

apiVersion: v1
data:
  additional-scrape-configs.yaml: 
kind: Secret
metadata:
  name: kube-prometheus-stack-prometheus-scrape-confg
  namespace: monitoring

As an example, based on the code and tests, providing a blank yaml in the secret should be fine.

Edit: Change the namespace to wherever prometheus lives so it can read it.

danfinn commented 2 years ago

I was able to create an empty secret as you suggested; however, I had to change it slightly before it could be created. I had to make the secret data an empty set of double quotes:

apiVersion: v1
data:
  additional-scrape-configs.yaml: ""
kind: Secret
metadata:
  name: kube-prometheus-stack-prometheus-scrape-confg
  namespace: monitoring
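
For anyone who wants to put real content in that secret later: values under `data` in a Secret have to be base64-encoded, and an empty string encodes to an empty string, which is why `""` is accepted. Below is a minimal Python sketch that builds the same manifest. The helper name `build_scrape_config_secret` is made up for illustration and is not part of this repo; the secret name, namespace and key are the ones from this thread.

```python
import base64
import json


def build_scrape_config_secret(name, namespace, key, content=""):
    """Return a Secret manifest (as a dict) holding `content` under `key`.

    Hypothetical helper for illustration only; not part of windows-prometheus-sync.
    """
    # Values under `data` must be base64-encoded; "" encodes to "".
    encoded = base64.b64encode(content.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "data": {key: encoded},
    }


manifest = build_scrape_config_secret(
    name="kube-prometheus-stack-prometheus-scrape-confg",
    namespace="monitoring",
    key="additional-scrape-configs.yaml",
)
# kubectl accepts JSON manifests as well as YAML ones.
print(json.dumps(manifest, indent=2))
```

Saving the output to a file and running kubectl apply -f on it should be equivalent to applying the YAML above.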

I then killed off the windows-prometheus-sync pod and a new one was created and the logs look much better this time around:

info: Microsoft.Hosting.Lifetime[0]
      Hosting environment: Production
info: Microsoft.Hosting.Lifetime[0]
      Content root path: /app
dbug: Microsoft.Extensions.Hosting.Internal.Host[2]
      Hosting started
dbug: WindowsPrometheusSync.SyncService[0]
      Discovered windows nodes:
        akswin3000000
dbug: WindowsPrometheusSync.SyncService[0]
      Discovered windows nodes:
        akswin3000000
dbug: WindowsPrometheusSync.SyncService[0]
      Discovered windows nodes:
        akswin3000000
dbug: WindowsPrometheusSync.SyncService[0]
      Discovered windows nodes:
        akswin3000000
dbug: WindowsPrometheusSync.SyncService[0]
      Discovered windows nodes:
        akswin3000000

However, as far as I can tell, nothing is ever registered with prometheus, and none of the metrics provided by the exporter on the node ever show up in prometheus. They should all be prefixed with windows_ and nothing like that is getting added.

danfinn commented 2 years ago

I was able to get this working but it was far less straightforward than the documentation would indicate.

Prometheus was not picking up the additional scrape config with my original installation, which was via helm from the community charts. It seems that not every install of prometheus supports additional scrape configs.

What I had to do was install from bitnami using the following:

helm install prometheus bitnami/kube-prometheus \
  --set node-exporter.nodeSelector."kubernetes\\.io/os"=linux \
  --set operator.nodeSelector."kubernetes\\.io/os"=linux \
  --set operator.nodeSelector."kubernetes\\.io/arch"=amd64 \
  --set prometheus.nodeSelector."kubernetes\\.io/os"=linux \
  --set prometheus.nodeSelector."kubernetes\\.io/arch"=amd64 \
  --set alertmanager.nodeSelector."kubernetes\\.io/os"=linux \
  --set alertmanager.nodeSelector."kubernetes\\.io/arch"=amd64 \
  -f /tmp/values.yaml

and that values.yaml file:

prometheus:
  additionalScrapeConfigs:
    enabled: true
    type: external
    external:
      name: kube-prometheus-stack-prometheus-scrape-confg
      key: additional-scrape-configs.yaml

Once that was in place prometheus picked it up and registered the metrics from the windows node.
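
To sanity-check that prometheus is reading the secret at all, it can help to put a hand-written scrape job into it first. Here is a rough Python sketch of what such a job could look like for the Windows nodes on port 9100. This is my assumption of a reasonable static config, not necessarily what windows-prometheus-sync itself writes; the function name and the job name `windows-nodes` are made up, and the targets may need to be node IPs if node names do not resolve from the prometheus pod.

```python
import json


def build_windows_scrape_job(node_names, port=9100, job_name="windows-nodes"):
    """Build a list with one static scrape job covering the given nodes.

    Hypothetical helper; job_name is an illustrative value, not from the repo.
    """
    targets = [f"{node}:{port}" for node in node_names]
    return [{"job_name": job_name, "static_configs": [{"targets": targets}]}]


jobs = build_windows_scrape_job(["akswin3000000"])
# JSON output is generally accepted where YAML is expected, so this can be
# used as the content of additional-scrape-configs.yaml for a quick test.
print(json.dumps(jobs, indent=2))
```

If the job shows up under the targets page in prometheus, the secret wiring is working and any remaining problem is on the sync or exporter side.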

I'm now dealing with an issue where additional nodes that are created in the same scale set do not seem to get the windows exporter installed. They get registered with prometheus, but prometheus can't talk to them on port 9100 because the windows exporter did not get installed. This should happen on its own, shouldn't it?
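
For reference, this is roughly how I'd check from inside the cluster whether a node is actually listening on 9100. It is a minimal sketch using only the Python standard library; the function name `is_port_open` is my own, and it only tests that a TCP connection can be opened, not that the exporter is returning metrics.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example call (node name taken from the logs above):
#   is_port_open("akswin3000000", 9100)
```

A False result for a newly added node would be consistent with the exporter not having been installed on it.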