camilb / prometheus-kubernetes

Monitoring Kubernetes clusters on AWS, GCP and Azure using Prometheus Operator and Grafana
Apache License 2.0

Custom configuration gets overwritten #76

Open · squed opened 6 years ago

squed commented 6 years ago

I've been attempting to deploy Prometheus federation using a custom configuration, but I think I'm not understanding something fully.

I have three clusters with the external URL configured as lab1, lab2 and lab3:

https://api.lab1.domain.com/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web
https://api.lab2.domain.com/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web
https://api.lab3.domain.com/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web

Everything works great for the individual clusters, but when I try to configure lab1 as the federation server, the configuration never appears to take effect.

Steps to recreate after successful deployment and testing external urls:

kubectl -n monitoring delete prometheus k8s

I edit tools/custom-configuration/prometheus-k8s-secret.prometheus.yaml and add the following underneath scrape_configs:

- job_name: 'federate'
  scrape_interval: 15s
  honor_labels: true
  metrics_path: '/federate'
  params:
    'match[]':
      - '{job="prometheus"}'
      - '{__name__=~"job:.*"}'
  static_configs:
    - targets:
      - 'https://api.lab2.domain.com/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web'
      - 'https://api.lab3.domain.com/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web'

This seems fine, and when I deploy the secret I can decode the base64 and see that it is correct:

kubectl -n monitoring create secret generic prometheus-k8s --from-file=./prometheus-k8s-secret/
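A quick way to double-check what actually landed in the secret (a sketch; it assumes the key inside the secret is prometheus.yaml, matching the file deployed above):

# Decode the live secret and confirm the federate job survived
kubectl -n monitoring get secret prometheus-k8s \
  -o jsonpath='{.data.prometheus\.yaml}' | base64 -d | grep -A 3 "job_name: 'federate'"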

However, when I deploy the new Prometheus with kubectl -n monitoring create -f prometheus-k8s.yaml, it overwrites the prometheus.yaml in the secret (I decoded the base64 as I deployed this and saw it was immediately overwritten). This is the prometheus-k8s.yaml I'm deploying:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  labels:
    prometheus: k8s
spec:
  replicas: 2
  version: v2.2.0-rc.1
  externalUrl: https://api.lab1.domain.com/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web
  serviceAccountName: prometheus-k8s
  serviceMonitorSelector: {}
  ruleSelector:
    matchLabels:
      role: prometheus-rulefiles
      prometheus: k8s
  resources:
    requests:
      memory: 1Gi
  storage:
    volumeClaimTemplate:
      metadata:
        annotations:
          annotation1: prometheus
      spec:
        storageClassName: ssd
        resources:
          requests:
            storage: 40Gi
  alerting:
    alertmanagers:
    - namespace: monitoring
      name: alertmanager-main
      port: web

What am I missing here? I've tried everything I can think of, including the scorched-earth approach of re-creating the cluster and re-cloning the repo.

camilb commented 6 years ago

@squed You need a ServiceMonitor to configure jobs. The secrets are generated from ServiceMonitors.

This should work; just make sure you expose the other two Prometheus instances so they are reachable at an IP address like 1.2.3.4:9090:

kind: Service
apiVersion: v1
metadata:
  name: lab2
  namespace: monitoring
  labels:
    k8s-app: lab2
spec:
  # Selector-less Service: the Endpoints object below supplies 1.2.3.4,
  # the IP where the lab2 prometheus is exposed
  clusterIP: None
  ports:
  - name: http2
    port: 9090
    protocol: TCP
    targetPort: 9090
---
kind: Service
apiVersion: v1
metadata:
  name: lab3
  namespace: monitoring
  labels:
    k8s-app: lab3
spec:
  # Selector-less Service: the Endpoints object below supplies 1.2.3.4,
  # the IP where the lab3 prometheus is exposed
  clusterIP: None
  ports:
  - name: http3
    port: 9090
    protocol: TCP
    targetPort: 9090
---
apiVersion: v1
kind: Endpoints
metadata:
  name: lab2
  namespace: monitoring
  labels:
    k8s-app: lab2
subsets:
- addresses:
  - ip: 1.2.3.4 #the ip of api.lab2.domain.com/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web
  ports:
  - name: http2
    port: 9090
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: lab3
  namespace: monitoring
  labels:
    k8s-app: lab3
subsets:
- addresses:
  - ip: 1.2.3.4 #the ip of api.lab3.domain.com/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web
  ports:
  - name: http3
    port: 9090
    protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: prometheus
  name: prometheus-federation-2
  namespace: monitoring
spec:
  endpoints:
  - interval: 15s
    port: http2
    path: /federate
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
    honorLabels: true
  jobLabel: prometheus-federation-2
  namespaceSelector:
    matchNames:
    - monitoring
  selector:
    matchLabels:
      k8s-app: lab2
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: prometheus
  name: prometheus-federation-3
  namespace: monitoring
spec:
  endpoints:
  - interval: 15s
    port: http3
    path: /federate
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
    honorLabels: true
  jobLabel: prometheus-federation-3
  namespaceSelector:
    matchNames:
    - monitoring
  selector:
    matchLabels:
      k8s-app: lab3
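After applying these manifests, a quick way to confirm the Operator picked them up and rendered the federate jobs (a sketch; the object names follow the manifests above, and the secret key prometheus.yaml is an assumption):

kubectl -n monitoring get servicemonitors
kubectl -n monitoring get svc lab2 lab3
kubectl -n monitoring get endpoints lab2 lab3
# Check the generated scrape config for the federate jobs
kubectl -n monitoring get secret prometheus-k8s \
  -o jsonpath='{.data.prometheus\.yaml}' | base64 -d | grep -B 1 -A 5 '/federate'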
squed commented 6 years ago

@camilb thanks for the quick response.

I think I need to do some more reading around ServiceMonitors: despite creating ./manifests/prometheus/prometheus-k8s-service-monitor-federated.yaml with your suggestion, I'm not seeing any change in the Prometheus configuration or any federated targets.
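One way to check whether the Operator even saw the new ServiceMonitor is its own log (a sketch; the deployment name prometheus-operator in the monitoring namespace is an assumption based on this repo's defaults):

# Look for reconcile or validation errors mentioning the ServiceMonitor
kubectl -n monitoring logs deploy/prometheus-operator | grep -i -E 'servicemonitor|error'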

squed commented 6 years ago

I'm not sure if I explained this correctly, but the targets are completely separate k8s clusters, not services running within one cluster.

I am sending a request to the k8s API, which then forwards the request through kube-proxy to the service endpoint.
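In other words, each scrape has to traverse the apiserver proxy, roughly like this (a sketch; it assumes a bearer token in $TOKEN with rights to the services/proxy subresource, and curl's -g flag keeps it from globbing the brackets in match[]):

# Fetch federated metrics for {job="prometheus"} through the lab2 apiserver proxy
curl -k -g -H "Authorization: Bearer $TOKEN" \
  "https://api.lab2.domain.com/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web/federate?match[]=%7Bjob%3D%22prometheus%22%7D"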

camilb commented 6 years ago

@squed Can you expose those targets using an Ingress, NodePort, or LoadBalancer? I have no idea how to make them work in this particular case over kube-proxy.
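For example, switching each lab cluster's prometheus-k8s Service to NodePort could be as simple as this (a sketch; the allocated port would still need to be reachable from the federation cluster):

# Expose the lab cluster's prometheus on a node port
kubectl -n monitoring patch svc prometheus-k8s -p '{"spec":{"type":"NodePort"}}'
kubectl -n monitoring get svc prometheus-k8s   # note the allocated nodePort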

camilb commented 6 years ago

@squed I found a better solution recently; you might want to check it out: https://github.com/coreos/prometheus-operator/pull/1100

StevenACoffman commented 6 years ago

@squed These are also relevant: