fluent / fluent-plugin-prometheus

A fluent plugin that collects metrics and exposes them for Prometheus.
Apache License 2.0

[Kubernetes] Create new Prometheus-friendly metric from Fluentd pod logs #117

Closed manicole closed 4 years ago

manicole commented 5 years ago

In Kubernetes, I'd like to create a new Prometheus metric (using @type prometheus) parsed from Fluentd pod logs.

Expected Behavior

  1. Gather logs from Fluentd pod deployed in namespace opa: kubectl logs <fluentd-pod> -n opa

  2. Parse and edit the values via a ConfigMap to build metrics. These metrics are then exposed at the service endpoint, i.e. curl https://<fluentd-service-IP>:<fluentd-service-port>/metrics

Actual Behavior

After deploying the Fluentd container in a pod (as a sidecar of OPA), installing fluent-plugin-prometheus in this container, and deploying the custom ConfigMap (default endpoint configuration), curl https://<fluentd-service-IP>:<fluentd-service-port>/metrics shows nothing. I can't figure out what I'm missing.

Steps to Reproduce the Problem

  1. Install Fluentd in a pod as a sidecar container of an OPA container in namespace opa :

        - name: fluentd
          image: fluent/fluentd
          resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 200Mi
          env:
          - name: FLUENT_UID
            value: "0"
          volumeMounts:
          - name: varlog
            mountPath: /var/log
          - name: varlibdockercontainers
            mountPath: /var/lib/docker/containers
            readOnly: true
          - name: fluentd-opa
            mountPath: /fluentd/etc/
  2. Install fluent-plugin-prometheus in the Fluentd container:

    kubectl exec -it <pod-name> -n opa -c fluentd -- /bin/sh
    /  # gem install fluent-plugin-prometheus
  3. Create a ConfigMap containing the Fluentd configuration in the opa namespace:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: fluentd-opa
      namespace: opa
    data:
      fluent.conf: |
        <source>
          @type tail
          path /var/log/containers/*.log
          pos_file /var/log/fluentd-containers.log.pos
          time_format %Y-%m-%dT%H:%M:%S.%NZ
          tag kubernetes.*
          format json
          read_from_head true
        </source>

        <filter kubernetes.var.log.containers.**opa**.log>
          @type prometheus
          <metric>
            name resp_status_counter
            type counter
            desc bla
            key $.log.resp_status
          </metric>
        </filter>

        <match kubernetes.var.log.containers.**opa**.log>
          @type prometheus
        </match>

    💡 The key is $.log.resp_status because the Fluentd logs to parse (which are actually forwarded OPA logs) have the form:

    {
    "log":"{
        "client_addr":"10.233.64.1:38700",
        "level":"info",
        "msg":"Sent response.",
        "req_id":12697,
        "req_method":"POST",
        "req_path":"/",
        "resp_body":"{
            "apiVersion":"admission.k8s.io/v1beta1",
            "kind":"AdmissionReview",
            "response":{
                "allowed":true
            }
        }",
        "resp_bytes":94,
        "resp_duration":3.421389,
        "resp_status":200,
        "time":"2019-09-20T13:40:11Z"
    }",
    "stream":"stderr"
    }
  4. Curl the Fluentd service endpoint to see the supposedly newly created metric: curl https://<fluentd-service-IP>:<fluentd-service-port>/metrics (shows nothing).
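
On the key chosen in step 3: in the sample record, the value of "log" is a JSON string, not a nested object, so a nested path such as $.log.resp_status has nothing to descend into until that string is parsed. The Python sketch below only illustrates this idea; the `lookup` helper and the shortened record are hypothetical and do not reproduce Fluentd's actual record accessor.

```python
import json

# Hypothetical record shaped like the example in step 3:
# "log" holds a JSON *string*, not a nested object.
record = {
    "log": json.dumps({"msg": "Sent response.", "resp_status": 200}),
    "stream": "stderr",
}


def lookup(rec, path):
    """Walk a list of keys; return None if an intermediate value is not a dict."""
    current = rec
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Before parsing, "log" is a string, so the nested key cannot be reached.
before = lookup(record, ["log", "resp_status"])  # None

# After decoding the string, the nested key becomes reachable.
parsed = dict(record)
parsed["log"] = json.loads(record["log"])
after = lookup(parsed, ["log", "resp_status"])  # 200

print(before, after)
```

If this is what happens in the pod, parsing the "log" field first (for example with a parser filter) would be needed before the metric key can see resp_status.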

Additional Info

Kubernetes v1.14, fluent-plugin-prometheus v1.6.0, Fluentd v1.3.2, prometheus-client v0.9.0

Ideas to solve the issue

Getting the ConfigMap gives, among other information,

    openpolicyagent.org/policy-status: '{"status":"error","error":{"code":"invalid_parameter","message":"error(s)
      occurred while compiling module(s)","errors":[{"code":"rego_parse_error","message":"no
      match found","location":{"file":"opa/fluentd-opa/fluent.conf","row":14,"col":1},"details":{}}]}}'

Any help is appreciated! Thanks.

manicole commented 5 years ago

Solutions found

After investigating again and again, here is what I understood (in case it helps anyone). If anything is wrong, please point it out.

  1. The OPA error displayed when getting the ConfigMap (see above) does not seem to have any impact: the configuration is correctly taken into account after deploying the ConfigMap and redeploying OPA and its sidecars.

  2. In a Fluentd configuration, you usually specify the source within <source> tags, the log filtering within <filter> tags, and the output within (this is the trick) <match> tags.

With fluent-plugin-prometheus, the tags and types to use change.

Here is an example:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-opa
  namespace: opa
data:
  fluent.conf: |

    # get logs from /var/log/containers/
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      tag kubernetes.*
      format json
      read_from_head true
    </source>

    # Prometheus filter plugin:
    # instruments metrics from records
    # without changing the values of the records themselves
    <filter kubernetes.var.log.containers.**opa**.log>
      @type prometheus
      <metric>
        name log_counter
        type counter
        desc The total number of logs
        key $.log
      </metric>
    </filter>

    # output plugin for prometheus type
    <match kubernetes.var.log.containers.**opa**.log>
      @type copy
      <store>
        @type prometheus
        <metric>
          name log_counter
          type counter
          desc The total number of logs
        </metric>
      </store>
    </match>

    # provides a metrics HTTP endpoint to be scraped by a Prometheus server
    # exposes custom and default metrics on all interfaces of the container (bind 0.0.0.0)
    <source>
      @type prometheus
      bind 0.0.0.0
      port 24224
      metrics_path /metrics
    </source>

Remaining questions

Thank you!

manicole commented 4 years ago

After further investigation, I came up with this configuration, in case it helps anyone:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-opa
  namespace: opa
data:
  fluent.conf: |

    # Get logs from /var/log/containers/
    <source>
      @type tail
      path /var/log/containers/*.log
      format json
      read_from_head false
      tag kubernetes.*
    </source>

    # Parse the log entry to lift everything under the "log" key to the top level
    <filter kubernetes.var.log.containers.**opa_opa**.log>
      @type parser
      key_name log
      reserve_data false
      remove_key_name_field true
      ignore_key_not_exist true
      suppress_parse_error_log true
      <parse>
        @type json
      </parse>
    </filter>

    # Keep decision log only
    <filter kubernetes.var.log.containers.**opa_opa**.log>
      @type grep
      <and>
        <regexp>
          key $.req_method
          pattern /POST/
        </regexp>
        <regexp>
          key $.req_path
          pattern /\//
        </regexp>
        <regexp>
          key $.resp_status
          pattern \d+
        </regexp>
      </and>
    </filter>

    # Parse the resp_body entry to lift everything under the "resp_body" key to the top level
    <filter kubernetes.var.log.containers.**opa_opa**.log>
      @type parser
      key_name resp_body
      reserve_data true
      remove_key_name_field true
      ignore_key_not_exist true
      <parse>
        @type json
      </parse>
    </filter>

    <filter kubernetes.var.log.containers.**opa_opa**.log>
      @type prometheus
      <metric>
        name opa_decisions_total
        type counter
        desc The total number of OPA decisions.
        # No key means increment counter for each record
      </metric>
      <metric>
        name opa_decisions_duration
        type summary
        desc The duration of OPA decisions.
        key $.resp_duration
      </metric>
      <labels>
        status $.resp_status
      </labels>
    </filter>

    <match kubernetes.var.log.containers.**opa_opa**.log>
      @type copy
      <store>
        @type prometheus
        <metric>
          name opa_decisions_total
          type counter
          desc The total number of OPA decisions.
          # No key means increment counter for each record
        </metric>
        <metric>
          name opa_decisions_duration
          type summary
          desc The duration of OPA decisions.
          key $.resp_duration
        </metric>
      </store>
      <store>
        @type stdout
      </store>
    </match>

    <source>
      @type prometheus
      bind 0.0.0.0
      port 24224
      metrics_path /metrics
    </source>

which gave me the following metrics: [image]
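
To make the flow of the configuration above easier to follow, here is a rough Python simulation of its filter chain: lift the "log" JSON to the top level, keep only decision logs, lift "resp_body", then count decisions per status and accumulate durations. It is a sketch under assumptions, not how Fluentd or the plugin is implemented: the function names are hypothetical, the second sample record ("Received request.") is invented for illustration, and records that cannot be parsed are simply dropped here, which may differ from Fluentd's actual option handling.

```python
import json
import re
from collections import Counter


def lift_json_field(record, key, reserve_data):
    """Decode record[key] as JSON and lift its fields to the top level.

    Simplification: missing or unparsable fields drop the record unless
    reserve_data is set and the key is merely absent.
    """
    raw = record.get(key)
    if not isinstance(raw, str):
        return record if reserve_data else None
    try:
        parsed = json.loads(raw)
    except ValueError:
        return None
    if not isinstance(parsed, dict):
        return None
    if not reserve_data:
        return parsed
    kept = {k: v for k, v in record.items() if k != key}  # drop the parsed key
    kept.update(parsed)
    return kept


# Same three conditions as the <and> block of the grep filter.
GREP_RULES = [("req_method", r"POST"), ("req_path", r"/"), ("resp_status", r"\d+")]


def is_decision(record):
    """Keep a record only if every rule matches the stringified field."""
    return all(re.search(pat, str(record.get(key, ""))) for key, pat in GREP_RULES)


def run_pipeline(raw_records):
    decisions_total = Counter()  # counter, labelled by status
    duration_sum, duration_count = 0.0, 0  # the two basic parts of a summary
    for raw in raw_records:
        rec = lift_json_field(raw, "log", reserve_data=False)
        if rec is None or not is_decision(rec):
            continue
        rec = lift_json_field(rec, "resp_body", reserve_data=True)
        if rec is None:
            continue
        decisions_total[str(rec["resp_status"])] += 1
        duration_sum += float(rec["resp_duration"])
        duration_count += 1
    return decisions_total, duration_sum, duration_count


# Sample input: one decision log (values taken from the example record in the
# first comment) and one hypothetical non-decision log without resp_status.
decision = {
    "req_method": "POST", "req_path": "/", "resp_status": 200,
    "resp_duration": 3.421389,
    "resp_body": json.dumps({"kind": "AdmissionReview", "response": {"allowed": True}}),
}
other = {"level": "info", "msg": "Received request.", "req_method": "POST", "req_path": "/"}
raw_records = [{"log": json.dumps(r), "stream": "stderr"} for r in (decision, other)]

totals, dur_sum, dur_count = run_pipeline(raw_records)
print(dict(totals), dur_sum, dur_count)
```

Only the first sample record survives the three grep conditions, so the simulated counter ends with a single decision under status "200".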