fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

docker json logs using the "Decode_Field json" feature cannot be indexed through elasticsearch #534

Closed (sestrel closed this issue 6 years ago)

sestrel commented 6 years ago

I've read the documentation regarding the nested JSON support you are currently implementing for 0.13 in the form of parser decoders: http://fluentbit.io/documentation/0.13/parser/decoder.html

You describe a typical use case for containerized environments with Docker, where the application logs data in JSON format which ends up as an escaped string.

When using "Decode_Field json log" there is one problem: everything written to stdout/stderr is written to disk by the Docker json-file log driver, which is what the described use case relies on. But not every line written to stdout/stderr is a well-formatted JSON message. This leads to the decoded "log" field sometimes being a JSON object and sometimes a plain string:
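For illustration (these two sample lines are mine, not from the original report), the json-file driver might write the following records to disk:

{"log":"{\"level\":\"info\",\"msg\":\"started\"}\n","stream":"stdout","time":"2018-03-09T06:17:20.701Z"}
{"log":"plain text line\n","stream":"stdout","time":"2018-03-09T06:17:22.104Z"}

After "Decode_Field json log", the "log" field of the first record becomes a nested object while the second stays a string, so the same field carries two different types.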

So why is this a problem? We do all that work because we want to collect all those logs and make them searchable through Elasticsearch or something else. But the type of a JSON attribute has to stay the same to be indexable with Elasticsearch: it has to stay either a string or an object.

Are there any plans for remapping support for the decoded JSON content? The kubernetes filter, for example, merges all fields into the parent document. Here we could do something similar.

jbrownsc commented 6 years ago

Just to piggyback on this.

Using the Docker image fluent/fluent-bit-0.13-dev:0.10 in Kubernetes with the following config

[SERVICE]
    Flush        1
    Daemon       Off
    Log_Level    debug
    Parsers_File parsers.conf

[INPUT]
    Name             tail
    Path             /tmp/var/log/containers/test-fluent-69d87bd8bc-65z8d_test-namespace_test-fluent-019f9c0c33d24470d943a399a2defb63b9bbed982b232941043963475067c315.log
    Tag              kube.*
    Parser           docker
    Refresh_Interval 5
    Mem_Buf_Limit    5MB
    Skip_Long_Lines  On

[FILTER]
    Name   kubernetes
    Match  kube.*
    Merge_JSON_Log On

[OUTPUT]
    Name  es
    Match *
    Host  [Elastic Search host]
    Port  9200
    Retry_Limit False
    Generate_ID True

And the following log messages

{"log":"{\"json_string_key\": \"json_string_value\"}\n","stream":"stdout","time":"2018-03-09T06:17:20.701547477Z"}
{"log":"STRING\n","stream":"stdout","time":"2018-03-09T06:17:22.104768562Z"}

The log message from Line 1 will be indexed as a mapped object with nested properties.

(k8s-uw1-gcp/default) ➜  /tmp  curl -sX GET http://localhost:9200/fluent-bit/ | jq -r '."fluent-bit".mappings.flb_type.properties.log'
{
  "properties": {
    "json_string_key": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}

The log message from Line 2 will error out as it is cast as a string, and a string can't be injected into a mapped object.

[2018/03/12 22:29:20] [ warn] [out_es] Elasticsearch error
{"took":43,"errors":true,"items":[{"index":{"_index":"fluent-bit","_type":"flb_type","_id":"2b524b83-f7a4-366d-b0c2-ca9ffed44173","_version":4,"result":"updated","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":3,"_primary_term":1,"status":200}},{"index":{"_index":"fluent-bit","_type":"flb_type","_id":"498f7d28-1630-9083-d1e2-85a66258540f","status":400,"error":{"type":"mapper_parsing_exception","reason":"object mapping for [log] tried to parse field [log] as object, but found a concrete value"}}}]}

Note: If you happen to have a fresh index, you MAY get the opposite behavior, where Line 2 is indexed first and causes the "log" field type to be "text"; in that case mapped objects will fail with an error like Can't get text on a START_OBJECT at 1:49

I presume that the log field should always be text, and that the mapped object should be merged into the top level data structure, as the documentation describes.
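To make that concrete, here is the record shape I would expect (my sketch, not actual output): the original "log" stays a plain string, and the parsed keys are merged at the top level, so the Elasticsearch mapping stays consistent.

{
  "log": "{\"json_string_key\": \"json_string_value\"}\n",
  "json_string_key": "json_string_value",
  "stream": "stdout",
  "time": "2018-03-09T06:17:20.701547477Z"
}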

edsiper commented 6 years ago

@sestrel @jbrownsc

thanks for troubleshooting the issue. So the main thing to fix is to provide proper logic and functionality that does not mess up the Elasticsearch index, or any other storage that relies on data types.

I am thinking about this change:

note: not sure yet about the right naming.

So for the Kubernetes case, Decode_Field should be enough since the main "log" string will stay the same.
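For reference, a minimal Docker parser using that decoder would look roughly like this (a sketch based on the 0.13 decoder documentation, not a final configuration):

[PARSER]
    Name         docker
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On
    # Decode the escaped JSON carried in "log" into structured fields;
    # the original "log" string itself is kept unchanged.
    Decode_Field    json    log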

edsiper commented 6 years ago

@sestrel @jbrownsc

I've addressed the original issue described above. Thanks for helping to troubleshoot it; the new solution is to avoid mixing unstructured and structured content in the 'log' value. For Kubernetes please refer to the newest 0.13-dev:0.13 image and YAML files:

https://github.com/fluent/fluent-bit-kubernetes-logging/tree/0.13-dev

You will see that the Docker parser now has the following definition:

[PARSER]
    Name         docker
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On
    # Command      |  Decoder | Field   | Optional Action
    # =============|==========|=========|=================
    Decode_Field_As   escaped    log

filter_kubernetes has been updated to deal with the different actions taken by the parser decoders, so we should be OK now.

Please test the 0.13-dev:0.13 image and confirm the new status.
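If you already deployed the DaemonSet from that repository, one quick way to switch images (assuming the DaemonSet and container are both named fluent-bit and run in the logging namespace) is:

kubectl -n logging set image daemonset/fluent-bit fluent-bit=fluent/fluent-bit-0.13-dev:0.13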

edsiper commented 6 years ago

Closing as fixed. If you still face the issue we can re-open this ticket.

sestrel commented 6 years ago

I tested the changed behaviour of the decoders documented here: http://fluentbit.io/documentation/0.13/parser/decoder.html. The decoder type Decode_Field keeps the original message and merges the content of the nested JSON with the original message, as you describe. Escaped special characters like line breaks are restored for the nested content. Once 0.13 is out I'm no longer forced to use the kubernetes filter to support nested JSON content. Are there any plans for a Merge_Json_Key remapping feature like the one supported in the kubernetes filter?
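To illustrate the idea (purely hypothetical syntax; the Merge_Json_Key option shown here does not exist), the decoded content would be nested under a dedicated key instead of landing at the top level of the record:

[PARSER]
    Name          docker
    Format        json
    Time_Key      time
    Time_Format   %Y-%m-%dT%H:%M:%S.%L
    Time_Keep     On
    Decode_Field  json  log
    # Hypothetical: nest the decoded fields under "log_processed" rather
    # than merging them into the top level of the record.
    Merge_Json_Key  log_processed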

I don't see when or why the newly introduced decoder type Decode_Field_As would be used, or what the do_next feature stands for. It's kind of hard to understand based on the provided documentation. Could you provide some more examples, maybe?
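For what it's worth, my current reading of the decoder documentation is that Decode_Field_As replaces the field content with the decoded result, and do_next chains rules, something like this (a sketch based on that documentation, not a confirmed recommendation):

# Inside the [PARSER] docker section:
# Unescape the string stored in "log"; if that succeeds, apply the next rule.
Decode_Field_As   escaped   log   do_next
# Then try to interpret the unescaped content as JSON.
Decode_Field_As   json      log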

gabe-sorensen commented 6 years ago

@edsiper I'm working on testing out the 0.13-dev:0.18 image, and I'm running into a few issues with the parsing, namely the handling of the characters '<', '>', and '&'. I'm not sure if there are other cases, since that's all I've seen in my logs. Those characters get parsed to 'u003c', 'u003e', and 'u0026', respectively, in the output.

Old version (before switching to 0.13-dev:0.18; notice \n is not removed):

2018-04-23 20:33:37,656 [TEST] [ DEBUG] [urllib3.util.retry] [pid:17 ] [ MainThread] Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)\n

New version (with 0.13-dev:0.18; \n is correctly removed, but the > character has been modified):

2018-04-23 20:33:37,656 [TEST] [ DEBUG] [urllib3.util.retry] [pid:17 ] [ MainThread] Converted retries value: 3 -u003e Retry(total=3, connect=None, read=None, redirect=None, status=None)
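My guess at what is happening (not confirmed): the Docker json-file driver uses Go's JSON encoder, which escapes '<', '>' and '&' as \u003c, \u003e and \u0026, so the raw line on disk probably looks something like the abbreviated record below; the escaped decoder then seems to strip the backslash but leave the rest of the sequence in place.

{"log":"... Converted retries value: 3 -\u003e Retry(...)\n","stream":"stderr","time":"2018-04-23T20:33:37.656915306Z"}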

Old "correct" record:

{
  "_index": "logstash-2018.04.23",
  "_type": "fluentd",
  "_id": "AWL0NlCOJkRsMcLZl-SA",
  "_score": null,
  "_source": {
    "log": "2018-04-23 20:33:37,656 [TEST] [   DEBUG] [urllib3.util.retry] [pid:17   ] [    MainThread] Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)\n",
    "stream": "stderr",
    "docker": {
      "container_id": "92ca99fcc91bd460739e930a4e6f87139fa2c68042f2760f411aece98a778c2b"
    },
    "kubernetes": {
      "container_name": "test-container",
      "namespace_name": "qa1-ns",
      "pod_name": "test-webserver-847755c865-rz8x5",
      "pod_id": "fef4a613-4732-11e8-8cbb-06712f6f161e",
      "labels": {
        "app": "test-webserver",
        "pod-template-hash": "4033117421",
        "release": "qa1-test"
      },
      "host": "ip-10-100-22-172.us-west-2.compute.internal",
      "master_url": "https://100.64.0.1:443/api",
      "namespace_id": "cb57e0e3-38e1-11e8-9d82-06e3398080f4"
    },
    "@timestamp": "2018-04-23T20:33:37+00:00",
    "tag": "kubernetes.var.log.containers.test-webserver-847755c865-rz8x5_qa1-ns_test-container-92ca99fcc91bd460739e930a4e6f87139fa2c68042f2760f411aece98a778c2b.log"
  },
  "fields": {
    "@timestamp": [
      1524515617000
    ]
  },
  "highlight": {
    "log": [
      "2018-04-23 20:33:37,656 [TEST] [   DEBUG] [urllib3.util.retry] [pid:17   ] [    MainThread] @kibana-highlighted-field@Converted@/kibana-highlighted-field@ @kibana-highlighted-field@retries@/kibana-highlighted-field@ @kibana-highlighted-field@value@/kibana-highlighted-field@: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)\n"
    ],
    "kubernetes.labels.release": [
      "@kibana-highlighted-field@qa1@/kibana-highlighted-field@-@kibana-highlighted-field@test@/kibana-highlighted-field@"
    ]
  },
  "sort": [
    1524515617000
  ]
}

New record:

{
  "_index": "logstash-2018.04.23",
  "_type": "flb_type",
  "_id": "AWL0P8cFIevz06KnfN3W",
  "_score": null,
  "_source": {
    "@timestamp": "2018-04-23T20:33:37.656Z",
    "log": "2018-04-23 20:33:37,656 [TEST] [   DEBUG] [urllib3.util.retry] [pid:17   ] [    MainThread] Converted retries value: 3 -u003e Retry(total=3, connect=None, read=None, redirect=None, status=None)",
    "stream": "stderr",
    "time": "2018-04-23T20:33:37.656915306Z",
    "kubernetes": {
      "pod_name": "test-webserver-847755c865-rz8x5",
      "namespace_name": "qa1-ns",
      "pod_id": "fef4a613-4732-11e8-8cbb-06712f6f161e",
      "labels": {
        "app": "test-webserver",
        "pod-template-hash": "4033117421",
        "release": "qa1-test"
      },
      "annotations": {
        "iam_amazonaws_com/role": "test",
        "kubernetes_io/created-by": "{\\\"kind\\\":\\\"SerializedReference\\\",\\\"apiVersion\\\":\\\"v1\\\",\\\"reference\\\":{\\\"kind\\\":\\\"ReplicaSet\\\",\\\"namespace\\\":\\\"qa1-ns\\\",\\\"name\\\":\\\"test-webserver-847755c865\\\",\\\"uid\\\":\\\"64d461b0-449a-11e8-9d82-06e3398080f4\\\",\\\"apiVersion\\\":\\\"extensions\\\",\\\"resourceVersion\\\":\\\"9114439\\\"}}\\n",
        "kubernetes_io/limit-ranger": "LimitRanger plugin set: cpu, memory limit for container test-container"
      },
      "host": "ip-10-100-22-172.us-west-2.compute.internal",
      "container_name": "test-container",
      "docker_id": "92ca99fcc91bd460739e930a4e6f87139fa2c68042f2760f411aece98a778c2b"
    }
  },
  "fields": {
    "@timestamp": [
      1524515617656
    ],
    "time": [
      1524515617656
    ]
  },
  "sort": [
    1524515617656
  ]
}

Configmap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
  labels:
    k8s-app: fluent-bit
data:
  # Configuration files: server, input, filters and output
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-elasticsearch.conf

  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10

  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Merge_Log           On
        K8S-Logging.Parser  On

  output-elasticsearch.conf: |
    [OUTPUT]
        Name            es
        Match           *
        Host            ${FLUENT_ELASTICSEARCH_HOST}
        Port            ${FLUENT_ELASTICSEARCH_PORT}
        Logstash_Format On
        Logstash_Prefix logstash
        TLS             On
        Retry_Limit     False

  parsers.conf: |
    [PARSER]
        Name   apache
        Format regex
        Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   apache2
        Format regex
        Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   apache_error
        Format regex
        Regex  ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$

    [PARSER]
        Name   nginx
        Format regex
        Regex ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   json
        Format json
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On
        # Command      |  Decoder | Field | Optional Action
        # =============|==================|=================
        Decode_Field_As   escaped    log

    [PARSER]
        Name        syslog
        Format      regex
        Regex       ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key    time
        Time_Format %b %d %H:%M:%S

daemonset:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
  labels:
    k8s-app: fluent-bit-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  template:
    metadata:
      labels:
        k8s-app: fluent-bit-logging
        version: v1
        kubernetes.io/cluster-service: "true"
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "2020"
        prometheus.io/path: /api/v1/metrics/prometheus
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit-0.13-dev:0.18
        imagePullPolicy: Always
        ports:
          - containerPort: 2020
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch_url.example.com"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "443"
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
      terminationGracePeriodSeconds: 10
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
      serviceAccountName: fluent-bit
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule

Edit: created https://github.com/fluent/fluent-bit/issues/580 to track this, since it appears to be a separate issue.