Closed sestrel closed 6 years ago
Just to piggyback on this.
Using a docker image: fluent/fluent-bit-0.13-dev:0.10 in Kubernetes with the following config
[SERVICE]
Flush 1
Daemon Off
Log_Level debug
Parsers_File parsers.conf
[INPUT]
Name tail
Path /tmp/var/log/containers/test-fluent-69d87bd8bc-65z8d_test-namespace_test-fluent-019f9c0c33d24470d943a399a2defb63b9bbed982b232941043963475067c315.log
Tag kube.*
Parser docker
Refresh_Interval 5
Mem_Buf_Limit 5MB
Skip_Long_Lines On
[FILTER]
Name kubernetes
Match kube.*
Merge_JSON_Log On
[OUTPUT]
Name es
Match *
Host [Elastic Search host]
Port 9200
Retry_Limit False
Generate_ID True
And the following log messages
{"log":"{\"json_string_key\": \"json_string_value\"}\n","stream":"stdout","time":"2018-03-09T06:17:20.701547477Z"}
{"log":"STRING\n","stream":"stdout","time":"2018-03-09T06:17:22.104768562Z"}
The log message from Line 1 will be indexed as mapped object with nested properties.
(k8s-uw1-gcp/default) ➜ /tmp curl -sX GET http://localhost:9200/fluent-bit/ | jq -r '."fluent-bit".mappings.flb_type.properties.log'
{
"properties": {
"json_string_key": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
The log message from Line 2 will error out as it is cast as a string, and a string can't be injected into a mapped object.
[2018/03/12 22:29:20] [ warn] [out_es] Elasticsearch error
{"took":43,"errors":true,"items":[{"index":{"_index":"fluent-bit","_type":"flb_type","_id":"2b524b83-f7a4-366d-b0c2-ca9ffed44173","_version":4,"result":"updated","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":3,"_primary_term":1,"status":200}},{"index":{"_index":"fluent-bit","_type":"flb_type","_id":"498f7d28-1630-9083-d1e2-85a66258540f","status":400,"error":{"type":"mapper_parsing_exception","reason":"object mapping for [log] tried to parse field [log] as object, but found a concrete value"}}}]}
Note If you happen to have a fresh index, you MAY get the opposite behavior, where Line 2 will be indexed first and cause the "log" field type to be "text", in which case mapped objects will fail with an error like Can't get text on a START_OBJECT at 1:49
I presume that the log field should always be text, and that the mapped object should be merged into the top level data structure, as the documentation describes.
@sestrel @jbrownsc
thanks for troubleshooting the issue. So the main thing to fix is to provide a proper logic and functionality that do not mess Elasticsearch index, or any storage that relies on data types.
I am thinking about this change:
note: not sure yet about the right naming.
So for Kubernetes case Decode_Field should be enough since the main "log string" will keep the same
@sestrel @jbrownsc
I've addressed the original issue described on top. Thanks for helping to troubleshoot the issue, the new solution is to avoid to mess the 'log' value with both unstructured/structured content. For Kubernetes please refer to the newest 0.13-dev:0.13 image and Yaml files:
https://github.com/fluent/fluent-bit-kubernetes-logging/tree/0.13-dev
You will see now that the Docker parser have the following definition:
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep On
# Command | Decoder | Field | Optional Action
# =============|==========|=========|=================
Decode_Field_As escaped log
filter_kubernetes have been updated to deal with different actions taken by the parser decoders, so we should be OK now.
Please test the 0.13-dev:0.13 image and confirm back the new status.
Closing as fixed. If you still face the issue we can re open this ticket.
I tested the changed behaviour for the decoders documented here:http://fluentbit.io/documentation/0.13/parser/decoder.html The decoder type Decode_Field keeps the original message and merges the content of a nested json and the original message as you describe. Escaped special characters like linebreaks are being restored for nested content. Once 0.13 is out I'm not forced to use the kubernetes filter anymore for supporting nested json content. Are there any plans for a Merge_Json_Key remapping feature like the one supported in the kubernetes filter?
I don't see any reason when or why the newly introduced decoder type Decode_Field_As will been used for or what the do_next feature stands for. It's kinda hard to unterderstand based on the provided documentation. Could you provide some more examples maybe?
@edsiper I'm working on testing out the 0.13-dev:0.18 image, and I'm running into a few issues with the parsing, namely the handling of the characters '<', '>', and '&'. I'm not sure if there's other cases, since that's all I've seen in my logs. Those characters get parsed to 'u003c', 'u003e', and 'u0026', respectively in the output.
Old version (before switching to 0.13-dev:0.18. Notice \n is not removed) 2018-04-23 20:33:37,656 [TEST] [ DEBUG] [urllib3.util.retry] [pid:17 ] [ MainThread] Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)\n
New version (with 0.13-dev:0.18. \n is correctly removed, but the > character has been modified) 2018-04-23 20:33:37,656 [TEST] [ DEBUG] [urllib3.util.retry] [pid:17 ] [ MainThread] Converted retries value: 3 -u003e Retry(total=3, connect=None, read=None, redirect=None, status=None)
Old "correct" record:
{
"_index": "logstash-2018.04.23",
"_type": "fluentd",
"_id": "AWL0NlCOJkRsMcLZl-SA",
"_score": null,
"_source": {
"log": "2018-04-23 20:33:37,656 [TEST] [ DEBUG] [urllib3.util.retry] [pid:17 ] [ MainThread] Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)\n",
"stream": "stderr",
"docker": {
"container_id": "92ca99fcc91bd460739e930a4e6f87139fa2c68042f2760f411aece98a778c2b"
},
"kubernetes": {
"container_name": "test-container",
"namespace_name": "qa1-ns",
"pod_name": "test-webserver-847755c865-rz8x5",
"pod_id": "fef4a613-4732-11e8-8cbb-06712f6f161e",
"labels": {
"app": "test-webserver",
"pod-template-hash": "4033117421",
"release": "qa1-test"
},
"host": "ip-10-100-22-172.us-west-2.compute.internal",
"master_url": "https://100.64.0.1:443/api",
"namespace_id": "cb57e0e3-38e1-11e8-9d82-06e3398080f4"
},
"@timestamp": "2018-04-23T20:33:37+00:00",
"tag": "kubernetes.var.log.containers.test-webserver-847755c865-rz8x5_qa1-ns_test-container-92ca99fcc91bd460739e930a4e6f87139fa2c68042f2760f411aece98a778c2b.log"
},
"fields": {
"@timestamp": [
1524515617000
]
},
"highlight": {
"log": [
"2018-04-23 20:33:37,656 [TEST] [ DEBUG] [urllib3.util.retry] [pid:17 ] [ MainThread] @kibana-highlighted-field@Converted@/kibana-highlighted-field@ @kibana-highlighted-field@retries@/kibana-highlighted-field@ @kibana-highlighted-field@value@/kibana-highlighted-field@: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)\n"
],
"kubernetes.labels.release": [
"@kibana-highlighted-field@qa1@/kibana-highlighted-field@-@kibana-highlighted-field@test@/kibana-highlighted-field@"
]
},
"sort": [
1524515617000
]
}
New record:
{
"_index": "logstash-2018.04.23",
"_type": "flb_type",
"_id": "AWL0P8cFIevz06KnfN3W",
"_score": null,
"_source": {
"@timestamp": "2018-04-23T20:33:37.656Z",
"log": "2018-04-23 20:33:37,656 [TEST] [ DEBUG] [urllib3.util.retry] [pid:17 ] [ MainThread] Converted retries value: 3 -u003e Retry(total=3, connect=None, read=None, redirect=None, status=None)",
"stream": "stderr",
"time": "2018-04-23T20:33:37.656915306Z",
"kubernetes": {
"pod_name": "test-webserver-847755c865-rz8x5",
"namespace_name": "qa1-ns",
"pod_id": "fef4a613-4732-11e8-8cbb-06712f6f161e",
"labels": {
"app": "test-webserver",
"pod-template-hash": "4033117421",
"release": "qa1-test"
},
"annotations": {
"iam_amazonaws_com/role": "test",
"kubernetes_io/created-by": "{\\\"kind\\\":\\\"SerializedReference\\\",\\\"apiVersion\\\":\\\"v1\\\",\\\"reference\\\":{\\\"kind\\\":\\\"ReplicaSet\\\",\\\"namespace\\\":\\\"qa1-ns\\\",\\\"name\\\":\\\"test-webserver-847755c865\\\",\\\"uid\\\":\\\"64d461b0-449a-11e8-9d82-06e3398080f4\\\",\\\"apiVersion\\\":\\\"extensions\\\",\\\"resourceVersion\\\":\\\"9114439\\\"}}\\n",
"kubernetes_io/limit-ranger": "LimitRanger plugin set: cpu, memory limit for container test-container"
},
"host": "ip-10-100-22-172.us-west-2.compute.internal",
"container_name": "test-container",
"docker_id": "92ca99fcc91bd460739e930a4e6f87139fa2c68042f2760f411aece98a778c2b"
}
},
"fields": {
"@timestamp": [
1524515617656
],
"time": [
1524515617656
]
},
"sort": [
1524515617656
]
}
Configmap:
apiVersion: v1
kind: ConfigMap
metadata:
name: fluent-bit-config
namespace: logging
labels:
k8s-app: fluent-bit
data:
# Configuration files: server, input, filters and output
# ======================================================
fluent-bit.conf: |
[SERVICE]
Flush 1
Log_Level info
Daemon off
Parsers_File parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port 2020
@INCLUDE input-kubernetes.conf
@INCLUDE filter-kubernetes.conf
@INCLUDE output-elasticsearch.conf
input-kubernetes.conf: |
[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/*.log
Parser docker
DB /var/log/flb_kube.db
Mem_Buf_Limit 5MB
Skip_Long_Lines On
Refresh_Interval 10
filter-kubernetes.conf: |
[FILTER]
Name kubernetes
Match kube.*
Kube_URL https://kubernetes.default.svc:443
Merge_Log On
K8S-Logging.Parser On
output-elasticsearch.conf: |
[OUTPUT]
Name es
Match *
Host ${FLUENT_ELASTICSEARCH_HOST}
Port ${FLUENT_ELASTICSEARCH_PORT}
Logstash_Format On
Logstash_Prefix logstash
TLS On
Retry_Limit False
parsers.conf: |
[PARSER]
Name apache
Format regex
Regex ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
[PARSER]
Name apache2
Format regex
Regex ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
[PARSER]
Name apache_error
Format regex
Regex ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$
[PARSER]
Name nginx
Format regex
Regex ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
[PARSER]
Name json
Format json
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep On
# Command | Decoder | Field | Optional Action
# =============|==================|=================
Decode_Field_As escaped log
[PARSER]
Name syslog
Format regex
Regex ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
Time_Key time
Time_Format %b %d %H:%M:%S
daemonset:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: fluent-bit
namespace: logging
labels:
k8s-app: fluent-bit-logging
version: v1
kubernetes.io/cluster-service: "true"
spec:
template:
metadata:
labels:
k8s-app: fluent-bit-logging
version: v1
kubernetes.io/cluster-service: "true"
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "2020"
prometheus.io/path: /api/v1/metrics/prometheus
spec:
containers:
- name: fluent-bit
image: fluent/fluent-bit-0.13-dev:0.18
imagePullPolicy: Always
ports:
- containerPort: 2020
env:
- name: FLUENT_ELASTICSEARCH_HOST
value: "elasticsearch_url.example.com"
- name: FLUENT_ELASTICSEARCH_PORT
value: "443"
volumeMounts:
- name: varlog
mountPath: /var/log
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
- name: fluent-bit-config
mountPath: /fluent-bit/etc/
terminationGracePeriodSeconds: 10
volumes:
- name: varlog
hostPath:
path: /var/log
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
- name: fluent-bit-config
configMap:
name: fluent-bit-config
serviceAccountName: fluent-bit
tolerations:
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
Edit: created https://github.com/fluent/fluent-bit/issues/580 to track this issue, since it appears to be a separate issue
I've read the documentation rearding the nested JSON support you are currently implementing for 0.13 in form of parser decoders: http://fluentbit.io/documentation/0.13/parser/decoder.html
You describe a typical use case in form of a containerized environments with Docker where the application logs data in JSON format which becomes an escaped string.
When using the "Decode_Field json log" there is one problem here, everything written to sysout/syserr is being written to disk by the docker json-file log driver which is being used to accomplish the described use case. But, not every line written to sysout/syserr is a well formatted JSON message. This leads to:
if the content is not a valid json-content the "log" attribute is of type "string"
if the content is a well formed json the "log" attribute is of type object
So why is this a problem? We do all that work because we want to collect all those logs and make em searchable through elasticsearch or something else. But the type of a json attribute has to stay the same for being indexable with elasticsearch. It has to stay either a string or an object.
Are there any plans for a remapping support of the decoded json content? The kubernetes filter for example merges all fields into the parent locument. Here we could do something similiar.