fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0
5.86k stars 1.58k forks

Add native support for nested "JSON strings" #278

Closed pfremm closed 5 years ago

pfremm commented 7 years ago

How does Fluent Bit handle JSON within JSON, where the nested JSON is the value of a message field and is not seen as an object? Often the nested JSON is escaped, so the plugin needs some extra work to handle it. For Fluentd we needed a plugin. Does Fluent Bit solve this out of the box?

edsiper commented 7 years ago

There is no generic way at the moment; as you said, a plugin will be required.

The only case where this is handled is in the filter_kubernetes plugin, where Docker JSON logs might contain stringified JSON messages.

edsiper commented 7 years ago

Hmm, this likely needs to be fixed for the Docker use case without Kubernetes. I am thinking of adding some kind of merge_json_key option to let the parsers handle that automatically.

LarsKumbier commented 7 years ago

@edsiper if this does not work, then what does the v0.11.5 Kubernetes filter parameter Merge_JSON_Log On do? I read Docker input with the docker parser; each record contains a log message, and if that log message is JSON, the Kubernetes filter should merge it.

But when I try it, I get this error:

[2017/05/26 14:32:22] [ warn] [filter_kube] could not pack merged json
kube.var.log.containers.talk-json-to-me_default_talk-json-to-me-92bc5ba48086596ea4ca8f698f09701a59af7593f9989161f13b315ed56c1160.log: [1495809142, {"log":"{\"MaiTime\":\"Fri May 26 14:32:22 UTC 2017\",\"artist\":\"Jason Derulo\",\"lyrics\":\"Talk JSON to me\",\"dirty\":205}\r\n", "stream":"stdout", "time":"2017-05-26T14:32:22.545392924Z", "kubernetes":{"pod_name":"talk-json-to-me", "namespace_name":"default", "container_name":"talk-json-to-me", "docker_id":"92bc5ba48086596ea4ca8f698f09701a59af7593f9989161f13b315ed56c1160", "pod_id":"a2abebaf-421f-11e7-8086-0800270e934a"}}]
[2017/05/26 14:32:23] [ warn] [filter_kube] could not pack merged json
kube.var.log.containers.talk-json-to-me_default_talk-json-to-me-92bc5ba48086596ea4ca8f698f09701a59af7593f9989161f13b315ed56c1160.log: [1495809143, {"log":"{\"MaiTime\":\"Fri May 26 14:32:23 UTC 2017\",\"artist\":\"Jason Derulo\",\"lyrics\":\"Talk JSON to me\",\"dirty\":206}\r\n", "stream":"stdout", "time":"2017-05-26T14:32:23.5535398Z", "kubernetes":{"pod_name":"talk-json-to-me", "namespace_name":"default", "container_name":"talk-json-to-me", "docker_id":"92bc5ba48086596ea4ca8f698f09701a59af7593f9989161f13b315ed56c1160", "pod_id":"a2abebaf-421f-11e7-8086-0800270e934a"}}]}]

My Config-file:

[INPUT]
    Name           tail
    Tag            kube.*
    Path           /var/log/containers/*.log
    Parser         docker
    Mem_Buf_Limit  256MB

[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On

[FILTER]
    Name kubernetes
    Match kube.*
    Merge_JSON_Log On

[OUTPUT]
    Name  file
    Match *
    Path /tmp/fluent-bit.log

I've uploaded my complete setup here for convenience: https://github.com/LarsKumbier/fluent-bit-json-merge-test/

edsiper commented 7 years ago

@LarsKumbier thanks for the detailed explanation and test case.

I have been working on this and I found the two root causes for the problem:

1. Fluent Bit: the function that unescapes strings was returning a wrong string length and was not respecting special characters like \r, \n, etc.

2. Test case: the test case provided generates an invalid JSON string:

{"log": "{\"MaiTime\":\"Tue May 30 02:23:34 UTC 2017\",\"artist\":\"Jason Derulo\",\"lyrics\":\"Talk JSON to me\",\"dirty\":0}\r\n","stream":"stdout","time":"2017-05-30T02:23:34.863074196Z"}

If you look carefully, after the json map there's an extra \r\n, so the map becomes invalid. This can be fixed using the echo -n command in the script.

From the Fluent Bit side the following fix has been pushed:

https://github.com/fluent/fluent-bit/commit/316d46d18abdbc0a7f013ceeec161e6a046dd492

I will release Fluent Bit 0.11.7 shortly with this issue fixed.

Thanks again for your help!

gganssauge commented 7 years ago

Are you sure about the \r\n at the end? The json rfc says whitespace is insignificant outside of quoted strings!

edsiper commented 7 years ago

@gganssauge the thing is that the \r\n (which are not empty characters) are inside the string:

...,\"dirty\":0}\r\n"

so after the map there are two unexpected bytes which are part of the main string
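
The two decoding steps under discussion can be reproduced outside Fluent Bit. The following is a minimal sketch using Python's standard `json` module (not Fluent Bit's own parser) on the record quoted earlier in the thread. Decoding the outer Docker record turns the escaped `\r\n` into two real characters at the end of the `log` string. A second decode of that string then sees them as trailing whitespace after the nested map, which this particular parser accepts. Other parsers may be stricter.

```python
import json

# Outer Docker record, copied from the earlier comment (raw string keeps the backslashes).
line = r'{"log": "{\"MaiTime\":\"Tue May 30 02:23:34 UTC 2017\",\"artist\":\"Jason Derulo\",\"lyrics\":\"Talk JSON to me\",\"dirty\":0}\r\n","stream":"stdout","time":"2017-05-30T02:23:34.863074196Z"}'

outer = json.loads(line)               # step 1: decode the Docker record
log = outer["log"]                     # plain string; escapes are now real characters
ends_with_crlf = log.endswith("\r\n")  # CR LF sits after the closing brace

inner = json.loads(log)                # step 2: decode the nested map
print(ends_with_crlf, inner["lyrics"], inner["dirty"])
```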

pfremm commented 7 years ago

My use case is extracting from JournalD. We have docker configured to write to journald and then extract the journal via fluent. This way all logs end up going through the journal and we don't have to worry about disparate collection of logs across a host.

edsiper commented 7 years ago

@LarsKumbier @gganssauge

My bad, a nested JSON will always have an ending \n because Docker engine is including it. Fixed by https://github.com/fluent/fluent-bit/commit/673d39cd39e26f540dd8b564016733c380e3d474 (it will be included in v0.11.8)

@pfremm Journald support will come soon, please upvote here: https://github.com/fluent/fluent-bit/issues/217

gganssauge commented 7 years ago

I'm using Lars' test case with 0.11.13 and still get [filter_kube] could not pack merged json.

my config is

[SERVICE]
    Flush 1
    Daemon Off
    Log_Level    debug
    Log_File     /tmp/fluent-bit.log
    Parsers_File parsers.conf

[INPUT]
    Name           tail
    Tag            kube.*
    Path           /var/log/containers/*.log
    Parser         docker
    Mem_Buf_Limit  256MB

[FILTER]
    Name kubernetes
    Match kube.*
    Merge_JSON_Log On

[OUTPUT]
    Name  file
    Match *
    Path /tmp/fluent-bit.log

[OUTPUT]
    Name forward
    Match *
    Host 127.0.0.1
    Port 24224

edsiper commented 7 years ago

@gganssauge would you please provide a JSON log that is failing so I can reproduce it?

gganssauge commented 7 years ago

error.zip I attached the last 100 lines of the talk-json-to-me container log as well as the last 1000 lines of the fluent-bit.log generated by the above configuration.

gganssauge commented 7 years ago

For testing I used minikube-0.20 on ubuntu linux-16.04 with a 1.5.3 kubernetes deployment.

edsiper commented 7 years ago

@gganssauge thanks for providing the test case.

I've found that the problem happens because the nested JSON in the log lines ends in \r\n instead of \n. I will improve the filter for this situation.

I was able to reproduce locally without minikube with this config:

[SERVICE]
    Flush 1
    Daemon Off
    Log_Level    info
    Parsers_File ../conf/parsers.conf

[INPUT]
    Name           tail
    Tag            kube.*
    Path           ./talk*.log
    Parser         docker
    Mem_Buf_Limit  256MB

[FILTER]
    Name kubernetes
    Match kube.*
    Merge_JSON_Log On
    Dummy_Meta On

[OUTPUT]
    Name stdout
    Match *

jalberto commented 7 years ago

I have a similar problem in k8s 1.7.2 & fluentbit 0.12.4 using json-file and this config (installed with helm):

[SERVICE]
    Flush        1
    Daemon       Off
    Log_Level    info
    Parsers_File parsers.conf

[INPUT]
    Name             tail
    Path             /var/log/containers/*.log
    Parser           docker
    Tag              kube.*
    Refresh_Interval 5
    Mem_Buf_Limit    5MB
    Skip_Long_Lines  On

[FILTER]
    Name   kubernetes
    Match  kube.*
    Merge_JSON_Log On

[OUTPUT]
    Name  es
    Match *
    Host  elasticsearch.ops
    Port  9200
    Logstash_Format On
    Retry_Limit False

I cannot see the Kubernetes metadata, and the final log key in ES looks like this:

{\"type\":\"response\",\"@timestamp\":\"2017-10-02T15:40:11Z\",\"tags\":[],\"pid\":1,\"method\":\"get\",\"statusCode\":200,\"req\":{\"url\":\"/ui/favicons/favicon.ico\",\"method\":\"get\",\"headers\":{\"host\":\"monit-elasticsearch-kibana.ops\",\"connection\":\"keep-alive\",\"user-agent\":\"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36\",\"save-data\":\"on\",\"accept\":\"image/webp,image/apng,image/*,*/*;q=0.8\",\"dnt\":\"1\",\"referer\":\"http://monit-elasticsearch-kibana.ops/app/kibana\",\"accept-encoding\":\"gzip, deflate\",\"accept-language\":\"en-GB,en;q=0.8,en-US;q=0.6,es;q=0.4\"},\"remoteAddress\":\"10.244.5.14\",\"userAgent\":\"10.244.5.14\",\"referer\":\"http://monit-elasticsearch-kibana/app/kibana\"},\"res\":{\"statusCode\":200,\"responseTime\":3,\"contentLength\":9},\"message\":\"GET /ui/favicons/favicon.ico 200 3ms - 9.0B\"}\n

edsiper commented 7 years ago

@jalberto

The original 'log' field is never touched or altered. Would you please paste the full output of that record?

jalberto commented 7 years ago

Aside from internal fields (like timestamp) there are no other fields, just "log". Using fluentd instead of fluent-bit works as expected.

edsiper commented 7 years ago

@jalberto are you querying on Kibana or directly on Elasticsearch with curl?

jalberto commented 7 years ago

on kibana, but displaying the whole document

azhi commented 7 years ago

Can confirm that the issue still exists on fluent-bit 0.12.6 inside k8s, details in gist (including config and example of a record inside elasticsearch fetched with curl).

edsiper commented 7 years ago

@azhi

I looked at your gist and Fluent Bit says:

[2017/10/20 01:15:36] [ warn] [filter_kube] could not pack merged json

That error happens when the nested JSON message is not valid JSON. In order to continue troubleshooting, would you please supply the original Docker log file that is generating the problem?

edsiper commented 7 years ago

log provided by @chhetripradeep:

https://gist.githubusercontent.com/chhetripradeep/b89d5d0ee055dde5be15203aa582e705/raw/8c356042f19d65ca5ecda6c5e3772143dca84443/logline

lynxaegon commented 7 years ago

Just to be clear, is there any default support for merge_json in parsers or filters without the kubernetes filter? I am currently running Docker without Kubernetes, with the log driver, and my nginx output is JSON. I thought it would be merged, but it was just escaped and turned into a string.

Example: https://gist.githubusercontent.com/lynxaegon/ad7f503ca7316b5ac0db5a22114e00bd/raw/17567b5fc5ea48da870c8f0b5b4c16c79df28ebc/Proxy-Test%2520Log

edsiper commented 7 years ago

@lynxaegon

I will add an "unescape_key" option to filter_parser; that way the parser will work properly.

edsiper commented 7 years ago

@lynxaegon and all

I have merged a new feature called unescape_key that will be available in version 0.12.8 to deal with nested JSON-string maps. More details are in the following commit:

https://github.com/fluent/fluent-bit/commit/ec8c031f404c15461e0c74dca73b3f4dbe47f2f0

edsiper commented 7 years ago

All,

I've released 0.12.8, which addresses this problem from two angles:

  1. filter_kubernetes: when using the Merge_JSON_Log option, the filter no longer keeps the escaped characters.
  2. filter_parser now has a new option called unescape_key, so it can be used for Docker logs with nested string-JSON.

http://fluentbit.io/announcements/v0.12.8/
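
The general idea of merging a nested JSON string into its parent record can be sketched as follows. This is a conceptual Python illustration only, not Fluent Bit's implementation: `merge_json_log` is a hypothetical helper, and the rule that existing top-level keys win on a name clash is an assumption made for the example.

```python
import json

def merge_json_log(record, key="log"):
    """Return a copy of record with the nested JSON map under `key` merged in.

    Hypothetical helper: it mimics the idea behind Merge_JSON_Log, not its code.
    """
    merged = dict(record)
    value = record.get(key)
    if not isinstance(value, str):
        return merged
    try:
        nested = json.loads(value)   # tolerates the trailing "\n" Docker appends
    except json.JSONDecodeError:
        return merged                # not JSON: leave the record untouched
    if isinstance(nested, dict):
        for k, v in nested.items():
            merged.setdefault(k, v)  # existing top-level keys win (an assumption)
    return merged

record = {
    "log": '{"artist":"Jason Derulo","lyrics":"Talk JSON to me","dirty":205}\n',
    "stream": "stdout",
}
result = merge_json_log(record)
print(result)
```

The original `log` string is left in place, so a record whose `log` value is not valid JSON passes through unchanged.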

stanisavs commented 6 years ago

@edsiper @pfremm Hi all, please help: I can't find any docs on how to use unescape_key in my case. I have an nginx access_log that looks like this:

{"log":{\"id\":\"5a1d3f49d44ee271231631\",\"cur\":[\"USD\"],\"at\":2,\"imp\":[{\"id\":1,\"banner\":{\"w\":0,\"h\":0},\"bidfloor\":0.9}]}}

I want to read it with this fluentd config:

<source>
        @type tail
        format json
        tag reqs.access
        path /var/log/nginx/test.access.log
        pos_file /var/log/nginx/test.access.log.pos
</source>
<filter reqs.access>
        @type parser
        key_name log
        format json
        unescape_key true
</filter>
<match reqs.access>
        @type kinesis_streams
        region us-west-1
        stream_name Kinesis_stream
</match>

How can I do that? I always get something like this:

pattern not match: "{\"log\":{{\\\"id\\\":\\\"5a1d3f49d44ee271231631\\\",\\\"cur\\\":[\\\"USD\\\"],\\\"at\\\":2,\\\"imp\\\":[{\\\"id\\\":1,\\\"banner\\\":{\\\"w\\\":0,\\\"h\\\":0},\\\"bidfloor\\\":0.9}]}}"

I also get "parameter 'unescape_key' is not used". I need to pass that JSON to a Kinesis stream. Please help!

edsiper commented 6 years ago

All, please check the 0.13-dev image, which has several improvements in this area:

https://github.com/fluent/fluent-bit-kubernetes-logging/tree/0.13-dev

rachirib-zz commented 6 years ago

Hi edsiper,

I tried it myself using your repo, but without success:

https://github.com/fluent/fluent-bit-kubernetes-logging/tree/0.13-dev

{\"took\":9654,\"errors\":true,\"items\":[{\"index\":{\"_index\":\"logstash-2018.02.24\",\"_type\":\"flb_type\",\"_id\":\"TuTG2WEBba2pkx_gNGfa\",\"_version\":1,\"result\":\"created\",\"_shards\":{\"total\":2,\"successful\":2,\"failed\":0},\"_seq_no\":351985,\"_primary_term\":1,\"status\":201}},{\"index\":{\"_index\":\"logstash-2018.02.24\",\"_type\":\"flb_type\",\"_id\":\"T-TG2WEBba2pkx_gNGfa\",\"_version\":1,\"result\":\"created\",\"_shards\":{\"total\":2,\"successful\":2,\"failed\":0},\"_seq_no\":218105,\"_primary_term\":1,\"status\":201}},{\"index\":{\"_index\":\"logstash-2018.02.24\",\"_type\":\"flb_type\",\"_id\":\"UOTG2WEBba2pkx_gNGfa\",\"status\":429,\"error\":{\"type\":\"es_rejected_execution_exception\",\"reason\":\"rejected execution of org.elasticsearch.transport.TransportService$7@34adf8ec on EsThreadPoolExecutor[bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@51ffe9f8[Running, pool size = 2, active threads = 2, queued tasks = 200, completed tasks = 1557354]]\"}}},{\"i\n

Do I need to enable unescape_key true and Merge_JSON_Log On?

In theory I don't, as you have created these properties in the filter:

    Merge_Log           On
    K8S-Logging.Parser  On

andrewgdavis commented 6 years ago

Currently using:

    [SERVICE]
        Flush        1
        Daemon       Off
        Log_Level    info
        Parsers_File parsers.conf

    [INPUT]
        Name             systemd
        Tag              host.*
        Path             /var/log/journal
        Systemd_Filter   _SYSTEMD_UNIT=docker.service
        Read_From_Tail   true

    [FILTER]
        Name   kubernetes
        Match  *
        Kube_URL        https://kubernetes.default
        Kube_CA_File    /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
        tls.verify  off
        use_journal     On
        Merge_log        On
        K8S-Logging.Parser On

    [FILTER]
        Name   parser
        Match  *
        Parser  syslog-rfc5424
        Decode_JSON_Field MESSAGE
        Merge_Log  on
        key_name MESSAGE
        unescape_key true

    [OUTPUT]
        Name          kafka
        Match         *
        Brokers       {{ .Values.backend.kafka.brokers }}
        Topics        {{ .Values.backend.kafka.topic }}
        Timestamp_Key @timestamp

and logs are output to kafka:

...
...
  "CONTAINER_ID": "50b213be931f",
  "CONTAINER_ID_FULL": "50b213be931ffccb3048332ae7612d03fd7f0936ab55db0aeff0bd52af7d9e3d",
  "CONTAINER_NAME": "k8s_my_foo_bar-7478fbb76d-f25bt_scoring_cf670539-1d5b-11e8-ad47-080027133a0c_0",
  "MESSAGE": "{\"timeMillis\":1519914091793,\"thread\":\"main\",\"level\":\"DEBUG\",\"loggerName\":\"org.springframework.beans.factory.annotation.InjectionMetadata\",\"message\":\"Processing injected element of bean 'org.springframework.boot.context.properties.ConfigurationPropertiesBindingPostProcessor': AutowiredMethodElement for public void org.springframework.boot.context.properties.ConfigurationPropertiesBindingPostProcessor.setGenericConverters(java.util.List)\",\"endOfBatch\":true,\"loggerFqcn\":\"org.apache.logging.log4j.jcl.Log4jLog\",\"threadId\":13,\"threadPriority\":5}\r",
  "_SOURCE_REALTIME_TIMESTAMP": "1519914091796374",
  "kubernetes": {
    "pod_name": "foo-bar-7478fbb76d-f25bt",
    "namespace_name": "foo",
    "pod_id": "cf670539-1d5b-11e8-ad47-080027133a0c",
    "labels": {
      "app": "foo-bar",
      "pod-template-hash": "3034966328",
      "raptor.sie.sony.com/logdriver": "journald",
      "release": "foo"
    },
    "annotations": {
      "kubernetes.io/created-by": "{\\\"kind\\\":\\\"SerializedReference\\\",\\\"apiVersion\\\":\\\"v1\\\",\\\"reference\\\":{\\\"kind\\\":\\\"ReplicaSet\\\",\\\"namespace\\\":\\\"foo\\\",\\\"name\\\":\\\"foo-bar-7478fbb76d\\\",\\\"uid\\\":\\\"201065c6-1d5b-11e8-8166-080027133a0c\\\",\\\"apiVersion\\\":\\\"extensions\\\",\\\"resourceVersion\\\":\\\"122081\\\"}}\\n"
    },
    "host": "journald-worker01",
    "container_name": "foo-bar",
    "container_hash": ""
  }

Which looks ok to me thus far...

Basically all container logs are going through the docker daemon which uses logdriver=journald, but then types of messages in all the different containers need to be routed correctly.

One question I have: is it possible to parse the MESSAGE field as json (similar to the kubernetes output)? At that point, how does one route based upon specific keys to Topics? (ie log level: debug goes to a kafka-topic called debug). If that is not possible, what methods do folks use to differentiate types of logs?
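
One way to picture the routing being asked about is a small consumer-side step that parses the MESSAGE string and picks a topic from its `level` key. The Python sketch below is illustrative only: it is not a Fluent Bit feature, and `pick_topic`, the topic names, and the fallback topic are all hypothetical.

```python
import json

# Hypothetical mapping from log level to Kafka topic; names are illustrative only.
TOPIC_BY_LEVEL = {"DEBUG": "debug", "INFO": "info", "ERROR": "error"}
DEFAULT_TOPIC = "unparsed"

def pick_topic(record):
    """Choose a topic from the `level` key of the JSON string stored in MESSAGE."""
    try:
        message = json.loads(record.get("MESSAGE", ""))
    except (TypeError, json.JSONDecodeError):
        return DEFAULT_TOPIC          # MESSAGE missing or not JSON
    if not isinstance(message, dict):
        return DEFAULT_TOPIC
    level = message.get("level")
    if not isinstance(level, str):
        return DEFAULT_TOPIC
    return TOPIC_BY_LEVEL.get(level, DEFAULT_TOPIC)

# Shortened version of the MESSAGE value shown above, including the trailing "\r".
record = {"MESSAGE": '{"timeMillis":1519914091793,"thread":"main","level":"DEBUG"}\r'}
topic = pick_topic(record)
print(topic)
```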

S569 commented 6 years ago

@andrewgdavis, I had the same issue, and I added "Merge_JSON_Key":

[FILTER]
    Name            kubernetes
    Match           kube.*
    Kube_URL        https://kubernetes.default.svc:443
    Merge_JSON_Log  On
    Merge_JSON_Key  log

It works fine, but pods are failing in Kubernetes. Can anyone tell me what the issue might be, or do I need to add anything else?

andrewgdavis commented 6 years ago

@Sushma569 k8s pods failing shouldn't be related to Fluent Bit's processing of logs. kubectl describe pod $your-pod should give some insight into why they are failing. I have run into a few failures, and usually the cause is a liveness probe, a configuration error, a permissions issue, or an OOM failure.

S569 commented 6 years ago

@andrewgdavis, I tried everything on the Kubernetes side. I think the issue is "Merge_JSON_Key": other nodes are not able to find the key, which is why the pods fail after some time and return to the running state later. I am trying to resolve this in fluentd.

therealdwright commented 6 years ago

@andrewgdavis did you find the solution to your question? I am having a very similar problem. More specifically the one around parsing the message field.

andrewgdavis commented 6 years ago

@TheRealDwright, I didn't find a solution; however, I was not able to try the suggested "Merge_JSON_Key". I was performing a couple of proofs of concept with different logging technologies, and unfortunately the JSON merge and Kafka routing questions were left open.

edsiper commented 6 years ago

There are many issues/setups reported. @andrewgdavis, which specific problem do you have? Make sure to provide your current setup.

sumeethtewar commented 6 years ago

@edsiper I am already setting Merge_JSON true on the kubernetes filter; however, I am not able to get the MESSAGE field, which is JSON, split into separate fields.

edsiper commented 6 years ago

@sumeethtewar do you want all the records under a new key? Please paste your original log message and how you would like it to look after the filter processes it.

sumeethtewar commented 6 years ago

@edsiper to give more details:

I am using a Kubernetes cluster. Fluent Bit is running as a DaemonSet, version 0.13.0. All working fine.

My Config file for fluentBit looks like the below:

[SERVICE]
    Flush        1
    Log_Level    info
    Daemon       off
    Parsers_File parsers.conf
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_Port    2020

[INPUT]
    Name            systemd
    Tag             docker.container.k8
    Path            /run/log/journal
    Parser          docker
    DB              /var/log/flb_sys_kube.db
    Systemd_Filter  _SYSTEMD_UNIT=docker.service
    Systemd_Filter  _TRANSPORT=journal
    Read_From_Tail  true

[FILTER]
    Name                kubernetes
    Match               docker.container.k8
    Kube_URL            https://kubernetes.default.svc.cluster.local:443
    Merge_Log           On
    Merge_JSON_Key      log
    K8S-Logging.Parser  On
    Use_Journal         On

[FILTER]
    Name        record_modifier
    Match       docker.container.k8
    Whitelist   kubernetes
    Whitelist   MESSAGE
    Whitelist   CONTAINER_ID
    Whitelist   CONTAINER_TAG
    Remove_key  CONTAINER_ID_FULL
    Remove_key  _CAP_EFFECTIVE
    Remove_key  _CMDLINE
    Remove_key  _COMM
    Remove_key  _EXE
    Remove_key  _GID
    Remove_key  _UID
    Remove_key  _PID
    Remove_key  _MACHINE_ID
    Remove_key  _SELINUX_CONTEXT
    Remove_key  _SYSTEMD_CGROUP
    Remove_key  _TRANSPORT
    Remove_key  _BOOT_ID

[OUTPUT]
    Name             es
    Match            docker.container.k8
    Host             ${FLUENT_ELASTICSEARCH_HOST}
    Port             ${FLUENT_ELASTICSEARCH_PORT}
    Buffer_Size      False
    Logstash_Format  On
    Logstash_Prefix  fluent
    Retry_Limit      False
    Time_Key         @timestamp
    Include_Tag_Key  On
    Tag_key          _tag_all

[PARSER]
    Name         docker
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On
    # Command      | Decoder | Field | Optional Action
    # =============|==================|=================
    Decode_Field_As   escaped    log

I am echoing logs from one of the containers like this:

echo "{ \"logLevel\": \"INFO\",\"LogDate\": \"$(date)\", \"ACTENANT\": \"1\", \"_ACUSER\": \"Sumeeth Tewar\", \"_CallRequestID\": \"2018New1Demo123\", \"_Msg\": \"The Actual Msg For Demo\", \"_TRAIL\": 0 }"

But in Elasticsearch I can see the MESSAGE being created as:

{ "_index": "fluent-2018.08.23", "_type": "flb_type", "_id": "nCpxZWUB8RYl9KxAWrcL", "_version": 1, "_score": null, "_source": { "@timestamp": "2018-08-23T06:20:51.0Z", "_tag_all": "docker.container.k8", "_HOSTNAME": "flexNode1", "PRIORITY": "6", "CONTAINER_NAME": "json-log-spewer", "CONTAINER_TAG": "docker.sumeettewar/json-log-spewer:latest", "CONTAINER_ID": "2d5a1989fc3c", "MESSAGE": "{ \"logLevel\": \"INFO\",\"LogDate\": \"Thu Aug 23 06:20:51 UTC 2018\", \"ACTENANT\": \"1\", \"_ACUSER\": \"Sumeeth Tewar\", \"_CallRequestID\": \"2018New1Demo123\", \"_Msg\": \"The Actual Msg For Demo\", \"_TRAIL\": 38 }", "_SOURCE_REALTIME_TIMESTAMP": "1535005251519215" }, "fields": { "@timestamp": [ "2018-08-23T06:20:51.000Z" ] }, "sort": [ 1535005251000 ] }

My expectation is below. I edited it by hand, so it could be erroneous, but it gives an idea of what I am thinking:

{ "_index": "fluent-2018.08.22", "_type": "flb_type", "_id": "4WWYYWUBVY2Xketpn8_P", "_version": 1, "_score": null, "_source": { "@timestamp": "2018-08-22T12:25:16.0Z", "_tag_all": "docker.container.k8", "PRIORITY": "6", "_HOSTNAME": "flexNode1", "CONTAINER_ID": "c21ead338974", "CONTAINER_NAME": "json-log-spewer", "CONTAINER_TAG": "docker.sumeettewar/json-log-spewer:latest", "MESSAGE": "{ \"logLevel\": \"INFO\",\"LogDate\": \"Wed Aug 22 12:25:16 UTC 2018\", \"ACTENANT\": \"1\", \"_ACUSER\": \"Sumeeth Tewar\", \"_CallRequestID\": \"2018New1Demo123\", \"_Msg\": \"The Actual Msg For Demo\", \"_TRAIL\": 2489 }", "_SOURCE_REALTIME_TIMESTAMP": "1534940716649236" }, "logLevel":"INFO", "LogDate":"Wed Aug 22 20:17:50 UTC 2018", "ACTENANT":"1", "_ACUSER":"Sumeeth Tewar", "_CallRequestID":"2018New1Demo123", "_Msg":"The Actual Msg For Demo", "_TRAIL":30705, "fields": { "@timestamp": [ "2018-08-22T12:25:16.000Z" ] }, "sort": [ 1534940716000 ] }

Do let me know what changes I need to make to achieve the above expectation. I need to evaluate whether I can achieve this in Fluent Bit or else fall back on Fluentd.

sumeethtewar commented 6 years ago

@edsiper Is there any update on this?

mitchellmaler commented 6 years ago

@edsiper I just ran into this issue. I have JSON logs going to journald, where they end up inside the MESSAGE field. The issue is that the JSON is sent along to the output as a single string, whereas I would like that field parsed as JSON and split out into multiple fields, either under the root or under a specified key (like message.field).

jcardoso-bv commented 6 years ago

Experiencing a similar issue but with a log message containing a JSON object nested in a JSON string. For example:

{"samp.timestamp":"2018-10-17T17:08:31.145Z","samp.correlationId":"123-456-789","samp.service":"edge","samp.level":"debug","samp.message":"{\"method\":\"GET\",\"hostname\":\"100.111.30.113\",\"path\":\"/health-check\",\"query\":{}}"}

Fluentd handles this just fine and expands 'samp.message' into a separate key like so:

"_type": "fluentd", "_id": "bzACg2YBM-TQgHsV1nbW", "_version": 1, "_score": null, "_source": { "samp.timestamp": "2018-10-17T17:08:31.145Z", "samp.correlationId": "123-456-789", "samp.service": "edge", "samp.level": "debug", "samp.message": "{\"method\":\"GET\",\"hostname\":\"100.111.30.113\",\"path\":\"/health-check\",\"query\":{}}", "log": "{\"samp.timestamp\":\"2018-10-17T17:08:31.145Z\",\"samp.correlationId\":\"123-456-789\",\"samp.service\":\"edge\",\"samp.level\":\"debug\",\"samp.message\":\"{\\"method\\":\\"GET\\",\\"hostname\\":\\"100.111.30.113\\",\\"path\\":\\"/health-check\\",\\"query\\":{}}\"}\n", "stream": "stdout",

Fluent Bit sadly just ignores the whole log message and fails to merge any JSON.

We'd love to switch to Fluent Bit but this is a show stopper for us unless a workaround/solution can be found.

donbowman commented 6 years ago

Can you use filter_parser on the field in question, with the Reserve_Data / Preserve_Key options? https://docs.fluentbit.io/manual/filter/parser

jcardoso-bv commented 6 years ago

We can indeed. Someone else suggested the same and we're now using the following which seems to work well:

# https://docs.fluentbit.io/manual/filter/parser
[FILTER]
    Key_Name            log
    Match               kubernetes.*
    Name                parser
    Parser              json
    Reserve_Data        True

# https://docs.fluentbit.io/manual/filter/kubernetes
[FILTER]
    K8S-Logging.Exclude On
    K8S-Logging.Parser  On
    Match               kubernetes.*
    Merge_Log           On
    Name                kubernetes
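
The effect of the parser filter in the config above can be modelled roughly in Python. This is a sketch under assumptions, not Fluent Bit's code: `parser_filter` is a hypothetical function, its `reserve_data` and `preserve_key` arguments stand in for the Reserve_Data and Preserve_Key options, and letting parsed fields override same-named original fields is an assumption made for the example.

```python
import json

def parser_filter(record, key_name="log", reserve_data=True, preserve_key=False):
    """Rough model of a parser filter with a JSON parser (not Fluent Bit's code)."""
    try:
        parsed = json.loads(record[key_name])
    except (KeyError, TypeError, json.JSONDecodeError):
        return dict(record)                 # parser did not match: pass through
    if not isinstance(parsed, dict):
        return dict(record)
    out = {}
    if reserve_data:                        # keep the other original fields
        out.update({k: v for k, v in record.items() if k != key_name})
    if preserve_key:                        # keep the raw string as well
        out[key_name] = record[key_name]
    out.update(parsed)                      # parsed fields win (an assumption)
    return out

# Application log from the earlier comment, wrapped the way a Docker JSON log line is.
app_log = {
    "samp.timestamp": "2018-10-17T17:08:31.145Z",
    "samp.correlationId": "123-456-789",
    "samp.service": "edge",
    "samp.level": "debug",
    "samp.message": '{"method":"GET","hostname":"100.111.30.113","path":"/health-check","query":{}}',
}
record = {"log": json.dumps(app_log) + "\n", "stream": "stdout"}

result = parser_filter(record)
print(sorted(result))
```

In this model one pass lifts the `samp.*` keys to the top level while `samp.message` stays a JSON string; a second pass with `key_name="samp.message"` would expand it as well.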

edsiper commented 5 years ago

Please check the following comment on #1278 :

https://github.com/fluent/fluent-bit/issues/1278#issuecomment-499583503

edsiper commented 5 years ago

Issue already fixed, ref: https://github.com/fluent/fluent-bit/issues/1278#issuecomment-502183521