fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

Fluent-bit built-in loki output plugin: new-line and tab characters #2808

Open romanklos87 opened 3 years ago

romanklos87 commented 3 years ago

Bug Report

Describe the bug: I am sending Java logs in JSON format from Fluent-bit to Loki, and new-line (\n) and tab (\t) characters within a record are shown as plain text in Grafana; for example, \n appears instead of an actual line break. It looks like there is an issue with the plugin.
The same setup was tested on the same instance and application with FluentD: logs were forwarded from Fluent-bit to FluentD and then to Loki, and Grafana shows the FluentD JSON logs correctly.

To Reproduce

Send a JSON exception log message.

Expected behavior: exception messages display correctly in Grafana (actual new-lines and tabs instead of \n and \t).

Screenshots: Grafana output of an exception from Fluent-bit [image: issue]

Grafana output of an exception from FluentD (Fluent-bit as collector) [image: correct_output]

Your Environment

[INPUT]
    name            systemd
    tag             java_app.*
    systemd_Filter  _SYSTEMD_UNIT=java_app.service

[FILTER]
    Name          parser
    Match         java_app.*
    Key_Name      MESSAGE
    Reserve_Data  On
    Parser        json

[OUTPUT]
    name    loki
    match   *
    host
    port    3100
    labels  job=fluentbit


FluentD config (second scenario):

<match td..>
  @type tdlog
  @id output_td
  apikey YOUR_API_KEY

  auto_create_table
  <buffer>
    @type file
    path /var/log/td-agent/buffer/td
  </buffer>

  <secondary>
    @type file
    path /var/log/td-agent/failed_records
  </secondary>
</match>

<match debug.**>
  @type stdout
  @id output_stdout
</match>

<source>
  @type forward
  @id input_forward
  port 24224
  @label @java_app
</source>

<label @java_app>
  <filter **>
    @type parser
    key_name MESSAGE
    reserve_data true
    remove_key_name_field true
    <parse>
      @type json
    </parse>
  </filter>
  <match **>
    @type copy
    <store>
      @type loki
      url "http://<loki_ip_address>:3100"
      extra_labels {"agent":"fluentd"}
      line_format json
      flush_interval 10s
      flush_at_shutdown true
      buffer_chunk_limit 1m
    </store>
    <store>
      @type stdout
    </store>
  </match>
</label>

* Environment name and version (e.g. Kubernetes? What version?):

OpenJDK Runtime Environment Corretto-11.0.9.12.1 (build 11.0.9.1+12-LTS) OpenJDK 64-Bit Server VM Corretto-11.0.9.12.1 (build 11.0.9.1+12-LTS, mixed mode)

Grafana v7.3.3 (2489dc4d3a)


* Operating System and version:
`Linux ip-.eu-west-.compute.internal 4.14.203-156.332.amzn2.x86_64 # `

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 3 years ago

This issue was closed because it has been stalled for 5 days with no activity.

hexwit commented 1 year ago

Why is this issue being ignored? It is 2022, and the problem still remains.

braunsonm commented 1 year ago

Tagging maintainers from CODEOWNERS, since no owner exists for the loki plugin: @edsiper @leonardo-albertovich @fujimotos @koleini

Could one of the maintainers please reopen this issue? The problem remains.

braunsonm commented 1 year ago

After looking through the source, I found the problem here, but I'm not familiar enough with the code base to offer a definitive fix. If a maintainer can point me in the right direction, I can open a PR.

The problem is pretty simple: it comes from the safe-string escaping performed when generating the JSON string sent to Loki in the payload here. Removing this \n check fixes the issue in Loki, but since this is a utility function, that's probably not a desirable result across all of fluent-bit.

What would be the best way to ensure \n and \t are excluded from safe string parsing for the loki output?

The call to it in loki comes from here. No combination of decoders seems to change the outcome, as the \n is always escaped when the HTTP body is built.

This is similar to many of the decoding woes described in https://github.com/fluent/fluent-bit/issues/1278, which @edsiper worked through with some success. Again, with some guidance I could probably open a PR to fix this, but changing a utility function doesn't seem like the right approach.

{
  "streams": [
    {
      "stream": {
        "label": "value"
      },
      "values": [
          [ "1665195203094100836", "hello\nworld" ]
      ]
    }
  ]
}

The above results in a properly parsed log line with a new line. From what I can tell, because of the util function, the following is being sent instead:

{
  "streams": [
    {
      "stream": {
        "label": "value"
      },
      "values": [
          [ "1665195203094100836", "hello\\nworld" ]
      ]
    }
  ]
}
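The difference between the two payloads above can be reproduced with a small Go sketch, assuming the Loki push payload shape shown in this comment. The `pushRequest`/`stream` types and `buildPayload` are hypothetical names, and Go's `encoding/json` stands in for the plugin's C serializer. A line containing a real newline is escaped once to `\n`; a line that already contains the two characters `\` and `n` (for example, because the record was serialized to JSON text at an earlier step) is escaped again to `\\n`, which Loki then stores as a literal backslash followed by `n`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stream and pushRequest mirror the push payload shape shown above (assumed).
type stream struct {
	Stream map[string]string `json:"stream"`
	Values [][2]string       `json:"values"`
}

type pushRequest struct {
	Streams []stream `json:"streams"`
}

// buildPayload wraps one log line in a push payload and JSON-encodes it once.
func buildPayload(ts, line string) string {
	req := pushRequest{Streams: []stream{{
		Stream: map[string]string{"label": "value"},
		Values: [][2]string{{ts, line}},
	}}}
	b, err := json.Marshal(req)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	const ts = "1665195203094100836"

	// Real newline in the line: the body carries a single \n escape.
	fmt.Println(buildPayload(ts, "hello\nworld"))

	// Line already holding backslash + n: the body carries \\n.
	fmt.Println(buildPayload(ts, `hello\nworld`))
}
```

Which of these two inputs actually reaches the escaping step inside the plugin is not established here; the sketch only shows how each one ends up in the body.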

leonardo-albertovich commented 1 year ago

Feel free to ping me on the public Slack server, @braunsonm. I'm not sure what the solution might be in this case, but if you have a simple repro case I might be able to give you some feedback and guide you through the process.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

braunsonm commented 1 year ago

This continues to be a problem.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

braunsonm commented 1 year ago

Still a problem.

pingping95 commented 1 year ago

What's going on with this problem? I'm facing this issue.

leonardo-albertovich commented 1 year ago

TBH, I dropped the ball. I think we'll need more input here: at the time I wasn't convinced we could make this change without causing issues, and I wasn't even sure it was something that should be done.

My memory is fuzzy given how much time has passed, but I think if we want to approach this again we may have to take a step back and consider other options, such as a localized patch in the loki output plugin.

pingping95 commented 1 year ago

@leonardo-albertovich

Thanks for replying.

Would this problem be solved if logs were sent through fluentd rather than directly from fluent-bit to loki?

leonardo-albertovich commented 1 year ago

I have absolutely no idea, as I have zero experience with fluentd. Sending logs from fluent-bit to fluentd using the forward protocol is really simple, though, so it might be worth a shot as a temporary workaround.

emmacz commented 1 year ago

Any update on this?

We are facing the same issue. See the \n below.

Log content, displayed using `grep SCN logfile.txt | cat -A`:
<txt>Completed checkpoint up to RBA [0x21a1.2.10], SCN: 481007916$
 </txt>$

Part of the file generated by fluent-bit using the 'file' output plugin:
"log":{"message":"Completed checkpoint up to RBA [0x21a1.2.10], SCN: 481007916\n ","time":"2023-05-19T17:24:32.282"}
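The `\n` in this file output is the JSON escape for the real line break that `cat -A` marks with `$` at the end of the first log line. A minimal Go sketch that decodes the inner object once (the `logRecord` type and `decodeMessage` helper are hypothetical, and only the two visible fields are modeled) shows the message ends with an actual newline followed by a space:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// logRecord models only the fields visible in the excerpt (assumed shape).
type logRecord struct {
	Message string `json:"message"`
	Time    string `json:"time"`
}

// decodeMessage parses one JSON object and returns its message field.
func decodeMessage(raw string) (string, error) {
	var rec logRecord
	if err := json.Unmarshal([]byte(raw), &rec); err != nil {
		return "", err
	}
	return rec.Message, nil
}

func main() {
	// Inner object of the file output quoted above.
	raw := `{"message":"Completed checkpoint up to RBA [0x21a1.2.10], SCN: 481007916\n ","time":"2023-05-19T17:24:32.282"}`
	msg, err := decodeMessage(raw)
	if err != nil {
		panic(err)
	}
	// %q re-escapes for display; the decoded value holds a real newline.
	fmt.Printf("%q\n", msg)
	fmt.Println(msg[len(msg)-2] == '\n')
}
```

So the file output itself escapes correctly; the newline is only lost if that escaped text is escaped a second time downstream.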

braunsonm commented 1 year ago

Agreed, it would be great to fix this. I know @leonardo-albertovich was looking into it at one point.

leonardo-albertovich commented 1 year ago

I did, and I gave up at the time. I think I communicated what I found and why I couldn't wrap it up, but I can't seem to find that now.

I will save this and try to take a look as soon as possible, but I can't make any promises, especially taking into account that I already looked once and, for some reason, couldn't do it.

scotlyt commented 1 year ago

I just gave up on the built-in plugin and use the grafana-loki plugin instead. I compile it from https://github.com/grafana/loki/tree/main/clients/cmd/fluent-bit using make to create an out_grafana_loki.so for the target architecture, put that plugin somewhere like /fluent-bit/etc/out_grafana_loki.so, and tell fluent-bit to load it. Then you can follow the docs at https://grafana.com/docs/loki/latest/clients/fluentbit/, which offer more options for delivering to Loki.

I know this doesn't fix the problem here, but it could be an alternative instead of waiting on someone to fix something no one wants to maintain.

jcdauchy-moodys commented 10 months ago

Does it still compile with FB version 2.1.x? From the beginning, the Loki plugin has given different output than the Grafana Go plugin :(

scotlyt commented 10 months ago

I do not know.

My implementation is based on the build process/image from https://github.com/aws/aws-for-fluent-bit.git, which is currently on 1.9.10.

This was my build process:

I compile the plugin directly from the loki repo:

ssh ECSBOX-AMD64-CHIPSET # build on an amd64 host so whatever ECS system runs it can use that plugin
# the following commands run on that remote host
git clone https://github.com/grafana/loki
cd loki
make fluent-bit-plugin # makefile target that creates the grafana/loki plugin binary for firelens
exit
# back on the local machine: copy the built shared object
scp EC2BOX-AMD64-CHIPSET:/clonepath/loki/clients/cmd/fluent-bit/out_grafana_loki.so .

Then added it to the init process for aws-for-fluent-bit:

git clone https://github.com/aws/aws-for-fluent-bit.git
vim Dockerfile.init
#change code
 FROM amazon/aws-for-fluent-bit:latest
+ADD out_grafana_loki.so /fluent-bit/

 RUN mkdir -p /init
#

vim init/fluent_bit_init_process.go
#change code
        // default Fluent Bit command
-       baseCommand = "exec /fluent-bit/bin/fluent-bit -e /fluent-bit/firehose.so -e /fluent-bit/cloudwatch.so -e /fluent-bit/kinesis.so"
+       baseCommand = "exec /fluent-bit/bin/fluent-bit -e /fluent-bit/firehose.so -e /fluent-bit/cloudwatch.so -e /fluent-bit/kinesis.so -e /fluent-bit/out_grafana_loki.so"

        // global s3 client and flag
#
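The change above just appends one more `-e` flag, which Fluent Bit uses to load an external plugin. A hypothetical Go helper (`buildBaseCommand` is not part of aws-for-fluent-bit) that assembles the same command string from a plugin list, which may be easier to extend than editing the literal:

```go
package main

import (
	"fmt"
	"strings"
)

// buildBaseCommand is hypothetical: it builds "exec <binary> -e <p1> -e <p2> ..."
// with one -e flag per external plugin, in the order given.
func buildBaseCommand(binary string, plugins []string) string {
	parts := []string{"exec", binary}
	for _, p := range plugins {
		parts = append(parts, "-e", p)
	}
	return strings.Join(parts, " ")
}

func main() {
	plugins := []string{
		"/fluent-bit/firehose.so",
		"/fluent-bit/cloudwatch.so",
		"/fluent-bit/kinesis.so",
		"/fluent-bit/out_grafana_loki.so", // copied in by the ADD line in Dockerfile.init
	}
	fmt.Println(buildBaseCommand("/fluent-bit/bin/fluent-bit", plugins))
}
```

The printed string matches the edited baseCommand line.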

I built my own image so that it includes the plugin straight from the Loki repo inside the system AWS maintains. This keeps maintenance much lower.