fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

Emoji got UNICODE-escaped for file output when running fluent-bit with Docker #8521

Open zhaow-de opened 8 months ago

zhaow-de commented 8 months ago

Bug Report

Describe the bug

Our log messages contain emojis. Fluent Bit carries them through the pipeline correctly, but in the final step, when the output is s3 or file, the emojis are Unicode-escaped in the produced file.

To Reproduce

Create a simple test input file input.log, which contains only one line:

🌈

Run fluent-bit with Docker using file output:

docker run -it --rm -v $(pwd):/data fluent/fluent-bit:2.2.2 /fluent-bit/bin/fluent-bit -i tail -p read_from_head=true -p exit_on_eof=true -p path=/data/input.log -o file -p path=/data -p file=output.log --verbose

In the output.log file, the emoji is UNICODE-escaped:

tail.0: [1708941282.353294128, {"log":"\u1f308"}]
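For reference, the escaped form above is not just cosmetic. A JSON `\u` escape takes exactly four hex digits, so a strict parser would read `\u1f308` as U+1F30 followed by a literal `8`, not as U+1F308. If the code point were to be escaped at all, it would need a UTF-16 surrogate pair. The sketch below computes that pair; `json_escape_codepoint` is a hypothetical helper written for illustration and is not taken from Fluent Bit.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not Fluent Bit code): write the JSON \u escape for a
 * Unicode code point into buf. Code points above U+FFFF are emitted as a
 * UTF-16 surrogate pair, because \u accepts exactly four hex digits.
 * Returns the number of characters written, or -1 for an invalid code point.
 */
int json_escape_codepoint(uint32_t cp, char *buf, size_t len)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return -1; /* out of range, or a lone surrogate */
    }
    if (cp < 0x10000) {
        return snprintf(buf, len, "\\u%04x", (unsigned) cp);
    }
    cp -= 0x10000;
    unsigned hi = 0xD800 + (unsigned) (cp >> 10);   /* high surrogate */
    unsigned lo = 0xDC00 + (unsigned) (cp & 0x3FF); /* low surrogate */
    return snprintf(buf, len, "\\u%04x\\u%04x", hi, lo);
}
```

Calling it with 0x1F308 and a 16-byte buffer yields the twelve-character string `\ud83c\udf08`, which is how a rainbow emoji would look if it were escaped correctly.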

Expected behavior

The output should be:

tail.0: [1708941282.353294128, {"log":"🌈"}]

Your Environment

Additional context

Still taking the input.log example, I succeeded with some other scenarios:

Case 1: run fluent-bit with Docker using stdout as output. Works as expected.

Command:

docker run -it --rm -v $(pwd):/data fluent/fluent-bit:2.2.2 /fluent-bit/bin/fluent-bit -i tail -p read_from_head=true -p exit_on_eof=true -p path=/data/input.log -o stdout

Output (from the log):

[0] tail.0: [[1708941170.693859307, {}], {"log"=>"🌈"}]

Case 2: run fluent-bit without Docker using file as output. Works as expected.

Command:

fluent-bit -i tail -p read_from_head=true -p exit_on_eof=true -p path=input.log -o file -p file=output.log --verbose

Output (from output.log file):

tail.0: [1708941936.998975000, {"log":"🌈"}]

I also tried building the image with debian:bullseye-slim as the base image and installing locales:

    apt-get -qq install --no-install-recommends locales && \
    echo "en_US.UTF-8 UTF-8" > /etc/locale.gen && \
    dpkg-reconfigure --frontend=noninteractive locales && \
    update-locale LANG=en_US.UTF-8

and set the environment variables accordingly in the Dockerfile:

ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
ENV LC_ALL en_US.UTF-8

It did not change the result.

ensean commented 5 months ago

Confirmed that the issue exists in the following environment:

OS: Amazon Linux 2023 arm
fluent bit: [fluent bit] version=3.0.3

BTW, if I change the OS architecture to x86, the issue disappears.

RamaMalladiAWS commented 5 months ago

Building with -fsigned-char on ARM resolves this issue.
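Some background on why this flag can matter: in C, whether plain `char` is signed is implementation-defined. It is signed by default on x86 Linux and unsigned by default on ARM Linux, and `-fsigned-char` (a GCC/Clang compiler option) forces the signed behavior. Every byte of the UTF-8 encoding of 🌈 (F0 9F 8C 88) is 0x80 or above, so code that assumes signed `char` can take a different branch on ARM. The sketch below only illustrates that class of bug. Both functions are hypothetical, and the actual Fluent Bit code path is not identified in this thread.

```c
#include <limits.h>
#include <stddef.h>

/* Returns 1 if plain char is signed in this build, 0 otherwise. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/*
 * Fragile (hypothetical): detects bytes >= 0x80 by testing for a negative
 * value. This only works where plain char is signed; where it is unsigned
 * the comparison is never true.
 */
size_t count_high_bytes_fragile(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (*s < 0) {
            n++;
        }
    }
    return n;
}

/* Portable: cast to unsigned char and test the value explicitly. */
size_t count_high_bytes_portable(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if ((unsigned char) *s >= 0x80) {
            n++;
        }
    }
    return n;
}
```

For the string "\xF0\x9F\x8C\x88", the portable version reports 4 high bytes on every platform, while the fragile version reports 4 only when plain `char` is signed and 0 otherwise. That would be consistent with the x86 vs. ARM difference reported above, and with the locale settings having no effect, since signedness is fixed at compile time.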

PettitWesley commented 5 months ago

@RamaMalladiAWS sorry, that's a CMake flag, right? I think we need to add this to the AWS for Fluent Bit distro. Would you like to submit a GitHub commit with the diff of the change?

RamaMalladiAWS commented 5 months ago

> @RamaMalladiAWS sorry, that's a CMake flag, right? I think we need to add this to the AWS for Fluent Bit distro. Would you like to submit a GitHub commit with the diff of the change?

Yes, I can do that.

RamaMalladiAWS commented 5 months ago

I submitted PR: https://github.com/fluent/fluent-bit/pull/8851.

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

RamaMalladiAWS commented 2 months ago

We are waiting for PR https://github.com/fluent/fluent-bit/pull/8851 to be merged.