fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

Plugin in_systemd: duplicate keys #501

Open manuelluis opened 6 years ago

manuelluis commented 6 years ago

The in_systemd plugin doesn't check for duplicate keys within a journal entry. For the same entry, journalctl gives you: "MESSAGE" : "Hello World!", "FIELD1" : [ "a", "b" ] while fluent-bit gives you: "MESSAGE"=>"Hello World!", "FIELD1"=>"a", "FIELD1"=>"b"

This happens in the real world. For example, if you run journalctl -o json | fgrep ' : [' , you will get entries with: "SYSLOG_FACILITY" : [ "DHCP4", "DHCP6" ]

journalctl handles this case in its JSON output: it uses a hash table to detect duplicate keys and generates an array when a key occurs more than once.
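The merging described above can be sketched as follows. This is only an illustration of the idea, not journalctl's or fluent-bit's actual code: the function name `merge_duplicate_fields` is hypothetical, and it assumes every incoming value is a scalar (a string), as journal field values are.

```python
def merge_duplicate_fields(pairs):
    """Collapse repeated keys into lists, keeping first-seen key order.

    Assumes each incoming value is a scalar (e.g. a string), so a list
    stored in the result always means "this key was seen more than once".
    """
    merged = {}
    for key, value in pairs:
        if key not in merged:
            # First occurrence: store the plain value.
            merged[key] = value
        elif isinstance(merged[key], list):
            # Third or later occurrence: extend the existing array.
            merged[key].append(value)
        else:
            # Second occurrence: promote the scalar to an array.
            merged[key] = [merged[key], value]
    return merged


fields = [("MESSAGE", "Hello World!"), ("FIELD1", "a"), ("FIELD1", "b")]
print(merge_duplicate_fields(fields))
# {'MESSAGE': 'Hello World!', 'FIELD1': ['a', 'b']}
```

Keys that occur once keep their scalar value, which matches the journalctl output quoted above, where only FIELD1 becomes an array.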

I think this is a problem because not all JSON parsers handle duplicate keys, and the fluent-bit JSON output generates these duplicates (not tested, I just read the code).
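As one illustration of the parser concern, here is how Python's standard `json` module treats a document with a repeated key. The input string is a hand-written example modeled on the fields above, not captured fluent-bit output, and the `collect_duplicates` hook is a hypothetical helper.

```python
import json

# Hand-written JSON with a repeated key, modeled on the example fields.
raw = '{"MESSAGE": "Hello World!", "FIELD1": "a", "FIELD1": "b"}'

# By default, json.loads keeps only the last value of a repeated key.
last_wins = json.loads(raw)
print(last_wins)
# {'MESSAGE': 'Hello World!', 'FIELD1': 'b'}


def collect_duplicates(pairs):
    """object_pairs_hook that gathers values of repeated keys into a list."""
    out = {}
    repeated = set()  # keys already promoted to a list by this hook
    for key, value in pairs:
        if key not in out:
            out[key] = value
        elif key in repeated:
            out[key].append(value)
        else:
            out[key] = [out[key], value]
            repeated.add(key)
    return out


# A consumer has to opt in explicitly to see every value.
merged = json.loads(raw, object_pairs_hook=collect_duplicates)
print(merged)
# {'MESSAGE': 'Hello World!', 'FIELD1': ['a', 'b']}
```

With the default settings the first value is dropped without any error, so a consumer would not notice the loss unless it installs a hook like this one.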

qingling128 commented 3 years ago

Looks like this is still relevant.

Reproduction

echo -e 'MESSAGE=test native message with multiple values\nKEY=value1\nKEY=value2\n'  | socat - UNIX-SENDTO:/run/systemd/journal/socket

Received log:

{
  KEY: "value2"
  MESSAGE: "test native message with multiple values"
  ...
}

It seems like only the last value is preserved if there are multiple values of the same key.

Ideal behavior

{
  KEY: ["value1", "value2"]
  MESSAGE: "test native message with multiple values"
  ...
}

cosmo0920 commented 2 months ago

I reproduced this issue with journalctl's -o json-pretty option:

{
        "_PID" : "2283409",
        "_BOOT_ID" : "793b61c6fd2d4d03b749e694db973d43",
        "_UID" : "1000",
        "_EXE" : "/usr/bin/socat",
        "_SOURCE_REALTIME_TIMESTAMP" : "1724319097012804",
        "_SYSTEMD_SLICE" : "user-1000.slice",
        "_GID" : "1000",
        "_HOSTNAME" : "cosmo-desktop2",
        "KEY" : [
                "value1",
                "value4",
                "another"
        ],
        "_SYSTEMD_USER_SLICE" : "app-org.gnome.Terminal.slice",
        "__CURSOR" : "s=e4d1d97fab6c4d858c69139ccbdb19e1;i=78b7d;b=793b61c6fd2d4d03b749e694db973d43;m=d19e81e112;t=620425341c26f;x=f42e5f86836105ba",
        "__REALTIME_TIMESTAMP" : "1724319097012847",
        "_AUDIT_LOGINUID" : "1000",
        "KEY2" : "value2",
        "_SYSTEMD_OWNER_UID" : "1000",
        "_COMM" : "socat",
        "_SYSTEMD_INVOCATION_ID" : "dc32a18d1b6b43f681a7395960080cb2",
        "_SYSTEMD_USER_UNIT" : "vte-spawn-7fa61de5-34f1-402d-a599-df125e505395.scope",
        "_SYSTEMD_UNIT" : "user@1000.service",
        "__MONOTONIC_TIMESTAMP" : "900307476754",
        "_SELINUX_CONTEXT" : "unconfined\n",
        "_CMDLINE" : "socat - UNIX-SENDTO:/run/systemd/journal/socket",
        "_MACHINE_ID" : "8f5838aa79d047acf9ebf69700000005",
        "_CAP_EFFECTIVE" : "0",
        "_TRANSPORT" : "journal",
        "_SYSTEMD_CGROUP" : "/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-7fa61de5-34f1-402d-a599-df125e505395.scope",
        "_AUDIT_SESSION" : "3",
        "MESSAGE" : "test native message with multiple values"
}

with this command:

$ echo -e 'MESSAGE=test native message with multiple values\nKEY=value1\nKEY=value4\nKEY2=value2\nKEY=another\n'  | socat - UNIX-SENDTO:/run/systemd/journal/socket