elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats
Other
12.14k stars 4.91k forks source link

[Journald] Better support binary blob entries #40479

Open belimawr opened 1 month ago

belimawr commented 1 month ago

The Journald Native Protocol supports binary data for any key/value pair.

Currently we do not support parsing binary data and just read it as an array of numbers, this is how it looks in Kibana: 2024-08-09_12-18

The Docker daemon sometimes logs messages in binary format, one way to easily reproduce it is to run a Debian container using the journald driver and then run any commands, the shell output will go to the docker logs/journald, but in binary format.

docker run --log-driver=journald --rm -it debian

This is how a journal entry with the message in binary format looks like

Event with binary data

```json { "@timestamp": "2024-08-09T15:40:55.977Z", "message": "[27 91 63 50 48 48 52 108 13 101 120 105 116 13]", "host": { "name": "millennium-falcon", "hostname": "millennium-falcon", "id": "851f339d77174301b29e417ecb2ec6a8" }, "journald": { "uid": 0, "pid": 2020294, "host": { "boot_id": "150c5a297ef24c9d93d364bf2ffef750" }, "custom": { "seqnum_id": "e82795fad4ce42b79fb3da0866d91f7e", "runtime_scope": "system", "container_log_ordinal": "6", "seqnum": "5073853", "realtime_timestamp": "1723218055977296", "container_log_epoch": "b1926fa79f85a18da2af033d06a45ff25800e1d3c3936d0cd554e7413c9ad492", "syslog_timestamp": "2024-08-09T15:40:55.976822337Z" }, "gid": 0, "process": { "capabilities": "1ffffffffff", "command_line": "/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock", "name": "dockerd", "executable": "/usr/bin/dockerd" } }, "input": { "type": "journald" }, "syslog": { "priority": 6, "identifier": "e7b8f73aa262" }, "systemd": { "unit": "docker.service", "transport": "journal", "slice": "system.slice", "cgroup": "/system.slice/docker.service", "invocation_id": "0557470aa3244434ad22669df50e244f" }, "log": { "syslog": { "priority": 6 } }, "process": { "command_line": "/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock", "args": [ "/usr/bin/dockerd", "-H", "fd://", "--containerd=/run/containerd/containerd.sock" ], "args_count": 4, "thread": { "capabilities": { "effective": [ "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER", "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST", "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER", "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE", "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE", "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD", "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP", "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM", "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF", "CAP_CHECKPOINT_RESTORE" ] } }, "pid": 2020294 }, "user": { "id": "0", "group": { "id": "0" } }, "event": { "created": "2024-08-09T15:44:45.776Z", "kind": "event" }, "agent": { "ephemeral_id": "2e4b5b36-4f77-4b78-8907-096488a7107e", "id": "51f656d0-ab13-4f96-a4f3-57a11ddb630d", "name": "millennium-falcon", "type": "filebeat", "version": "8.16.0" }, "ecs": { "version": "8.0.0" }, "container": { "id_truncated": "e7b8f73aa262", "id": "e7b8f73aa262a9dfbc1006d6c615fdb76db9e27462fa629efaf3fdbfb0fc469c", "image": { "name": "debian" }, "name": "great_poitras", "log": { "tag": "e7b8f73aa262" } } } ```

The raw JSON returned by journalctl for the event above

Journald raw JSON

```json { "CONTAINER_ID": "e7b8f73aa262", "CONTAINER_ID_FULL": "e7b8f73aa262a9dfbc1006d6c615fdb76db9e27462fa629efaf3fdbfb0fc469c", "CONTAINER_LOG_EPOCH": "b1926fa79f85a18da2af033d06a45ff25800e1d3c3936d0cd554e7413c9ad492", "CONTAINER_LOG_ORDINAL": "6", "CONTAINER_NAME": "great_poitras", "CONTAINER_TAG": "e7b8f73aa262", "IMAGE_NAME": "debian", "MESSAGE": [ 27, 91, 63, 50, 48, 48, 52, 108, 13, 101, 120, 105, 116, 13 ], "PRIORITY": "6", "SYSLOG_IDENTIFIER": "e7b8f73aa262", "SYSLOG_TIMESTAMP": "2024-08-09T15:40:55.976822337Z", "_BOOT_ID": "150c5a297ef24c9d93d364bf2ffef750", "_CAP_EFFECTIVE": "1ffffffffff", "_CMDLINE": "/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock", "_COMM": "dockerd", "_EXE": "/usr/bin/dockerd", "_GID": "0", "_HOSTNAME": "millennium-falcon", "_MACHINE_ID": "851f339d77174301b29e417ecb2ec6a8", "_PID": "2020294", "_RUNTIME_SCOPE": "system", "_SOURCE_REALTIME_TIMESTAMP": "1723218055976827", "_SYSTEMD_CGROUP": "/system.slice/docker.service", "_SYSTEMD_INVOCATION_ID": "0557470aa3244434ad22669df50e244f", "_SYSTEMD_SLICE": "system.slice", "_SYSTEMD_UNIT": "docker.service", "_TRANSPORT": "journal", "_UID": "0", "__CURSOR": "s=e82795fad4ce42b79fb3da0866d91f7e;i=4d6bbd;b=150c5a297ef24c9d93d364bf2ffef750;m=1b05a81af8;t=61f41f818d950;x=136554b362a01089", "__MONOTONIC_TIMESTAMP": "116059020024", "__REALTIME_TIMESTAMP": "1723218055977296", "__SEQNUM": "5073853", "__SEQNUM_ID": "e82795fad4ce42b79fb3da0866d91f7e" } ```

This seems to be the behaviour when there are some control characters in the messages, here is the above message converted to string:

[?2004l
exit

We have a test file that contains on binary blob (core dump) and a message that contains control characters. The file is https://github.com/elastic/beats/blob/63fc18c71c2edc8fa56d66fe34d8c7c7a54dfdb0/filebeat/input/journald/pkg/journalctl/testdata/corner-cases.json

elasticmachine commented 1 month ago

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)