fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

Separation of trusted/discovered/contextual keys #8347

Status: Open. gebn opened this issue 8 months ago.

gebn commented 8 months ago

Is your feature request related to a problem? Please describe.

There does not appear to be a way to separate contextual keys (e.g. those added by the systemd plugin from journal metadata) from keys extracted from an application message. This allows key collisions, meaning otherwise trustworthy keys (e.g. _HOSTNAME) can be overwritten by the application, deliberately or inadvertently.

As a specific example, the systemd plugin produces records of the form:

{
  "date": "2023-11-11T07:00:01.143640Z",
  "PRIORITY": "6",
  "_SYSTEMD_SLICE": "system.slice",
  "_BOOT_ID": "4d4d3a92077f43e998dc8133cf631ca8",
  "_MACHINE_ID": "ec2c523849ba88c012d52686524f1e2f",
  "_HOSTNAME": "i-06a8cc6ccaefca7ba",
  "_RUNTIME_SCOPE": "system",
  "_SELINUX_CONTEXT": "unconfined\n",
  "_TRANSPORT": "stdout",
  "SYSLOG_FACILITY": "3",
  "_CAP_EFFECTIVE": "0",
  "_STREAM_ID": "1693ea2a2ca44e6bb49980dc5662a2c8",
  "SYSLOG_IDENTIFIER": "prometheus",
  "_PID": "184376",
  "_UID": "991",
  "_GID": "990",
  "_COMM": "prometheus",
  "_EXE": "/opt/prometheus/prometheus",
  "_CMDLINE": "/opt/prometheus/prometheus --log.format json --config.file /etc/opt/prometheus/prometheus.yaml --storage.tsdb.path /var/opt/prometheus/data --storage.tsdb.retention.size 4GiB --web.page-title Prometheus --web.external-url https://prometheus.euw2-az2.thebrightons.uk --storage.tsdb.min-block-duration 2h --storage.tsdb.max-block-duration 2h",
  "_SYSTEMD_CGROUP": "/system.slice/prometheus.service",
  "_SYSTEMD_UNIT": "prometheus.service",
  "_SYSTEMD_INVOCATION_ID": "d18b559ec9a946689189edcf19844dd5",
  "MESSAGE": "{\"ts\":\"2023-11-11T07:00:01.143Z\",\"caller\":\"db.go:1617\",\"level\":\"info\",\"component\":\"tsdb\",\"msg\":\"Deleting obsolete block\",\"block\":\"01H9HDMZDKQ9RN3EQ6P6TFM8F6\"}"
}

The first step in working with this record is parsing MESSAGE; however, doing so with the parser filter plugin pollutes the record with a mix of systemd (trusted) and application (untrusted) fields. Further, if MESSAGE contained a _HOSTNAME key, it would overwrite the real hostname, which is a potential security issue if that key is then used for log routing.
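To make the collision concrete, here is a minimal Python sketch. It assumes that parsing MESSAGE and merging the result at the root behaves like a plain dictionary update; this models the overwrite described above and is not how the parser filter actually serializes records. The record is trimmed to a few fields, and the application's `_HOSTNAME` value is hypothetical.

```python
import json


def parse_into_root(record: dict, key_name: str) -> dict:
    """Decode record[key_name] as JSON and merge its keys into the top level.

    Modelled as a dict update: any key present in the decoded message
    replaces the existing top-level value with the same name.
    """
    merged = dict(record)
    merged.update(json.loads(record[key_name]))
    return merged


# Trimmed-down systemd record; _HOSTNAME comes from the journal (trusted).
record = {
    "_HOSTNAME": "i-06a8cc6ccaefca7ba",
    "_SYSTEMD_UNIT": "prometheus.service",
    # Hypothetical application message that also emits a _HOSTNAME key.
    "MESSAGE": json.dumps({"level": "info", "msg": "hello", "_HOSTNAME": "attacker-host"}),
}

merged = parse_into_root(record, "MESSAGE")
print(merged["_HOSTNAME"])  # prints "attacker-host": the journal value is gone
```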

Describe the solution you'd like

A way to expand an encoded field into key/value pairs underneath a specific top-level key. Perhaps something like:

[FILTER]
    name     parser
    key_name MESSAGE
    dest_key log
    ...
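The proposed behaviour can be sketched in Python as follows. `dest_key` is the option suggested in this request, not an existing parser filter setting, and `parse_under_key` is a hypothetical helper. Dropping the source field after decoding is an assumption here, chosen to mirror the parser filter's default of not preserving the original `key_name` field.

```python
import json


def parse_under_key(record: dict, key_name: str, dest_key: str) -> dict:
    """Hypothetical dest_key behaviour: decode record[key_name] as JSON and
    place the resulting fields under record[dest_key] instead of the root.

    The source field is dropped once decoded (an assumption). Note that an
    existing top-level key named dest_key would be replaced.
    """
    out = {k: v for k, v in record.items() if k != key_name}
    out[dest_key] = json.loads(record[key_name])
    return out


record = {
    "_HOSTNAME": "i-06a8cc6ccaefca7ba",
    # Hypothetical application message that tries to set _HOSTNAME.
    "MESSAGE": json.dumps({"level": "info", "_HOSTNAME": "attacker-host"}),
}

result = parse_under_key(record, "MESSAGE", "log")
print(result["_HOSTNAME"])         # i-06a8cc6ccaefca7ba (trusted value kept)
print(result["log"]["_HOSTNAME"])  # attacker-host (confined under "log")
```

With the decoded fields confined under one key, an application can no longer shadow journal fields, whatever it puts in its message.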

Fields that the user ultimately wants in the written record can then be added to or removed from the log key, and that key alone can be sent to an output, with everything else ignored, e.g.

[OUTPUT]
    name    cloudwatch_logs
    log_key log
    ...
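The output side of the proposal can be approximated in Python: curate the nested map, then send only the value under `log_key`. This is a sketch of the idea, not the cloudwatch_logs plugin's code. `output_payload` is a hypothetical helper, and the assumption that a map value is serialized as compact JSON (while a string is sent as-is) is made here for illustration.

```python
import json
from typing import Optional


def output_payload(record: dict, log_key: Optional[str]) -> str:
    """Approximate what an output sends: only record[log_key] when log_key
    is set, otherwise the whole record.

    Strings are passed through unchanged; maps are serialized as compact
    JSON (an assumption for this sketch).
    """
    value = record if log_key is None else record[log_key]
    if isinstance(value, str):
        return value
    return json.dumps(value, separators=(",", ":"))


record = {
    "_HOSTNAME": "i-06a8cc6ccaefca7ba",
    "log": {"caller": "db.go:1617", "level": "info", "msg": "Deleting obsolete block"},
}

# Curate the nested map: drop a field and copy in a trusted journal field.
record["log"].pop("caller")
record["log"]["hostname"] = record["_HOSTNAME"]

print(output_payload(record, "log"))
# {"level":"info","msg":"Deleting obsolete block","hostname":"i-06a8cc6ccaefca7ba"}
```

Everything outside `log` (here the raw `_HOSTNAME`) stays available for routing decisions but never reaches the destination unless it is explicitly copied in.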

Describe alternatives you've considered

I've tried using the nest plugin to move MESSAGE inside a parent key, then using decode_field_as with a json parser to expand it in place; however, this either does not modify the record or expands the fields at the root level.

github-actions[bot] commented 5 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

gebn commented 5 months ago

Still looking for a solution to this!

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

gebn commented 2 months ago

Still looking for a solution to this!