fluent-plugins-nursery / fluent-plugin-concat

Fluentd filter plugin to concatenate multiline logs that are split across multiple events.
MIT License
108 stars, 33 forks

Concat containerd/docker output in the same config #103

Open sonnyhcl opened 3 years ago

sonnyhcl commented 3 years ago

Problem

Since Kubernetes is deprecating Docker as a container runtime and moving to containerd, we need to concatenate both containerd and Docker logs at the same time so that Kubernetes upgrades are seamless. I know the README has examples for concatenating Docker and containerd logs separately, but when I use both, the log output is empty.

Steps to replicate

Provide example config and message

fluentd.conf

# This file collects and filters all Kubernetes container logs. Should rarely need to modify it.

# Do not directly collect fluentd's own logs to avoid infinite loops.
<match fluent.**>
  @type null
</match>

<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  refresh_interval 2
  rotate_wait 5
  <parse>
    @type multi_format
    <pattern>
      format regexp
      expression /^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[FP]) (?<log>.*)$/
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      keep_time_key true
    </pattern>
    <pattern>
      format json
      time_key @timestamp
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      keep_time_key true
    </pattern>
  </parse>
</source>

<filter kubernetes.**>
  @type kubernetes_metadata
  watch false
</filter>

# Exclude events from Geneva containers since they just seem to echo events from other containers
<filter kubernetes.var.log.containers.geneva**.log>
  @type grep
  <exclude>
    key log
    pattern .*
  </exclude>
</filter>

# Concat containerd partial log
# https://github.com/fluent/fluentd-kubernetes-daemonset/issues/412#issuecomment-636536767
<filter **>
  @id containerd_concat
  @type concat
  key log
  use_first_timestamp true
  partial_key logtag
  partial_value P
  separator ""
</filter>

# Concat log truncated by docker 16KB limit
<filter **>
  @id filter_concat
  @type concat
  key log
  use_first_timestamp true
  multiline_end_regexp /\n$/
  separator ""
</filter>
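
One plausible cause of the empty output (a hypothesis, not a confirmed diagnosis): both concat filters above match **, so every record passes through both of them. in_tail hands the parser each line without its newline, so a containerd log value never ends in \n, and the Docker filter's multiline_end_regexp /\n$/ never matches it. Those records stay buffered until the concat flush_interval (60 seconds by default) expires, and the timed-out buffer is then emitted as an error event ("Timeout flush") instead of reaching the output. A way to sidestep the conflict without rewriting tags, sketched below as a suggestion that does not come from this thread: give Docker records a CRI-style logtag derived from the trailing newline, then use a single partial-mode concat filter in place of both filters above. The @id runtime_agnostic_concat is hypothetical, and the Ruby expression is untested against fluent-plugin-concat 2.4.0.

```conf
# Hypothetical normalisation step: containerd records already carry logtag
# (captured by the CRI regexp); Docker records get one synthesised here.
<filter kubernetes.**>
  @type record_transformer
  enable_ruby true
  <record>
    # Docker json-file marks the last chunk of a split line with a trailing "\n".
    logtag ${record["logtag"] || (record["log"].to_s.end_with?("\n") ? "F" : "P")}
  </record>
</filter>

# Would replace both containerd_concat and filter_concat.
<filter kubernetes.**>
  @id runtime_agnostic_concat
  @type concat
  key log
  use_first_timestamp true
  partial_key logtag
  partial_value P
  separator ""
  # Optional: keeps interleaved stdout and stderr chunks in separate buffers.
  stream_identity_key stream
</filter>
```

Docker's final chunk keeps its trailing \n after concatenation; the downstream json parser filter should tolerate that as trailing whitespace.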

# Flatten fields nested within the 'log' field
<filter kubernetes.var.log.containers.**.log>
  @type parser
  format json
  key_name log
  reserve_data true
</filter>

# Flatten fields nested within the 'kubernetes' field and remove unnecessary fields
<filter kubernetes.var.log.containers.**.log>
  @type record_transformer
  enable_ruby
  <record>
    ContainerName ${record["kubernetes"]["container_name"]}
    NamespaceName ${record["kubernetes"]["namespace_name"]}
    PodName ${record["kubernetes"]["pod_name"]}
    Node ${record["kubernetes"]["host"]}
  </record>
  remove_keys docker,kubernetes,stream,log
</filter>

# Anything else goes to standard output
<match **>
  @type stdout
</match>

logger.yaml

kind: Deployment
apiVersion: apps/v1
metadata:
  name: logger
  labels:
    app: logger
spec:
  replicas: 1
  selector:
    matchLabels:
      app: logger
  template:
    metadata:
      labels:
        app: logger
    spec:
      containers:
        - name: logger
          image: ubuntu
          command:
            - /bin/sh
          args:
            - '-c'
            - while true; do echo {\"EventName\":\"EventA\",\"Msg\":\"$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1)\"}; echo {\"EventName\":\"EventB\",\"Msg\":\"$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 100000 | head -n 1)\"}; sleep 5; done
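
For a quicker reproduction without a cluster, here is a minimal sketch, assuming a Fluentd v1 release that ships the sample input (older releases name both the plugin and its parameter dummy). The test.containerd and test.docker tags are hypothetical. Each source replays the two halves of one split line, once in containerd's CRI style (logtag P, then F) and once in Docker's json-file style (no logtag, and only the last chunk ends in \n). Added to fluentd.conf, both streams pass through the two <filter **> concat blocks and reach the stdout match, which makes it easy to see which format goes missing.

```conf
# Hypothetical test sources: each cycles through its sample array,
# emitting one record per second.
<source>
  @type sample
  tag test.containerd
  rate 1
  sample [{"stream":"stdout","logtag":"P","log":"{\"EventName\":\"EventB\",\"Msg\":\"abc"},{"stream":"stdout","logtag":"F","log":"def\"}"}]
</source>

# Docker json-file style: only the final chunk ends in a newline.
<source>
  @type sample
  tag test.docker
  rate 1
  sample [{"stream":"stdout","log":"{\"EventName\":\"EventB\",\"Msg\":\"abc"},{"stream":"stdout","log":"def\"}\n"}]
</source>
```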

Expected Behavior

Logs should be parsed and concatenated seamlessly for both the containerd and Docker log formats.

Your environment

LOCAL GEMS

addressable (2.7.0) async (1.28.3) async-http (0.54.1) async-io (1.30.1) async-pool (0.3.3) aws-eventstream (1.1.0) aws-partitions (1.427.0)
aws-sdk-core (3.112.0) aws-sdk-kms (1.42.0) aws-sdk-s3 (1.88.1) aws-sdk-sqs (1.36.0) aws-sigv4 (1.2.2) benchmark (default: 0.1.0)
bigdecimal (default: 2.0.0)
bundler (2.2.11, default: 2.1.4) cgi (default: 0.1.0) concurrent-ruby (1.1.8, 1.1.7)
console (1.10.1) cool.io (1.7.1) csv (default: 3.1.2) date (default: 3.0.0) delegate (default: 0.1.0)
did_you_mean (default: 1.4.0)
digest-crc (0.6.3) domain_name (0.5.20190701)
elasticsearch (7.10.0) elasticsearch-api (7.10.0)
elasticsearch-transport (7.10.0) etc (default: 1.1.0) excon (0.78.1) faraday (1.3.0) faraday-net_http (1.0.0) fcntl (default: 1.0.0) ffi (1.14.2) ffi-compiler (1.0.1) fiber-local (1.0.0) fiddle (default: 1.0.0) fileutils (1.5.0, default: 1.4.1) fluent-config-regexp-type (1.0.0) fluent-diagtool (1.0.1) fluent-logger (0.9.0) fluent-plugin-azureeventhubs (0.0.7)
fluent-plugin-colomanager-heartbeat (0.1.0) fluent-plugin-concat (2.4.0) fluent-plugin-elasticsearch (4.3.3)
fluent-plugin-flatten-hash (0.5.1) fluent-plugin-flowcounter-simple (0.1.0) fluent-plugin-hanarp-message (0.1.0) fluent-plugin-json-transform (0.0.1) fluent-plugin-kafka (0.16.0) fluent-plugin-kubernetes_metadata_filter (2.5.3) fluent-plugin-mdm (0.1.0) fluent-plugin-mdsd (0.1.9.pre.build.dev) fluent-plugin-multi-format-parser (1.0.0) fluent-plugin-process-redfishalert (0.1.0) fluent-plugin-process-snmptrap (0.1.0) fluent-plugin-process-ucs-syslog (0.1.0) fluent-plugin-prometheus (1.8.5) fluent-plugin-prometheus_pushgateway (0.0.2) fluent-plugin-record-modifier (2.1.0) fluent-plugin-rewrite-tag-filter (2.3.0) fluent-plugin-route (1.0.0) fluent-plugin-s3 (1.5.1) fluent-plugin-sd-dns (0.1.0) fluent-plugin-servicebus-queue (0.1.0) fluent-plugin-snmptrapalert (0.1.0) fluent-plugin-systemd (1.0.2, 0.3.1) fluent-plugin-td (1.1.0) fluent-plugin-throttle (0.0.3) fluent-plugin-webhdfs (1.4.0) fluentd (1.12.1, 1.11.5, 0.12.43) forwardable (default: 1.3.1) getoptlong (default: 0.1.0) hirb (0.7.3) http (4.4.1) http-accept (1.7.0) http-cookie (1.0.3) http-form_data (2.3.0) http-parser (1.2.3) http_parser.rb (0.6.0) httpclient (2.8.2.4) io-console (default: 0.5.6) ipaddr (default: 1.2.2) irb (default: 1.2.6) jmespath (1.4.0) json (2.5.1, default: 2.3.0) jsonpath (1.1.0) kubeclient (4.9.1) logger (default: 1.4.2) lru_redux (1.1.0) ltsv (0.1.2) matrix (default: 0.2.0) mime-types (3.3.1) mime-types-data (3.2021.0212) mini_portile2 (2.5.0) minitest (5.13.0) msgpack (1.4.2) multi_json (1.15.0) multipart-post (2.1.1) prometheus-client (0.9.0) protocol-hpack (1.4.2) protocol-http (0.21.0) protocol-http1 (0.13.2) protocol-http2 (0.14.2) pstore (default: 0.1.0) psych (default: 3.1.0) public_suffix (4.0.6) quantile (0.2.1) racc (1.5.2, default: 1.4.16) rake (13.0.3, 13.0.1) rdkafka (0.8.1) rdoc (default: 6.2.1) readline (default: 0.0.2) recursive-open-struct (1.1.3) reline (default: 0.1.5) rest-client (2.1.0) rexml (default: 3.2.3) rss (default: 0.2.8) ruby-kafka (1.3.0)
ruby-progressbar (1.11.0) ruby2_keywords (0.0.2) rubyzip (1.3.0) sdbm (default: 1.0.0) serverengine (2.2.3) sigdump (0.2.4) singleton (default: 0.1.0) snmp (1.2.0) string-scrub (0.0.5) stringio (default: 0.1.0) strptime (0.2.5) strscan (default: 1.0.3) systemd-journal (1.3.3) td (0.16.9) td-client (1.0.7) td-logger (0.3.27) test-unit (3.3.4) timeout (default: 0.1.0) timers (4.3.2) tracer (default: 0.1.0) tzinfo (2.0.4) tzinfo-data (1.2021.1) unf (0.1.4) unf_ext (0.0.7.7) uri (default: 0.10.0) webhdfs (0.9.0) webrick (1.7.0, default: 1.6.0) xmlrpc (0.3.0) yajl-ruby (1.4.1) yaml (default: 0.1.0) zip-zip (0.3) zlib (default: 1.1.0)

kenhys commented 3 years ago

I guess <filter **> causes this result, because ** applies both of these filters to every event. It may be better to use an exact match for the Docker log driver and for containerd separately.

<filter **>
  @id containerd_concat
  @type concat
...
</filter>

<filter **>
  @id filter_concat
  @type concat
...
</filter>
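
As an illustration of the exact-match idea, a sketch assuming the two kinds of events already arrive under different tag prefixes (docker. and containerd. here are hypothetical prefixes, not ones the source in fluentd.conf produces). The filter bodies are the ones from fluentd.conf; only the match patterns change, so each concat filter sees only the format it understands.

```conf
# containerd (CRI) events: split lines carry logtag P, the last chunk F.
<filter containerd.**>
  @id containerd_concat
  @type concat
  key log
  use_first_timestamp true
  partial_key logtag
  partial_value P
  separator ""
</filter>

# Docker json-file events: the last chunk of a split line ends in \n.
<filter docker.**>
  @id filter_concat
  @type concat
  key log
  use_first_timestamp true
  multiline_end_regexp /\n$/
  separator ""
</filter>
```
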

sonnyhcl commented 3 years ago

@kenhys In my case, containerd_concat and filter_concat capture logs from the same group of workloads, but in Kubernetes clusters of different versions, so I can't tell them apart with an exact match.

kenhys commented 3 years ago

Then how about using the rewrite_tag_filter plugin? The two formats can be told apart by the timestamp and time keys.

<match sample>
  @type rewrite_tag_filter
  <rule>
    key timestamp
    pattern ...
    tag docker.${tag}
  </rule>
  <rule>
    key time
    pattern ...
    tag containerd.${tag}
  </rule>
</match>
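
A more concrete sketch of that routing, under stated assumptions. It keys on the logtag field that the CRI regexp in fluentd.conf captures, rather than on the timestamp/time keys mentioned above, because only containerd records carry logtag. It assumes rewrite_tag_filter 2.3.0 accepts /regexp/ patterns and that a record without the key falls through to the inverted rule; both are worth verifying. The containerd. and docker. prefixes are hypothetical.

```conf
# Place after the kubernetes_metadata filter. Re-emitted events get a new tag,
# so they no longer match kubernetes.** and cannot loop back into this match.
<match kubernetes.**>
  @type rewrite_tag_filter
  <rule>
    key logtag
    pattern /^[FP]$/
    tag containerd.${tag}
  </rule>
  # Assumption: records with no logtag key (Docker json-file) land here.
  <rule>
    key logtag
    pattern /^[FP]$/
    invert true
    tag docker.${tag}
  </rule>
</match>

# Everything downstream must match the new tags; for example, the grep,
# parser and record_transformer filters would change from
# kubernetes.var.log.containers.**.log to *.kubernetes.var.log.containers.**.log,
# and each concat filter would be scoped to containerd.** or docker.**.
```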