Fluentd: Unified Logging Layer (project under CNCF)
https://www.fluentd.org
Apache License 2.0

Kubernetes container logs - duplicate logs found when using read_bytes_limit_per_second parameter #3464

Closed · doge95 closed this issue 3 years ago

doge95 commented 3 years ago

Describe the bug

Continuing from https://github.com/fluent/fluentd/issues/3434: following @ashie's suggestion, I reran the stress test on our EFK stack with the read_bytes_limit_per_second parameter and Fluentd v1.13.2. However, I found that logs are duplicated in Elasticsearch.

To Reproduce

Container runtime version: containerd://1.4
Kubelet logging configuration: --container-log-max-files=50 --container-log-max-size=100Mi

# echo "GET http://172.16.5.154" | vegeta attack -rate=10000 -duration=120s | tee results.bin | vegeta report
Requests      [total, rate, throughput]         1200000, 10000.01, 9999.99
Duration      [total, attack, wait]             2m0s, 2m0s, 289.975µs
Latencies     [min, mean, 50, 90, 95, 99, max]  119.123µs, 488.862µs, 280.767µs, 913.659µs, 1.335ms, 4.028ms, 61.799ms
Bytes In      [total, mean]                     0, 0.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      204:1200000
Error Set:

Expected behavior

Expect Kibana to also receive 1,200,000 logs (10,000 requests/s for 120 s). However, it receives 1,622,486 entries. (Screenshot: Kibana document count.)

Your Environment

- Fluentd version: 1.13.2
- Operating system: Debian GNU/Linux 10 (buster)
- Kernel version: 5.4.0-72-generic

Your Configuration

Fluentd Chart values.yaml:

dashboards:
  enabled: "false"
image:
  tag: v1.13.2-debian-elasticsearch7-1.0

resources:
  limits:
    cpu: 400m
    memory: 400Mi
  requests:
    cpu: 200m
    memory: 200Mi
plugins:
- fluent-plugin-elasticsearch

nodeSelector:
  node-role.kubernetes.io/efktest: "true"

service:
  type: "ClusterIP"
  annotations: {}
  ports:
    - name: "aggregator"
      protocol: TCP
      containerPort: 24224

volumes:
- name: varlogcontainers
  hostPath:
    path: /var/log/containers
- name: varlogpods
  hostPath:
    path: /var/log/pods
- name: fluentpos
  hostPath:
    path: /var/log/fluent/
- name: etcfluentd-main
  configMap:
    name: fluentd-main
    defaultMode: 0777
- name: etcfluentd-config
  configMap:
    name: fluentd-config
    defaultMode: 0777
volumeMounts:
- name: varlogcontainers
  mountPath: /var/log/containers
  readOnly: true
- name: varlogpods
  mountPath: /var/log/pods
  readOnly: true
- name: fluentpos
  mountPath: /var/log/fluent/
- name: etcfluentd-main
  mountPath: /etc/fluent
- name: etcfluentd-config
  mountPath: /etc/fluent/config.d/

## Fluentd configurations:
fileConfigs:
  00_global.conf: |-
    <system>
      log_level debug
    </system>
  01_sources.conf: |-
    <source>
      @type tail
      @id in_tail_container_logs
      path /var/log/containers/*.log
      pos_file /var/log/fluent/fluentd-containers.log.pos
      tag kubernetes.*
      exclude_path ["/var/log/containers/*_kube-system_*.log", "/var/log/containers/*_default_*.log", "/var/log/containers/*fluent*.log"]
      read_from_head true
      follow_inodes true
      refresh_interval 1
      read_bytes_limit_per_second 100000
      <parse>
        @type none
        message_key message
      </parse>
    </source>

    # Send the logs to Elasticsearch; one config per namespace
    <match kubernetes.**>
      @type elasticsearch_data_stream
      data_stream_name logs-efk-container
      hosts https://elasticsearch-client.default.svc.cluster.local:9200
      user elastic
      password xxxxxx
      ssl_verify false
      reload_connections false
      reconnect_on_error true
      reload_on_failure true
      log_es_400_reason false
      request_timeout 1000s
      include_timestamp true
      <buffer>
        @type memory
        flush_interval 3s
        retry_max_times 10
      </buffer>
    </match>
  02_filters.conf: |-
  03_dispatch.conf: |-
  04_outputs.conf: |-
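
For context on the parameter under test: read_bytes_limit_per_second caps how many bytes in_tail reads from a single file per second, so a fast-growing file is consumed in budgeted chunks across one-second ticks instead of in one pass. Below is a minimal sketch of that throttling idea in Python (my illustration only, not Fluentd's actual code; the path and handler arguments are placeholders):

import time

def tail_with_budget(path, budget_bytes_per_sec, handle, should_stop):
    # Toy model of rate-limited tailing: each one-second tick reads at
    # most budget_bytes_per_sec bytes from the saved offset, then keeps
    # the new offset (the real in_tail persists it in the pos file).
    offset = 0
    while not should_stop():
        with open(path, "rb") as f:
            f.seek(offset)
            chunk = f.read(budget_bytes_per_sec)
        offset += len(chunk)  # the value a pos-file entry would record
        if chunk:
            handle(chunk)
        time.sleep(1)  # wait for the next one-second budget window

With 100 MiB rotation files and a 100,000 B/s budget, the reader is almost always mid-file when rotation occurs, which appears to be the window in which a second pos-file entry for the same inode gets created (see the pos file dump below).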

Your Error Log

No error log

Additional context

When the log is rotated, 2 entries are added to the pos file.

# ls -lrth
total 167M
-rw-r----- 1 root root 103M Jul 21 16:47 0.log.20210721-164752
-rw-r----- 1 root root  47M Jul 21 16:48 0.log

Pos file:

# cat /var/log/fluent/fluentd-containers.log.pos
/var/log/containers/openresty-efk-stress-test-558464d485-wg7l5_efktest_openresty-7d2236f476b49cf828caec125558eb0724532481067814e85cf2c50b85754b3f.log   00000000009c0246    000000006a4316ab
/var/log/containers/openresty-efk-stress-test-558464d485-wg7l5_efktest_openresty-7d2236f476b49cf828caec125558eb0724532481067814e85cf2c50b85754b3f.log   00000000005fdfd8    000000006a4316ac
/var/log/containers/openresty-efk-stress-test-558464d485-wg7l5_efktest_openresty-7d2236f476b49cf828caec125558eb0724532481067814e85cf2c50b85754b3f.log   00000000005e3ffe    000000006a4316ac
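
Note that the second and third entries above share the inode 000000006a4316ac: the pos file holds one "path offset inode" record per watched file (offset and inode in hex), so two live records with the same inode mean the same file is being read twice, which matches the doubled delivery. A quick hypothetical helper (not part of Fluentd) to detect that state:

#!/usr/bin/env python3
# Report inodes that appear more than once in an in_tail pos file.
import sys
from collections import defaultdict

UNWATCHED = 0xFFFFFFFFFFFFFFFF  # offset Fluentd writes for unwatched entries

def duplicate_inodes(pos_path):
    seen = defaultdict(list)  # inode -> [(path, offset), ...]
    with open(pos_path) as f:
        for line in f:
            fields = line.split()
            if len(fields) != 3:
                continue
            path, offset_hex, inode_hex = fields
            if int(offset_hex, 16) == UNWATCHED:
                continue  # tombstone for a file that is no longer watched
            seen[int(inode_hex, 16)].append((path, int(offset_hex, 16)))
    return {ino: w for ino, w in seen.items() if len(w) > 1}

if __name__ == "__main__":
    for ino, watchers in duplicate_inodes(sys.argv[1]).items():
        print(f"inode {ino:#x} is watched {len(watchers)} times:")
        for path, offset in watchers:
            print(f"  {path} at offset {offset}")

Run against the pos file above, this flags inode 0x6a4316ac with two live watchers.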

One example of the duplicated log:

# grep 2021-07-21T16:47:52.958955667+08:00 *
0.log:2021-07-21T16:47:52.958955667+08:00 stdout F 172.24.0.69 - - [21/Jul/2021:16:47:52 +0800] "GET / HTTP/1.1" 204 0 "-" "Go-http-client/1.1"

The above log line is duplicated in Kibana even though it appears only once in the 0.log file. (Screenshot: duplicated entry in Kibana.)

doge95 commented 3 years ago
# wc -l *
   422486 0.log
   777520 0.log.20210721-164752
  1200006 total

It looks like all the logs in the 0.log file are duplicated: 777,520 + 422,486 * 2 - 6 (nginx startup logs) = 1,622,486, which matches the number of log entries observed in Kibana.
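
A quick verification of that arithmetic (the constants are the wc -l counts above; the assumption is that every line of the current 0.log was delivered twice and the rotated file once):

rotated = 777_520   # lines in 0.log.20210721-164752, delivered once
current = 422_486   # lines in 0.log, apparently delivered twice
startup = 6         # nginx startup lines (1,200,006 total - 1,200,000 requests)
assert rotated + 2 * current - startup == 1_622_486  # the count Kibana shows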

doge95 commented 3 years ago

I tested with both read_bytes_limit_per_second=100000 and 500000, and found the same duplicated-log issue.

ashie commented 3 years ago

Thanks for your report. I've confirmed the bug.

#3466 will fix it.

doge95 commented 3 years ago

> Thanks for your report. I've confirmed the bug.
>
> #3466 will fix it.

Hey @ashie, thanks for the quick confirmation. May I ask when it will be released?

kenhys commented 3 years ago

@doge95 released as v1.13.3.

https://github.com/fluent/fluentd/blob/master/CHANGELOG.md#bug-fix

doge95 commented 3 years ago

> @doge95 released as v1.13.3.
>
> https://github.com/fluent/fluentd/blob/master/CHANGELOG.md#bug-fix

Hey @kenhys, thanks a lot for the quick fix! I have tested it out and am not seeing this issue any more.