fluent / fluentd

Fluentd: Unified Logging Layer (project under CNCF)
https://www.fluentd.org
Apache License 2.0
12.9k stars 1.34k forks source link

Last buffered log not flushed immediately #3400

Closed kristoflarcher closed 3 years ago

kristoflarcher commented 3 years ago

Describe the bug

Hi, We have an issue with td-agent buffer : it seems the last log contained in buffer is not flushed immediately whereas i set flush-mode to immediate

To Reproduce

I have a td-agent for test-purpose on which I reproduced the same issue: I have a logger that wrote 8 logs parsed with my td-agent --> and i get 7 logs in ELK To get the last log flushed --> i have to restart td-agent OR I push new logs via my logger I reproduce the issue with td-agent v3.8.1 and latest td-agent 4.1.1

Expected behavior

How can I get ALL the logs to be flushed immediately from buffer ?

Your Environment Environment: CentOS-7, td-agent v4.1.1, kernel 3.10.0-1160.2.2.el7.x86_64

Your Configuration

@type elasticsearch index_name time_key timestamp include_timestamp true user elk_user password elk_password hosts ELK ca_file /etc/td-agent/certs/ca_elasticsearch.crt scheme https ssl_verify false ssl_version TLSv1_2 logstash_format false include_tag_key true tag_key @log_name

flush_mode immediate flush_at_shutdown true retry_type exponential_backoff retry_exponential_backoff_base 2 retry_forever false flush_thread_count 1 chunk_limit_size 8M total_limit_size 1G queue_limit_length 8 overflow_action block

Your Error Log There are no errors in logs

Additional context

kristoflarcher commented 3 years ago

Is there any body in ?

ashie commented 3 years ago

I can't reproduce it with forward plugin. Fluentd version: https://github.com/fluent/fluentd/tree/87164893957bc38b99f42222027ae2a6bc436eec

in_foward config

<source>
  @type forward
</source>

<match **>
  @type stdout
</match>

out_forward config

<source>
  @type tail
  path /path/to/test.log
  pos_file /path/to/test.pos
  read_from_head
  refresh_interval 1
  tag foo.bar
  format json
</source>

<match **>
  @type forward
  <server>
    host localhost
    port 24224
  </server>

  # copied from your config
  <buffer>
    flush_mode            immediate
    flush_at_shutdown     true
    retry_type            exponential_backoff
    retry_exponential_backoff_base 2
    retry_forever         false
    flush_thread_count    1
    chunk_limit_size      8M
    total_limit_size      1G
    queue_limit_length    8
    overflow_action       block
  </buffer>
</match>

New logs are shown at in_forward side immediately, so I don't think it's fluentd's buffer issue.

Could you describe your complete config & steps to reproduce? In addition, please describe versions of td-agent and each components.

ashie commented 3 years ago

And you can check buffer status by monitor_agent: https://docs.fluentd.org/input/monitor_agent

kristoflarcher commented 3 years ago

Hello Ashie,

Environment: CentOS-7, td-agent v4.1.1, kernel 3.10.0-1160.2.2.el7.x86_64 (Downloaded at this address : https://packages.treasuredata.com.s3.amazonaws.com/4/redhat/7/x86_64/td-agent-4.0.0-1.el7.x86_64.rpm )

kristoflarcher commented 3 years ago

Gems installed 👍 LOCAL GEMS

addressable (2.7.0) async (1.24.2) async-http (0.50.8) async-io (1.27.7) async-pool (0.2.0) aws-eventstream (1.1.0) aws-partitions (1.332.0) aws-sdk-core (3.100.0) aws-sdk-kms (1.34.1) aws-sdk-s3 (1.69.1) aws-sdk-sqs (1.28.0) aws-sigv4 (1.2.0) bigdecimal (default: 1.3.2) bundler (1.16.6) concurrent-ruby (1.1.6) console (1.8.2) cool.io (1.6.0) did_you_mean (1.1.0) digest-crc (0.5.1) elasticsearch (6.8.2) elasticsearch-api (6.8.2) elasticsearch-transport (6.8.2) excon (0.75.0) faraday (1.0.1) ffi (1.13.1) fluent-config-regexp-type (1.0.0) fluent-logger (0.8.2) fluent-plugin-concat (2.4.0) fluent-plugin-elasticsearch (4.0.10, 4.0.9) fluent-plugin-grok-parser (2.6.2, 2.6.1) fluent-plugin-kafka (0.13.0) fluent-plugin-loomsystems (0.0.1) fluent-plugin-multi-format-parser (1.0.0) fluent-plugin-out-http (1.3.3) fluent-plugin-prometheus (1.8.0) fluent-plugin-prometheus_pushgateway (0.0.2) fluent-plugin-record-modifier (2.1.0) fluent-plugin-rewrite-tag-filter (2.3.0) fluent-plugin-s3 (1.3.2) fluent-plugin-secure-forward (0.4.5) fluent-plugin-serialize-nested-json (0.0.2) fluent-plugin-systemd (1.0.2) fluent-plugin-td (1.1.0) fluent-plugin-td-monitoring (0.2.4) fluent-plugin-webhdfs (1.2.5) fluentd (1.11.1) hirb (0.7.3) http_parser.rb (0.6.0) httpclient (2.8.2.4) io-console (default: 0.4.6) ipaddress (0.8.3) jmespath (1.4.0) json (default: 2.0.4) ltsv (0.1.2) mini_portile2 (2.4.0) minitest (5.10.1) mixlib-cli (1.7.0) mixlib-config (2.2.4) mixlib-log (1.7.1) mixlib-shellout (2.2.7) msgpack (1.3.3) multi_json (1.14.1) multipart-post (2.1.1) net-telnet (0.1.1) nio4r (2.5.2) nokogiri (1.10.9) ohai (6.20.0) oj (3.8.1) openssl (default: 2.0.9) parallel (1.19.2) power_assert (0.4.1) prometheus-client (0.9.0) protocol-hpack (1.4.2) protocol-http (0.15.1) protocol-http1 (0.10.3) protocol-http2 (0.11.6) proxifier (1.0.3) psych (default: 2.2.2) public_suffix (4.0.5) quantile (0.2.1) rake (13.0.1, 12.0.0) rdkafka (0.8.0) rdoc (default: 5.0.1) resolve-hostname (0.1.0) ruby-kafka (1.1.0) ruby-progressbar (1.10.1) rubyzip (1.3.0) serverengine (2.2.1) sigdump (0.2.4) strptime (0.2.4) systemd-journal (1.3.3) systemu (2.5.2) td (0.16.9) td-client (1.0.7) td-logger (0.3.27) test-unit (3.2.3) timers (4.3.0) tzinfo (2.0.2) tzinfo-data (1.2020.1) webhdfs (0.9.0) xmlrpc (0.2.1) yajl-ruby (1.4.1) zip-zip (0.3)

kristoflarcher commented 3 years ago

I attach my confiuguration file ( zipped ) : td-agent.zip

kristoflarcher commented 3 years ago

In my test i injected n logs and i get (n-1) logs in ELK

I follow your advice and i used the monitor-agent ( thanks -->: its really usefull ) : In this test case : i injected exactly 40 logs and i get 39 emit_records curl http://s999lc2elk02.concept.gfdi.be:24220/api/plugins/ plugin_id:object:3fb3094e6104 plugin_category:input type:monitor_agent output_plugin:false retry_count: plugin_id:input_tail_jade2_web plugin_category:input type:tail output_plugin:false retry_count: plugin_id:input_tail_jade2_web_localhost_access plugin_category:input type:tail output_plugin:false retry_count: plugin_id:object:3fb30a30ea9c plugin_category:output type:copy output_plugin:false retry_count:0 plugin_id:object:3fb309dfcc20 plugin_category:output type:elasticsearch output_plugin:true buffer_queue_length:1 buffer_timekeys: buffer_total_queued_size:118050 retry_count:0 emit_records:39 emit_count:32 write_count:1 rollback_count:0 slow_flush_count:0 flush_time_count:419 buffer_stage_length:0 buffer_stage_byte_size:0 buffer_queue_byte_size:118050 buffer_available_buffer_space_ratios:100.0 plugin_id:object:3fb309df7d4c plugin_category:filter type:record_transformer output_plugin:false retry_count: plugin_id:object:3fb309df36e8 plugin_category:filter type:record_transformer output_plugin:false retry_count: plugin_id:object:3fb309d7e514 plugin_category:filter type:record_transformer output_plugin:false retry_count: plugin_id:object:3fb309d77804 plugin_category:filter type:record_transformer output_plugin:false retry_count: plugin_id:object:3fb309d6dfc0 plugin_category:filter type:record_transformer output_plugin:false retry_count: plugin_id:object:3fb309d66dd8 plugin_category:filter type:record_transformer output_plugin:false retry_count:

kristoflarcher commented 3 years ago

I think you are absolutely right : its not a buffer issue Its related to my mutltiline configuration : i found tis post :: https://groups.google.com/g/fluentd/c/OsDMg1fCEJc

In almost cases, mulitline mode can't judge the end of log so format_firstline is needed for trigger to emit events. At shutdown phase, buffered multiline logs are flushed.

So i just wonder, how i can flush the last log using format_firstline

kristoflarcher commented 3 years ago

If you have a suggestion....I would be glad to listen your precious sugggestions. Thanks for your help Ashie.

ashie commented 3 years ago

How about using multiline_flush_interval?

I close this issue because it's not a bug.