fluent / fluent-plugin-windows-eventlog

Fluentd plugin to collect windows event logs
Apache License 2.0
33 stars 19 forks source link

Security Channel log -- ErrorCode: 14 #73

Open sophieyfang opened 3 years ago

sophieyfang commented 3 years ago

One of our clients encountered this error. (Sorry that i can't reproduce it in my side.) The client has increased the memory from 8GB to 16 GB and now 32GB. But still it got errorCode:14 which is Out of Memory error code in Windows. Also It affects only Security channel. All other channels' logs flow in without trouble.. I looked through the code and this is the stack-trace that i think it might be happening during the crash(only security channel crashed). I wonder how could one single eventlog entry consumed that much memory considering the client has increased it from 8->16->32GB but still sees the error. Alongside there was always a memory spike when the crash happened

(Note: ordering is in invoking sequence)

Error Message:

  2020-12-04 11:07:25 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluent-plugin-windows-eventlog-0.8.0/lib/fluent/plugin/in_windows_eventlog2.rb:322:in `each'
  2020-12-04 11:07:25 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluent-plugin-windows-eventlog-0.8.0/lib/fluent/plugin/in_windows_eventlog2.rb:322:in `on_notify_hash'
  2020-12-04 11:07:25 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluent-plugin-windows-eventlog-0.8.0/lib/fluent/plugin/in_windows_eventlog2.rb:249:in `block (2 levels) in subscribe_channels'
  2020-12-04 11:07:25 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluentd-1.11.2-x86-mingw32/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
  2020-12-04 11:07:25 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/cool.io-1.7.0/lib/cool.io/loop.rb:88:in `run_once'
  2020-12-04 11:07:25 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/cool.io-1.7.0/lib/cool.io/loop.rb:88:in `run'
  2020-12-04 11:07:25 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluentd-1.11.2-x86-mingw32/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2020-12-04 11:07:25 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluentd-1.11.2-x86-mingw32/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
cosmo0920 commented 3 years ago

This seems to be occurred with Out of Memory:

ERROR_OUTOFMEMORY

    14 (0xE)

    Not enough storage is available to complete this operation.

ref: https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-

rate_limit might be useful to prevent flood of consuming events in Security channel.

sophieyfang commented 3 years ago

In the ReadMe.md, it describes ‘rate_limit’ as a way to

“”” NOTE: If you encountered CPU spike due to massively huge EventLog channel, rate_limit parameter may help you. Currently, this paramter can handle the multiples of 10 or -1(Winevt::EventLog::Subscribe::RATE_INFINITE). “””. It seems for CPU spike but not memory spike. But, I will ask client to give it a try.

sophieyfang commented 3 years ago

Customer is generating > 2000 entries per second on the security channel. With rate_limit 200, It still crashed with exception stated below. I can confirm that log.warn e.inspect is added inside C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluentd-1.11.2-x86-mingw32/lib/fluent/plugin/buffer.rb. But the logged exception doesn't give more useful info...

Also since customer uses rate_limit 200, the > 2000 entries per second will probably cause a lot of backlog in the file buffers. But how could it cause the below exception is something i don't understand. I assume below exception is due to out of memory.

2021-02-08 01:41:47 -0800 [warn]: #0 emit transaction failed: error_class=IOError error="closed stream" location="C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluentd-1.11.2-x86-mingw32/lib/fluent/plugin/buffer/file_chunk.rb:82:in `pos'" tag="winevt.raw"
  2021-02-08 01:41:47 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluentd-1.11.2-x86-mingw32/lib/fluent/plugin/buffer/file_chunk.rb:82:in `pos'
  2021-02-08 01:41:47 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluentd-1.11.2-x86-mingw32/lib/fluent/plugin/buffer/file_chunk.rb:82:in `rollback'
  2021-02-08 01:41:47 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluentd-1.11.2-x86-mingw32/lib/fluent/plugin/buffer.rb:356:in `rescue in block in write'
  2021-02-08 01:41:47 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluentd-1.11.2-x86-mingw32/lib/fluent/plugin/buffer.rb:348:in `block in write'
  2021-02-08 01:41:47 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluentd-1.11.2-x86-mingw32/lib/fluent/plugin/buffer.rb:347:in `each'
  2021-02-08 01:41:47 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluentd-1.11.2-x86-mingw32/lib/fluent/plugin/buffer.rb:347:in `write'
  2021-02-08 01:41:47 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluentd-1.11.2-x86-mingw32/lib/fluent/compat/output.rb:374:in `block in handle_stream_simple'
  2021-02-08 01:41:47 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluentd-1.11.2-x86-mingw32/lib/fluent/plugin/output.rb:888:in `write_guard'
  2021-02-08 01:41:47 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluentd-1.11.2-x86-mingw32/lib/fluent/compat/output.rb:373:in `handle_stream_simple'
  2021-02-08 01:41:47 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluentd-1.11.2-x86-mingw32/lib/fluent/plugin/output.rb:878:in `execute_chunking'
  2021-02-08 01:41:47 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluentd-1.11.2-x86-mingw32/lib/fluent/plugin/output.rb:808:in `emit_buffered'
  2021-02-08 01:41:47 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluentd-1.11.2-x86-mingw32/lib/fluent/event_router.rb:160:in `emit_events'
  2021-02-08 01:41:47 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluentd-1.11.2-x86-mingw32/lib/fluent/event_router.rb:97:in `emit_stream'
  2021-02-08 01:41:47 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluent-plugin-windows-eventlog-0.8.0/lib/fluent/plugin/in_windows_eventlog2.rb:341:in `on_notify_hash'
  2021-02-08 01:41:47 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluent-plugin-windows-eventlog-0.8.0/lib/fluent/plugin/in_windows_eventlog2.rb:249:in `block (2 levels) in subscribe_channels'
  2021-02-08 01:41:47 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluentd-1.11.2-x86-mingw32/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
  2021-02-08 01:41:47 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/cool.io-1.7.0/lib/cool.io/loop.rb:88:in `run_once'
  2021-02-08 01:41:47 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/cool.io-1.7.0/lib/cool.io/loop.rb:88:in `run'
  2021-02-08 01:41:47 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluentd-1.11.2-x86-mingw32/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2021-02-08 01:41:47 -0800 [warn]: #0 C:/Program Files (x86)/Stackdriver/LoggingAgent/Main/lib/ruby/gems/2.6.0/gems/fluentd-1.11.2-x86-mingw32/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2021-02-08 01:41:47 -0800 [error]: #0 Unexpected error raised. Stopping the timer. title=:in_windows_eventlog_security error_class=IOError error="closed stream"
  2021-02-08 01:41:47 -0800 [error]: #0 suppressed same stacktrace
2021-02-08 01:41:47 -0800 [error]: #0 Timer detached. title=:in_windows_eventlog_security

Here is the conf:

2021-02-05 14:52:42 -0800 [info]: using configuration file: <ROOT>
  <filter winevt.raw>
    @type record_transformer
    enable_ruby 
    <record>
      message ${record['Description']}
      severity info
      timestamp ${t = Time.parse(record['TimeCreated']); {'seconds' => t.tv_sec, 'nanos' => t.tv_nsec}}
    </record>
  </filter>
  <filter **>
    @type add_insert_ids
  </filter>
  <match **>
    @type google_cloud
    buffer_type "file"
    buffer_path /var/log/google-fluentd/buffers
    buffer_chunk_limit 512KB
    flush_interval 5s
    disable_retry_limit false
    retry_limit 3
    retry_wait 10
    max_retry_wait 300
    num_threads 8
    use_grpc false
    partial_success true
    <buffer>
      flush_mode interval
      retry_type exponential_backoff
      @type file
      path /var/log/google-fluentd/buffers
      flush_thread_count 8
      flush_interval 5s
      retry_forever false
      retry_max_times 3
      retry_max_interval 300
      chunk_limit_size 512KB
    </buffer>
  </match>
  <source>
    @type windows_eventlog2
    channels application,system,security
    read_interval 2
    rate_limit 200
    tag "winevt.raw"
    <storage>
      @type "local"
      persistent true
      path ".\\Main\\pos\\winevtlog.pos"
    </storage>
  </source>
</ROOT>
sophieyfang commented 3 years ago

Can you help us understanding how rate_limit control the pace of ingesting eventlogs? With great load of eventlog generation, how would rate_limit help in such case? What side effect it might cause in the memory or cpu usage?

ashie commented 3 years ago

https://github.com/fluent-plugins-nursery/winevt_c/blob/8b1dbcde4de781bab641873aea1995876a73d0c5/ext/winevt/winevt_subscribe.c#L289-L321

cosmo0920 commented 3 years ago

What side effect it might cause in the memory or cpu usage?

rate limit can cause consuming pause and this will cause CPU and memory usage decline. With rate_limit parameter, EvtNext is not called when rate limit is exceeded. On users side, just no events consumed phenomenon is observed.

cosmo0920 commented 3 years ago

Also since customer uses rate_limit 200, the > 2000 entries per second will probably cause a lot of backlog in the file buffers.

This shouldn't be occurred. I guess just discarded Windows EventLog which are not consumed from Fluentd Windows EventLog plugin. Windows EventLog is always written in capped and rotated channels.