fluent / fluent-package-builder

td-agent (Fluentd) Building and Packaging System
Apache License 2.0

Add a patch for RubyInstaller to avoid crash on start up #620

Closed ashie closed 4 months ago

ashie commented 4 months ago

When a key with a non-ASCII name exists under the registry key SOFTWARE/Microsoft/Windows/CurrentVersion/Uninstall/, Fluentd fails to start workers due to Encoding::UndefinedConversionError. This patch avoids the issue.

Fix #616
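To illustrate the failure mode described above, here is a minimal sketch of how a non-ASCII key name can raise Encoding::UndefinedConversionError during transcoding, and how a lenient conversion avoids it. This is not the actual RubyInstaller patch: `convert_strict` and `convert_lenient` are hypothetical helpers, and US-ASCII is only a stand-in for whatever narrow target encoding is involved on the affected machine.

```ruby
# A registry subkey name as the Windows API would return it (UTF-16LE).
# The sample name is the one identified later in this thread.
key_name = "バッファロー らくらくアップデート".encode(Encoding::UTF_16LE)

# Hypothetical helper: strict conversion, raises on unmappable characters.
def convert_strict(name, target)
  name.encode(target)
end

# Hypothetical helper: replaces unmappable characters instead of raising.
def convert_lenient(name, target)
  name.encode(target, invalid: :replace, undef: :replace, replace: "?")
end

begin
  convert_strict(key_name, Encoding::US_ASCII)
rescue Encoding::UndefinedConversionError => e
  puts "strict conversion failed: #{e.class}"
end

# The lenient variant returns a string, with "?" for each unmappable character.
puts convert_lenient(key_name, Encoding::US_ASCII)
```

Whether replacing characters is acceptable depends on how the key name is used afterwards; skipping such keys entirely is another option.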

ashie commented 4 months ago

Hmm, this patch breaks the search for MSYS2 during the build :thinking:

ashie commented 4 months ago

I've confirmed that this patch fixes the issue.

ashie commented 4 months ago

We observed that the fluentd supervisor process exits unexpectedly after about 1 hour of repeated worker recovery. During this time, the number of open handles keeps increasing and eventually exceeds 8400.

We got the following backtrace when the supervisor process exited.

2024-02-21 15:34:23 +0900 [debug]: fluent/log.rb:341:debug: Got Win32 event "fluentd_7928_STOP_EVENT_THREAD"
Unexpected error undefined method `pid' for nil:NilClass
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/supervisor.rb:417:in `after_start'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_spawn_server.rb:77:in `ensure in start_worker'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_spawn_server.rb:77:in `start_worker'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:175:in `delayed_start_worker'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:159:in `restart_worker'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:125:in `block in keepalive_workers'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:102:in `each'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:102:in `each_with_index'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:102:in `keepalive_workers'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:58:in `run'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_spawn_server.rb:50:in `run'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/server.rb:128:in `main'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/daemon.rb:119:in `main'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/daemon.rb:68:in `run'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/supervisor.rb:796:in `supervise'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/supervisor.rb:582:in `run_supervisor'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/command/fluentd.rb:352:in `<top (required)>'
  <internal:C:/opt/fluent/lib/ruby/3.2.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
  <internal:C:/opt/fluent/lib/ruby/3.2.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/bin/fluentd:15:in `<top (required)>'
  C:/opt/fluent/bin/fluentd:32:in `load'
  C:/opt/fluent/bin/fluentd:32:in `<main>'

In the above log, the root cause is masked by the error raised in the ensure block, so I captured an additional backtrace of the original exception:

#<Errno::EMFILE: Too many open files - dup>
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/process_manager.rb:190:in `spawn'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/supervisor.rb:413:in `spawn'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_spawn_server.rb:75:in `start_worker'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:175:in `delayed_start_worker'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:159:in `restart_worker'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:125:in `block in keepalive_workers'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:102:in `each'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:102:in `each_with_index'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:102:in `keepalive_workers'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:58:in `run'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_spawn_server.rb:50:in `run'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/server.rb:128:in `main'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/daemon.rb:119:in `main'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/daemon.rb:68:in `run'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/supervisor.rb:796:in `supervise'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/supervisor.rb:582:in `run_supervisor'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/command/fluentd.rb:352:in `<top (required)>'
<internal:C:/opt/fluent/lib/ruby/3.2.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
<internal:C:/opt/fluent/lib/ruby/3.2.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/bin/fluentd:15:in `<top (required)>'
C:/opt/fluent/bin/fluentd:32:in `load'
C:/opt/fluent/bin/fluentd:32:in `<main>'
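The masking seen in the two backtraces above can be reproduced in isolation. The following is a minimal sketch with hypothetical stand-ins (`spawn_worker`, `start_worker_masked`, `start_worker_guarded`, `error_class_of`); it is not the actual Fluentd or ServerEngine code, only an illustration of how an error raised inside `ensure` replaces the exception that was already propagating.

```ruby
# Hypothetical stand-in for the spawn call: simulates hitting the handle limit.
def spawn_worker
  raise Errno::EMFILE, "dup" # message becomes "Too many open files - dup"
end

# The ensure block assumes the spawn succeeded and dereferences nil.
def start_worker_masked
  monitor = nil
  monitor = spawn_worker
ensure
  monitor.pid # monitor is still nil, so NoMethodError replaces Errno::EMFILE
end

# Guarding the dereference lets the original exception propagate.
def start_worker_guarded
  monitor = nil
  monitor = spawn_worker
ensure
  monitor&.pid
end

# Helper that reports which exception class escapes the block.
def error_class_of
  yield
  nil
rescue SystemCallError, NoMethodError => e
  e.class
end

puts error_class_of { start_worker_masked }  # NoMethodError
puts error_class_of { start_worker_guarded } # Errno::EMFILE
```

With the guarded variant, the first error a caller sees is the Errno::EMFILE itself, which points directly at the handle exhaustion.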

ashie commented 4 months ago

> We observed that the fluentd supervisor process exits unexpectedly after about 1 hour of repeated worker recovery.

I filed a new issue for tracking this problem at ServerEngine's repository: https://github.com/treasure-data/serverengine/issues/145

kenhys commented 4 months ago

Before:

Crashed because of the problematic registry key. image

After:

image

ashie commented 4 months ago

Buffalo らくらくアップデート (Raku-Raku Update)! So it was you!

daipom commented 2 months ago