fluent / fluentd

Fluentd: Unified Logging Layer (project under CNCF)
https://www.fluentd.org
Apache License 2.0

Windows - Running Td-agent from command line with --daemon flag #4354

Open bennettfalkenberg opened 9 months ago

bennettfalkenberg commented 9 months ago

Describe the bug

Running td-agent from the command line on Windows with the --daemon flag fails with the following error:

Traceback (most recent call last):
        7: from C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.16.2/lib/fluent/daemon.rb:13:in `<main>'
        6: from C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/serverengine-2.3.2/lib/serverengine/daemon.rb:76:in `run_server'
        5: from C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/serverengine-2.3.2/lib/serverengine/daemon.rb:76:in `new'
        4: from C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/serverengine-2.3.2/lib/serverengine/daemon.rb:32:in `initialize'
        3: from C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/serverengine-2.3.2/lib/serverengine/config_loader.rb:36:in `initialize'
        2: from C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/serverengine-2.3.2/lib/serverengine/config_loader.rb:43:in `reload_config'
        1: from C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.16.2/lib/fluent/daemon.rb:13:in `block in <main>'
C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.16.2/lib/fluent/supervisor.rb:443:in `serverengine_config': undefined method `out' for nil:NilClass (NoMethodError)

The same run configuration works on Linux systems.

To Reproduce

Run td-agent from the command line with the --daemon flag set to a pidfile path. Our exact command:

td-agent --config=C:\opt\td-agent\etc\td-agent\td-agent.conf --log-rotate-age=100 --log-rotate-size=10000000 --daemon=C:\opt\td-agent\td-agent.pidfile

Expected behavior

Fluentd should launch properly with the specified pidfile.

Your Environment

- Fluentd version: 1.16.2
- TD Agent version: 4.5.2
- Operating system: Windows 10

Your Configuration

<system>
  log_level trace
  <log>
    rotate_age 14
    rotate_size 1000000000
  </log>
</system>
@include collector.d/*.conf
<match logsource.**>
  @type xxx
  @id xxx
  @log_level info
  <endpoint>
    xxx
  </endpoint>
  slow_flush_log_threshold 10.0
  <buffer>
    @type file
    path C:/opt/td-agent/spool/
    retry_wait 10
    retry_exponential_backoff_base 2
    retry_type exponential_backoff
    retry_max_interval 120
    retry_randomize true
    retry_forever true
    chunk_limit_size 9.5MB
    total_limit_size 64GB
    overflow_action block
    flush_interval 60s
    flush_thread_count 10
  </buffer>
</match>

Your Error Log

Traceback (most recent call last):
        7: from C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.16.2/lib/fluent/daemon.rb:13:in `<main>'
        6: from C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/serverengine-2.3.2/lib/serverengine/daemon.rb:76:in `run_server'
        5: from C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/serverengine-2.3.2/lib/serverengine/daemon.rb:76:in `new'
        4: from C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/serverengine-2.3.2/lib/serverengine/daemon.rb:32:in `initialize'
        3: from C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/serverengine-2.3.2/lib/serverengine/config_loader.rb:36:in `initialize'
        2: from C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/serverengine-2.3.2/lib/serverengine/config_loader.rb:43:in `reload_config'
        1: from C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.16.2/lib/fluent/daemon.rb:13:in `block in <main>'
C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.16.2/lib/fluent/supervisor.rb:443:in `serverengine_config': undefined method `out' for nil:NilClass (NoMethodError)

Additional context

As noted above, the same command works fine on Linux systems; it fails only on Windows.

sean-scott-lr commented 9 months ago

I believe that this commit may have introduced this issue. https://github.com/fluent/fluentd/commit/afee4a4861e0e6e0c43cd021e55baac89e4a1d50#diff-6042a07157b33b2d729ab3d10ed6bba89ad5f71aaed2be1aeb22a179a45ebbae

sean-scott-lr commented 9 months ago

I see a number of issues that are marked "waiting-for-triage". How often does the community check in on these issues?

coffee5280 commented 9 months ago

We have found a resolution: re-adding old code from a previous commit resolves the issue.

The LoggerInitializer class needed to be re-added to supervisor.rb.

This allows the $log variable to be properly initialized, and the error when running with the --daemon flag no longer occurs.

A real fix for this would be great; I'm sure we are adding back a lot of dead code here.
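The failure mode described here can be illustrated with a small Ruby sketch. It is not Fluentd's code: `FakeLog`, `build_config`, and `try_build_config` are hypothetical names, and the only thing taken from the report is the access pattern, where a callback reads `$log.out` while `$log` may still be nil.

```ruby
# Hypothetical stand-in for a logger object; this is NOT Fluentd's logger class.
FakeLog = Struct.new(:out)

# Hypothetical callback mirroring the failing access pattern from the
# traceback: it reads `$log.out`, so `$log` must be set before it runs.
def build_config
  { logger_out: $log.out }
end

# Runs the callback and returns either the config hash or the raised error,
# so both outcomes can be inspected without aborting the script.
def try_build_config
  build_config
rescue NoMethodError => e
  e
end

# Case 1: the global logger was never initialized, as in the report.
$log = nil
uninitialized_result = try_build_config

# Case 2: the logger is initialized before the callback runs.
$log = FakeLog.new($stderr)
initialized_result = try_build_config
```

In the first case the callback raises NoMethodError because `out` is called on nil; in the second it returns a hash. This is only meant to show why initializing `$log` before the callback makes the error disappear, not how the real fix should be structured.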

ashie commented 9 months ago

Thanks for your report! We'll check this. BTW, on Windows, running Fluentd as a service is recommended instead of using the --daemon option: https://docs.fluentd.org/installation/install-by-msi#step-5-run-fluentd-as-windows-service

daipom commented 6 months ago

I'm sorry. This has broken the Windows --daemon feature since v1.16.0.

This should be fixed.

daipom commented 6 months ago

From #4065:

"Supervisor.load_config() is called in ServerEngine's reloading function, but Fluentd doesn't use the function even when SIGHUP or SIGUSR2. So I can't see the reason for initializing the logger in that callback."

This is wrong. It was used in daemon.rb, which is used only for the Windows daemon.

https://github.com/fluent/fluentd/blob/e89092ce1132a933c12bb23fe8c9323c07ca81f5/lib/fluent/daemon.rb#L15
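The traceback in the report shows the config block from daemon.rb being invoked through config_loader.rb's `initialize` and `reload_config`. A minimal sketch of that pattern follows, assuming only what the traceback shows; `MiniConfigLoader` and `count_block_calls` are hypothetical names and this is not ServerEngine's actual implementation.

```ruby
# Hypothetical miniature of a config loader that calls its block from the
# initializer, following the call order visible in the traceback:
# initialize -> reload_config -> block.
class MiniConfigLoader
  attr_reader :config

  def initialize(&block)
    @block = block
    reload_config # the block already runs here, during construction
  end

  # Re-evaluates the block and stores its result as the current config.
  def reload_config
    @config = @block.call
  end
end

# Counts how often the block has run right after construction and again
# after one explicit reload.
def count_block_calls
  calls = 0
  loader = MiniConfigLoader.new { calls += 1; { call_number: calls } }
  after_new = calls
  loader.reload_config
  [after_new, calls, loader.config]
end
```

Under this pattern the block runs once at construction even if no reload is ever triggered, so anything it depends on (such as an initialized logger) has to be ready before the loader is created.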

I may have overlooked the Windows daemon use case, as I assumed it was supposed to use Windows services on Windows. However, I can confirm Windows daemon seems to work correctly on v1.15.3.

We need to fix this so that it works again.

daipom commented 6 months ago

This is not ready for v1.16.4.

coffee5280 commented 5 months ago

@daipom Thank you for the attention to this bug!