emsearcy / fluent-plugin-gelf

Buffered fluentd output plugin to GELF (Graylog2)
Apache License 2.0
33 stars 57 forks source link

Fluentd fails to start if gelf tcp output cannot connect #28

Open ivan-mjch opened 7 years ago

ivan-mjch commented 7 years ago

When fluentd is configured with a tcp gelf output, that cannot connect, fluentd errors out, failing to start.

Centos 7 td-agent init scripts keep restarting fluentd, but it is not the expected behavior, as it halts all fluentd log processing and possibly results in message loss even in unrelated pipelines.

It may be related to #7, as fluentd core seems to be interacting improperly with the gelf library output connection. The problem may be in a difference between what fluentd expects and how gelf library behaves, that the plugin doesn't account for, but I don't know much about them.

2017-01-02 23:09:24 +0100 [error]: unexpected error error_class=Errno::ECONNREFUSED error="Connection refused - connect(2) for \"192.168.36.4\" port 12201"
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:31:in `initialize'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:31:in `new'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:31:in `connect'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:15:in `block (2 levels) in addresses='
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:15:in `map'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:15:in `block in addresses='
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:13:in `tap'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:13:in `addresses='
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:9:in `initialize'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:138:in `new'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:138:in `create_sender'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:36:in `initialize'
  2017-01-02 23:09:24 +0100 [error]: /etc/td-agent/plugin/out_gelf.rb:42:in `new'
  2017-01-02 23:09:24 +0100 [error]: /etc/td-agent/plugin/out_gelf.rb:42:in `start'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/compat/call_super_mixin.rb:42:in `start'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/root_agent.rb:138:in `block in start'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/root_agent.rb:117:in `block (2 levels) in lifecycle'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/agent.rb:117:in `block (2 levels) in lifecycle'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/agent.rb:116:in `each'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/agent.rb:116:in `block in lifecycle'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/agent.rb:109:in `each'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/agent.rb:109:in `lifecycle'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/root_agent.rb:116:in `block in lifecycle'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/root_agent.rb:113:in `each'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/root_agent.rb:113:in `lifecycle'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/root_agent.rb:137:in `start'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/engine.rb:211:in `start'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/engine.rb:175:in `run'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/supervisor.rb:697:in `run_engine'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/supervisor.rb:454:in `block in run_worker'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/supervisor.rb:626:in `call'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/supervisor.rb:626:in `main_process'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/supervisor.rb:449:in `run_worker'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/command/fluentd.rb:288:in `<top (required)>'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:54:in `require'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:54:in `require'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/bin/fluentd:5:in `<top (required)>'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/bin/fluentd:23:in `load'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/bin/fluentd:23:in `<top (required)>'
  2017-01-02 23:09:24 +0100 [error]: /usr/sbin/td-agent:7:in `load'
  2017-01-02 23:09:24 +0100 [error]: /usr/sbin/td-agent:7:in `<main>'
2017-01-02 23:09:24 +0100 [error]: unexpected error error="Connection refused - connect(2) for \"192.168.36.4\" port 12201"
  2017-01-02 23:09:24 +0100 [error]: suppressed same stacktrace
pguridi commented 6 years ago

Hi @ivan-mjch , I also see this issue. I was trying shutting down Graylog to see if the retry works, and kind of worked:

2018-02-20 00:40:59 +0000 [warn]: #0 suppressed same stacktrace
2018-02-20 00:41:01 +0000 [warn]: #0 failed to flush the buffer. retry_time=2 next_retry_seconds=2018-02-20 00:41:01 +0000 chunk="5659a1017df2404f87465b9399491e6f" error_class=Errno::ECONNREFUSED error="Connection refused - connect(2) for \"graylog-staging.xxx.com\" port 12202"

But then I tried restarting Fluentd for a different reason, and started to get the errors you posted. Seems that as long as Fluentd is not restarted, Graylog connection errors will not break it.

KevDBG commented 6 years ago

Hello !

The same problem here, we tried to secure our log. I try to explain, we use graylog to handle our logs so we use the gelf plugin.

We would like to send our logs to graylog in normal situation, and if graylog falls keep the logs in a buffer file during this downtime.

Currently our configuration seems to works:

<match **>
  @type copy
  <store>
    @type gelf
    host "IP_GRAYLOG"
    port XXXXX
    protocol tcp
    #flush_interval 5s
    flush_at_shutdown false
    <buffer>
      @type file
      path /var/log/td-agent/buffer/graylog_host
      flush_mode immediate
      retry_wait 1                       # The wait interval for the first retry.
      retry_exponential_backoff_base 1.1 # Inclease the wait time by a factor of N.
      retry_type exponential_backoff     # Set 'periodic' for constant intervals.
      retry_max_interval 5m              # Cap the wait interval. (see above)
      retry_randomize true               # Apply randomization. (see above)
      retry_timeout 72h                  # Maximum duration before giving up.
      # retry_max_times 17               # Maximum retry count before giving up.
      retry_forever false                # Set 'true' for infinite retry loops.
      retry_secondary_threshold 0.8      # See the "Secondary Output" section in
      total_limit_size 4GB               # The size limitation of this buffer plugin instance
#      compress gzip                      # The option to specify compression of each chunks, during events are buffered
    </buffer>
  </store>

</match>

But we have the same issue reported here. We stop td-agent/fluentd service, we stop graylog and we try to start again the td-agent/fluentd plugin.

When we do that, we have the same behavior (loop) :

2018-06-21 17:46:37 +0200 [error]: #0 unexpected error error_class=Errno::ECONNREFUSED error="Connection refused - connect(2) for \"IP_GRAYLOG\" port XXXXX"
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:31:in `initialize'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:31:in `new'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:31:in `connect'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:15:in `block (2 levels) in addresses='
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:15:in `map'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:15:in `block in addresses='
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:13:in `tap'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:13:in `addresses='
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:9:in `initialize'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:139:in `new'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:139:in `create_sender'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:37:in `initialize'
  2018-06-21 17:46:37 +0200 [error]: #0 /etc/td-agent/plugin/out_gelf.rb:35:in `new'
  2018-06-21 17:46:37 +0200 [error]: #0 /etc/td-agent/plugin/out_gelf.rb:35:in `start'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/compat/call_super_mixin.rb:42:in `start'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/root_agent.rb:165:in `block in start'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/root_agent.rb:154:in `block (2 levels) in lifecycle'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/root_agent.rb:153:in `each'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/root_agent.rb:153:in `block in lifecycle'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/root_agent.rb:140:in `each'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/root_agent.rb:140:in `lifecycle'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/root_agent.rb:164:in `start'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/engine.rb:274:in `start'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/engine.rb:219:in `run'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/supervisor.rb:774:in `run_engine'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/supervisor.rb:523:in `block in run_worker'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/supervisor.rb:699:in `main_process'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/supervisor.rb:518:in `run_worker'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/command/fluentd.rb:316:in `<top (required)>'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/bin/fluentd:8:in `<top (required)>'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/bin/fluentd:22:in `load'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/bin/fluentd:22:in `<main>'

Any update on this issue ?

Many thanks,

KevDBG commented 6 years ago

Hello,

This project is dead ?

Regards,

emsearcy commented 6 years ago

I haven't had much time to work on this ... you may want to look at one of the forks for your immediate needs. I am planning to rename my "parent" repo into an organization so we can get more maintainers involved directly.

hleroy commented 5 years ago

Hi @emsearcy, did you have a chance to look at this issue? The fork fluent-plugin-gelf-hs has the same issue.

shinebayar-g commented 5 years ago

Sadly that fork disabled issues tab.. and this project isn't actively maintained :(

emsearcy commented 5 years ago

yeah, I wish I had more time! I'm also not using Graylog any more ... I still love Graylog and Fluentd—both awesome tools—but it's hard to put in the time on a project I'm not running and have no live environments to test with.