fluent-plugins-nursery / fluent-plugin-systemd

This is a fluentd input plugin. It reads logs from the systemd journal.
Apache License 2.0

fluentd crashes if journal path is not available #24

Closed dannyk81 closed 7 years ago

dannyk81 commented 7 years ago

Using fluent-plugin-systemd version 0.0.5 on fluentd-0.12.31 (Dockerized in K8s environment)

When fluentd starts with this plugin and the journal path is not available on the server, fluentd crashes.

2017-01-13 14:13:19 +0000 [info]: adding source type="systemd"
2017-01-13 14:13:19 +0000 [error]: unexpected error error="No such file or directory"
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/systemd-journal-1.2.3/lib/systemd/journal.rb:52:in `initialize'
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-systemd-0.0.5/lib/fluent/plugin/in_systemd.rb:21:in `new'
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-systemd-0.0.5/lib/fluent/plugin/in_systemd.rb:21:in `configure'
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.31/lib/fluent/root_agent.rb:154:in `add_source'
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.31/lib/fluent/root_agent.rb:95:in `block in configure'
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.31/lib/fluent/root_agent.rb:92:in `each'
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.31/lib/fluent/root_agent.rb:92:in `configure'
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.31/lib/fluent/engine.rb:129:in `configure'
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.31/lib/fluent/engine.rb:103:in `run_configure'
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.31/lib/fluent/supervisor.rb:489:in `run_configure'
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.31/lib/fluent/supervisor.rb:160:in `block in start'
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.31/lib/fluent/supervisor.rb:366:in `call'
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.31/lib/fluent/supervisor.rb:366:in `main_process'
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.31/lib/fluent/supervisor.rb:339:in `block in supervise'
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.31/lib/fluent/supervisor.rb:338:in `fork'
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.31/lib/fluent/supervisor.rb:338:in `supervise'
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.31/lib/fluent/supervisor.rb:156:in `start'
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.31/lib/fluent/command/fluentd.rb:173:in `<top (required)>'
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:54:in `require'
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:54:in `require'
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.31/bin/fluentd:5:in `<top (required)>'
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/bin/fluentd:23:in `load'
  2017-01-13 14:13:19 +0000 [error]: /opt/td-agent/embedded/bin/fluentd:23:in `<top (required)>'
  2017-01-13 14:13:19 +0000 [error]: /usr/sbin/td-agent:7:in `load'
  2017-01-13 14:13:19 +0000 [error]: /usr/sbin/td-agent:7:in `<main>'
2017-01-13 14:13:19 +0000 [info]: process finished code=256
2017-01-13 14:13:19 +0000 [warn]: process died within 1 second. exit.

Is there any way to handle this more gracefully?

errm commented 7 years ago

I am not sure what the best way to handle this would be...

How do other plugins handle this sort of fatal error?

There are two real options to handle this

dannyk81 commented 7 years ago

I think it should log a warning and retry the access, similar to the approach the tail plugin uses (I believe):

2017-01-10 03:38:11 +0000 [warn]: /var/log/xyz.log unreadable. It is excluded and would be examined next time.

errm commented 7 years ago

I could go for that, doing the same thing as the tail plugin makes a lot of sense.

Did you want to have a bash at this, @dannyk81? Otherwise it will go onto my list :)
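The warn-and-retry behaviour discussed above could be sketched roughly as follows. This is a minimal illustration, not code from fluent-plugin-systemd or the tail plugin: `RetryingOpener`, its parameters, and the block that opens the journal are all hypothetical, and `SystemCallError` is only a stand-in, since the real journal library may raise its own error class.

```ruby
require "logger"

# Hypothetical helper (not part of fluent-plugin-systemd): tries to open a
# path, logs a warning on failure, and retries instead of crashing.
class RetryingOpener
  def initialize(path, logger: Logger.new($stderr), max_attempts: 3, wait: 1.0, &opener)
    @path = path
    @logger = logger
    @max_attempts = max_attempts
    @wait = wait
    @opener = opener # block that actually opens the journal (assumption)
  end

  # Returns whatever the block returns, or nil if every attempt failed.
  def open
    attempts = 0
    begin
      attempts += 1
      @opener.call(@path)
    rescue SystemCallError => e
      # Assumption: the failure surfaces as an Errno::* error.
      @logger.warn("#{@path} unreadable (#{e.message}). It will be examined again next time.")
      return nil if attempts >= @max_attempts
      sleep(@wait)
      retry
    end
  end
end
```

A caller would pass a block that opens the journal and treat a `nil` result as "still unavailable", leaving the rest of fluentd running.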

dannyk81 commented 7 years ago

I could try :) but I don't have much experience with Ruby, I'm afraid...

By the way, slightly unrelated... does the new 0.1.0 require fluentd 0.14.11?

cosmo0920 commented 7 years ago

By the way, slightly unrelated... the new 0.1.0 requires fluentd 0.14.11 ?

Yes. If you are still using Fluentd 0.12.x, please use fluent-plugin-systemd 0.0.5.

see: https://github.com/reevoo/fluent-plugin-systemd/pull/25/files#diff-04c6e90faac2675aa89e2176d2eec7d8R11
see: https://github.com/reevoo/fluent-plugin-systemd/blob/fa5f1ebd595ee56395e5c7a270bb1113e62e0d05/fluent-plugin-systemd.gemspec#L26

cosmo0920 commented 7 years ago

Just my thought: Fluentd and its plugins should detect any suspicious or erroneous settings in #configure rather than after #start, because that helps users notice errors and warnings before everything has fully launched.
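As a rough illustration of checking settings in #configure rather than after #start, here is a plain-Ruby sketch. `JournalInput` and its `path` setting are hypothetical, and `ConfigError` is a local stand-in for Fluentd's own configuration error class so that the example runs without fluentd installed.

```ruby
# Stand-in for Fluentd's configuration error class (assumption), defined
# locally so the sketch is self-contained.
class ConfigError < StandardError; end

# Hypothetical input plugin; names and settings are illustrative only.
class JournalInput
  def initialize
    @started = false
  end

  # Validate settings up front, before anything is launched.
  def configure(conf)
    @path = conf.fetch("path", "/var/log/journal")
    return if File.directory?(@path)

    raise ConfigError, "journal path #{@path} does not exist"
  end

  # By the time start runs, the path has already been checked.
  def start
    @started = true
  end

  def started?
    @started
  end
end
```

With this shape, a bad path is reported during configuration, so the user sees the problem before any input has begun reading.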

errm commented 7 years ago

By the way, slightly unrelated... the new 0.1.0 requires fluentd 0.14.11 ?

I have just cut a 0.0.x branch for maintenance of a v0.12 compatible version. Hopefully we can continue to backport stuff there, at least until td-agent is based on v0.14.

dannyk81 commented 7 years ago

I have just cut a 0.0.x branch for maintenance of a v0.12 compatible version hopefully we can continue to backport stuff there at least until td-agent is based on v0.14

Thanks! That would be awesome.

cosmo0920 commented 7 years ago

@errm I've found that this line should be ~> 0.12.0 in the 0.0.x branch, because RubyGems interprets the current constraint as >= 0.12.0 && < 1.0. The right way to lock to v0.12.x is ~> 0.12.0.
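The difference between the two constraints can be checked with RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The sketch below assumes the looser constraint is written as `~> 0.12`; that spelling is inferred from the ">= 0.12.0 && < 1.0" description and is not quoted from the gemspec.

```ruby
require "rubygems"

# Assumption: the looser constraint is spelled "~> 0.12".
loose  = Gem::Requirement.new("~> 0.12")   # allows >= 0.12, < 1.0
strict = Gem::Requirement.new("~> 0.12.0") # allows >= 0.12.0, < 0.13

# Print whether each fluentd version satisfies each constraint.
%w[0.12.31 0.14.11 1.0.0].each do |v|
  version = Gem::Version.new(v)
  puts format("%-8s loose=%-5s strict=%s",
              v, loose.satisfied_by?(version), strict.satisfied_by?(version))
end
```

Under the looser constraint fluentd 0.14.11 is accepted, whereas `~> 0.12.0` accepts only the 0.12.x series.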

errm commented 7 years ago

^^ I think you are right, but it shouldn't cause too much harm, since 0.0.x should still work OK on v0.14.

dannyk81 commented 7 years ago

Awesome!!! Thanks for this :+1: