fluent-plugins-nursery / fluent-plugin-systemd

This is a fluentd input plugin. It reads logs from the systemd journal.
Apache License 2.0
153 stars 43 forks source link

Crash with Systemd::JournalError: Bad message #16

Closed Hexta closed 7 years ago

Hexta commented 7 years ago

Plugin with version 0.0.5 crashes on starting. Version 0.0.4 works fine. Backtrace

2016-11-10 12:42:33 +0000 [info]: reading config file path="/etc/td-agent/td-agent.conf" 2016-11-10 12:42:34 +0000 [info]: Connection opened to Elasticsearch cluster => {:host=>"elasticsearch-logging", :port=>9200, :scheme=>"http"} 2016-11-10 12:42:34 +0000 [info]: Template configured and already installed. 2016-11-10 12:42:34 +0000 [error]: unexpected error error_class=Systemd::JournalError error=# 2016-11-10 12:42:34 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/systemd-journal-1.2.3/lib/systemd/journal.rb:284:in enumerate_helper' 2016-11-10 12:42:34 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/systemd-journal-1.2.3/lib/systemd/journal.rb:106:incurrent_entry' 2016-11-10 12:42:34 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-systemd-0.0.5/lib/fluent/plugin/in_systemd.rb:76:in watch' 2016-11-10 12:42:34 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-systemd-0.0.5/lib/fluent/plugin/in_systemd.rb:58:inrun' 2016-11-10 12:42:34 +0000 [warn]: unexpected error while shutting down input plugin plugin=Fluent::SystemdInput plugin_id="object:160bd68" error_class=Systemd::JournalError error=# 2016-11-10 12:42:34 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/systemd-journal-1.2.3/lib/systemd/journal.rb:284:in enumerate_helper' 2016-11-10 12:42:34 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/systemd-journal-1.2.3/lib/systemd/journal.rb:106:incurrent_entry' 2016-11-10 12:42:34 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-systemd-0.0.5/lib/fluent/plugin/in_systemd.rb:76:in watch' 2016-11-10 12:42:34 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-systemd-0.0.5/lib/fluent/plugin/in_systemd.rb:58:inrun' 2016-11-10 12:42:34 +0000 [warn]: process died within 1 second. exit.

errm commented 7 years ago

Hi I can't reproduce this, could you provide fluentd/td-agent version and any other information that might help track this down...

Do you see this error on all invocations of the plugin, or are particular messages in the journal an issue?

Hexta commented 7 years ago

td-agent v2.3.3 fluentd v0.12.29

But this error didn't happen in the last few days. I think that this error caused by very specific journal message. I will provide log entry if the error happened again.

errm commented 7 years ago

Super thanks, I will close for now, but if you encounter again we can reopen...

dannyk81 commented 7 years ago

Hey Guys,

I'm running fluentd-0.12.35 with fluent-plugin-systemd-v0.0.7 and hitting this error on a CoreOS node:

2017-08-02 18:36:12 +0000 [error]: unexpected error error_class=Systemd::JournalError error=#<Systemd::JournalError: Bad message>
  2017-08-02 18:36:12 +0000 [error]: /var/lib/gems/2.3.0/gems/systemd-journal-1.2.3/lib/systemd/journal.rb:284:in `enumerate_helper'
  2017-08-02 18:36:12 +0000 [error]: /var/lib/gems/2.3.0/gems/systemd-journal-1.2.3/lib/systemd/journal.rb:106:in `current_entry'
  2017-08-02 18:36:12 +0000 [error]: /var/lib/gems/2.3.0/gems/fluent-plugin-systemd-0.0.7/lib/fluent/plugin/in_systemd.rb:89:in `watch'
  2017-08-02 18:36:12 +0000 [error]: /var/lib/gems/2.3.0/gems/fluent-plugin-systemd-0.0.7/lib/fluent/plugin/in_systemd.rb:71:in `run'
2017-08-02 18:36:12 +0000 [info]: shutting down fluentd

When I check the journal on this node, indeed there's some issue:

$ journalctl --verify
391830: Data object references invalid entry at a1c590
File corruption detected at /var/log/journal/c99d1e491c3c48c39f77268c6e972b98/system.journal:a1c270 (of 16777216 bytes, 63%).
FAIL: /var/log/journal/c99d1e491c3c48c39f77268c6e972b98/system.journal (Bad message)

I'm not sure what exactly is wrong with the journal, I tried vacuuming and rotating it, afterwards it was clean but the issue keeps creeping back.

Is it possible for the plugin to handle this without crashing? like skipping the bad event/message or something?

errm commented 7 years ago

Thanks for the report @dannyk81 it should be possible for us to catch this and move on...

dannyk81 commented 7 years ago

Thank you! would be awesome 👍

dannyk81 commented 7 years ago

@errm any chance to see this any time soon?

errm commented 7 years ago

Closed by #41

dannyk81 commented 6 years ago

Sorry to keep nagging about this ;)

But since we are still using fluentd 0.12.x (not ready to move to 0.14.x yet), could this fix be ported to the 0.0.x branch?