fluent-plugins-nursery / fluent-plugin-systemd

This is a fluentd input plugin. It reads logs from the systemd journal.
Apache License 2.0
153 stars 43 forks source link

Segmentation fault on Ubuntu 22.04 #110

Closed kthhrv closed 2 weeks ago

kthhrv commented 8 months ago

Works on 20.04 but on 22.04 fluentd silently fails.

I'm using latest fluentd

# apt show fluent-package
Package: fluent-package
Version: 5.0.2-1

running fluentd manually with

# FLUENT_SOCKET=/var/run/fluent/fluentd.sock FLUENT_PLUGIN=/etc/fluent/plugin LD_PRELOAD=/opt/fluent/lib/libjemalloc.so GEM_PATH=/opt/fluent/lib/ruby/gems/3.2.0/ GEM_HOME=/opt/fluent/lib/ruby/gems/3.2.0/ FLUENT_CONF=/etc/fluent/fluentd.conf /opt/fluent/bin/fluentd

I get

/opt/fluent/lib/ruby/gems/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal.rb:325: [BUG] Segmentation fault at 0x0000000000000008
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0013 p:---- s:0064 e:000063 CFUNC  :free
c:0012 p:0012 s:0059 e:000058 METHOD /opt/fluent/lib/ruby/gems/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal.rb:325
c:0011 p:0042 s:0053 e:000052 METHOD /opt/fluent/lib/ruby/gems/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal/navigable.rb:13
c:0010 p:0018 s:0047 e:000044 METHOD /opt/fluent/lib/ruby/gems/3.2.0/gems/fluent-plugin-systemd-1.0.5/lib/fluent/plugin/in_systemd.rb:151
c:0009 p:0013 s:0040 e:000039 METHOD /opt/fluent/lib/ruby/gems/3.2.0/gems/fluent-plugin-systemd-1.0.5/lib/fluent/plugin/in_systemd.rb:144
c:0008 p:0032 s:0034 e:000033 METHOD /opt/fluent/lib/ruby/gems/3.2.0/gems/fluent-plugin-systemd-1.0.5/lib/fluent/plugin/in_systemd.rb:121 [FINISH]
c:0007 p:---- s:0030 e:000029 IFUNC 
c:0006 p:0012 s:0027 e:000026 METHOD /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/plugin_helper/timer.rb:80 [FINISH]
c:0005 p:---- s:0022 e:000021 CFUNC  :run_once
c:0004 p:0034 s:0017 e:000016 METHOD /opt/fluent/lib/ruby/gems/3.2.0/gems/cool.io-1.8.0/lib/cool.io/loop.rb:88
c:0003 p:0026 s:0012 e:000011 BLOCK  /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/plugin_helper/event_loop.rb:93
c:0002 p:0050 s:0008 e:000007 BLOCK  /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/plugin_helper/thread.rb:78 [FINISH]
c:0001 p:---- s:0003 e:000002 DUMMY  [FINISH]

-- Ruby level backtrace information ----------------------------------------
/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
/opt/fluent/lib/ruby/gems/3.2.0/gems/cool.io-1.8.0/lib/cool.io/loop.rb:88:in `run'
/opt/fluent/lib/ruby/gems/3.2.0/gems/cool.io-1.8.0/lib/cool.io/loop.rb:88:in `run_once'
/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
/opt/fluent/lib/ruby/gems/3.2.0/gems/fluent-plugin-systemd-1.0.5/lib/fluent/plugin/in_systemd.rb:121:in `run'
/opt/fluent/lib/ruby/gems/3.2.0/gems/fluent-plugin-systemd-1.0.5/lib/fluent/plugin/in_systemd.rb:144:in `watch'
/opt/fluent/lib/ruby/gems/3.2.0/gems/fluent-plugin-systemd-1.0.5/lib/fluent/plugin/in_systemd.rb:151:in `yield_current_entry'
/opt/fluent/lib/ruby/gems/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal/navigable.rb:13:in `cursor'
/opt/fluent/lib/ruby/gems/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal.rb:325:in `read_and_free_outstr'
/opt/fluent/lib/ruby/gems/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal.rb:325:in `free'

-- Machine register context ------------------------------------------------
 RIP: 0x00007f88714a2e50 RBP: 0x00007f8866c267f0 RSP: 0x00007f8864cf8b90
 RAX: 0x0000000000000000 RBX: 0x00007f886a574b40 RCX: 0x000007f886a574b2
 RDX: 0x0000000000000001 RDI: 0x00007f887161ac80 RSI: 0x00007f885c0008e0
  R8: 0x00000000ffffffff  R9: 0x00007f885c0008e0 R10: 0x0000000000004022
 R11: 0x00007f885c000090 R12: 0x00007f887161ac80 R13: 0x0000000000000000
 R14: 0x0000ff10d119b330 R15: 0x00007f886b32a110 EFL: 0x0000000000010246

-- C level backtrace information -------------------------------------------
/opt/fluent/lib/libruby.so.3.2(0x7f8871ac1539) [0x7f8871ac1539]
/opt/fluent/lib/libruby.so.3.2(0x7f8871ac5869) [0x7f8871ac5869]
/opt/fluent/lib/libruby.so.3.2(0x7f887191e384) [0x7f887191e384]
/opt/fluent/lib/libruby.so.3.2(0x7f8871a3e65d) [0x7f8871a3e65d]
/lib/x86_64-linux-gnu/libc.so.6(0x7f8871442520) [0x7f8871442520]
/lib/x86_64-linux-gnu/libc.so.6(0x7f88714a2e50) [0x7f88714a2e50]
/lib/x86_64-linux-gnu/libc.so.6(free+0x73) [0x7f88714a5453]
/lib/x86_64-linux-gnu/libffi.so.8(0x7f886b328e2e) [0x7f886b328e2e]
/lib/x86_64-linux-gnu/libffi.so.8(0x7f886b325493) [0x7f886b325493]
/opt/fluent/lib/ruby/gems/3.2.0/gems/ffi-1.15.5/lib/ffi_c.so(rbffi_CallFunction+0x1af) [0x7f886976229f]
/opt/fluent/lib/ruby/gems/3.2.0/gems/ffi-1.15.5/lib/ffi_c.so(0x7f8869758937) [0x7f8869758937]
/opt/fluent/lib/libruby.so.3.2(0x7f8871aa1fdb) [0x7f8871aa1fdb]
/opt/fluent/lib/libruby.so.3.2(0x7f8871aa5991) [0x7f8871aa5991]
/opt/fluent/lib/libruby.so.3.2(0x7f8871aa72ca) [0x7f8871aa72ca]
/opt/fluent/lib/libruby.so.3.2(rb_vm_exec+0xef) [0x7f8871abcf1f]
/opt/fluent/lib/libruby.so.3.2(rb_vm_call0+0xcd) [0x7f8871ab250d]
/opt/fluent/lib/libruby.so.3.2(rb_method_call_with_block_kw+0x84) [0x7f88719eb254]
/opt/fluent/lib/libruby.so.3.2(0x7f8871aa27f4) [0x7f8871aa27f4]
/opt/fluent/lib/libruby.so.3.2(0x7f8871aa289f) [0x7f8871aa289f]
/opt/fluent/lib/libruby.so.3.2(0x7f8871aa5991) [0x7f8871aa5991]
/opt/fluent/lib/libruby.so.3.2(0x7f8871aa72ca) [0x7f8871aa72ca]
/opt/fluent/lib/libruby.so.3.2(rb_vm_exec+0xef) [0x7f8871abcf1f]
/opt/fluent/lib/libruby.so.3.2(rb_funcallv+0x19c) [0x7f8871ab3c8c]
/opt/fluent/lib/ruby/gems/3.2.0/gems/cool.io-1.8.0/lib/cool.io_ext.so(0x7f886c17362d) [0x7f886c17362d]
/opt/fluent/lib/ruby/gems/3.2.0/gems/cool.io-1.8.0/lib/cool.io_ext.so(0x7f886c1787c2) [0x7f886c1787c2]
/opt/fluent/lib/libruby.so.3.2(0x7f8871aa1fdb) [0x7f8871aa1fdb]
/opt/fluent/lib/libruby.so.3.2(0x7f8871aa5991) [0x7f8871aa5991]
/opt/fluent/lib/libruby.so.3.2(0x7f8871aa72ca) [0x7f8871aa72ca]
/opt/fluent/lib/libruby.so.3.2(rb_vm_exec+0xef) [0x7f8871abcf1f]
/opt/fluent/lib/libruby.so.3.2(rb_vm_invoke_proc+0x5f) [0x7f8871abf97f]
/opt/fluent/lib/libruby.so.3.2(0x7f8871a7450a) [0x7f8871a7450a]
/opt/fluent/lib/libruby.so.3.2(0x7f8871a74de6) [0x7f8871a74de6]
/opt/fluent/lib/libruby.so.3.2(0x7f8871a75a32) [0x7f8871a75a32]
/lib/x86_64-linux-gnu/libc.so.6(0x7f8871494ac3) [0x7f8871494ac3]
/lib/x86_64-linux-gnu/libc.so.6(0x7f8871526850) [0x7f8871526850]

config

# cat /etc/fluent/conf.d/etl.conf 
<source>
    @type systemd
    tag raw.etl
    path /var/log/journal
    matches [{ "_SYSTEMD_UNIT": "etl.service" }, { "_SYSTEMD_UNIT": "heartbeat.service" }]
    read_from_head true

    <storage>
       @type local
       path /var/log/fluentd-journald-kubelet-cursor.json
    </storage>

    <entry>
       fields_strip_underscores true
       fields_lowercase true
    </entry>
</source>

<match etl>
    @type cloudwatch_logs
    log_group_name "/timeout/etl/#{ENV['TREEHUGGER_STAGE'] || 'prod'}/app"
    log_stream_name "#{ENV['INSTANCE_ID']}"
    auto_create_stream true
    flush_interval 1s
</match>

<system>
  root_dir /var/log/fluent
</system>
root@ip-10-200-4-48:~# 
redliu312 commented 8 months ago

met the same issue on a k8s cluster and trying to use fluentd image: fluent/fluentd-kubernetes-daemonset:v1.16.3-debian-elasticsearch8-2.1 with systemd plugin

the error :

fluentd-sbpzp fluentd 2024-01-28 08:20:57 +0000 [trace]: #0 [es_kubelet] writing events into buffer instance=2360 metadata_size=1
fluentd-sbpzp fluentd /fluentd/vendor/bundle/ruby/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal.rb:325: [BUG] Segmentation fault at 0x0000000000000008
fluentd-sbpzp fluentd ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]
fluentd-sbpzp fluentd
fluentd-sbpzp fluentd -- Control frame information -----------------------------------------------
fluentd-sbpzp fluentd c:0013 p:---- s:0064 e:000063 CFUNC  :free
fluentd-sbpzp fluentd c:0012 p:0012 s:0059 e:000058 METHOD /fluentd/vendor/bundle/ruby/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal.rb:325
fluentd-sbpzp fluentd c:0011 p:0042 s:0053 e:000052 METHOD /fluentd/vendor/bundle/ruby/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal/navigable.rb:13
fluentd-sbpzp fluentd c:0010 p:0018 s:0047 e:000044 METHOD /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-systemd-1.0.5/lib/fluent/plugin/in_systemd.rb:151
fluentd-sbpzp fluentd c:0009 p:0013 s:0040 e:000039 METHOD /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-systemd-1.0.5/lib/fluent/plugin/in_systemd.rb:144
fluentd-sbpzp fluentd c:0008 p:0032 s:0034 e:000033 METHOD /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-systemd-1.0.5/lib/fluent/plugin/in_systemd.rb:121 [FINISH]
fluentd-sbpzp fluentd c:0007 p:---- s:0030 e:000029 IFUNC
fluentd-sbpzp fluentd c:0006 p:0012 s:0027 e:000026 METHOD /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/plugin_helper/timer.rb:80 [FINISH]
fluentd-sbpzp fluentd c:0005 p:---- s:0022 e:000021 CFUNC  :run_once
fluentd-sbpzp fluentd c:0004 p:0034 s:0017 e:000016 METHOD /fluentd/vendor/bundle/ruby/3.2.0/gems/cool.io-1.8.0/lib/cool.io/loop.rb:88
fluentd-sbpzp fluentd c:0003 p:0026 s:0012 e:000011 BLOCK  /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/plugin_helper/event_loop.rb:93
fluentd-sbpzp fluentd c:0002 p:0050 s:0008 e:000007 BLOCK  /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/plugin_helper/thread.rb:78 [FINISH]
fluentd-sbpzp fluentd c:0001 p:---- s:0003 e:000002 DUMMY  [FINISH]
fluentd-sbpzp fluentd
fluentd-sbpzp fluentd -- Ruby level backtrace information ----------------------------------------
dkulchinsky commented 7 months ago

looks like related to https://github.com/ledbettj/systemd-journal/issues/93

and also reported here: https://github.com/fluent/fluent-package-builder/issues/369

fpiriz-inditex commented 7 months ago

Workaround suggested in https://github.com/fluent/fluent-package-builder/issues/369 does remove the crashes. However, in my case (same as redliu312 in the previous message) systemd logs are not read with the workaround. Leaving the env variable untouched, fluentd crashes and upon restart, all the systemd logs are flushed and indexed.

daipom commented 6 months ago

Currently, we need to avoid this problem by unsetting LD_PRELOAD in some environments.

It is probably not a problem with this plugin, but it seems to be a problem with libjemalloc and systemd-journal.

It's a problem to have a segmentation fault in the default state of some Fluentd distributions, so we need to consider possible solutions.

ledbettj commented 1 month ago

Hi, this issue is discussed over at https://github.com/ledbettj/systemd-journal/pull/96

The problem is that the native library, libsystemd allocates memory using the system allocator, and then expects the calling code to free it. In most libraries, when they allocate memory they have a corresponding _free() method which can be called to release it, but libsystemd does not in this case.

The caller must call the corresponding free() implementation that was used for malloc() -- libc, tcmalloc, jemalloc, etc -- otherwise the process will likely crash.

In Ruby/FFI land, I don't know any good way to determine which allocator was used by the native code. Trying to load the jemalloc library when the libC implementation is already in use will cause issues :(

One option might be to provide a separate native library shim that just exposes a free wrapper (which would hopefully pick up on the LD_PRELOAD) but I don't know if that will be foolproof either.

ledbettj commented 2 weeks ago

Hi, this issue is discussed over at ledbettj/systemd-journal#96

The problem is that the native library, libsystemd allocates memory using the system allocator, and then expects the calling code to free it. In most libraries, when they allocate memory they have a corresponding _free() method which can be called to release it, but libsystemd does not in this case.

The caller must call the corresponding free() implementation that was used for malloc() -- libc, tcmalloc, jemalloc, etc -- otherwise the process will likely crash.

In Ruby/FFI land, I don't know any good way to determine which allocator was used by the native code. Trying to load the jemalloc library when the libC implementation is already in use will cause issues :(

One option might be to provide a separate native library shim that just exposes a free wrapper (which would hopefully pick up on the LD_PRELOAD) but I don't know if that will be foolproof either.

Hi friends,

I've published a version 2.0.0 of systemd-journal gem which includes a native shim as described above to attempt to work around this allocator mismatch.

Please give it a try and let me know if you run into any issues.

kenhys commented 2 weeks ago

It seems that this issue was fixed in systemd-journal 2.0.0, so we shipped fluent-plugin-systemd 1.1.0 which adopts systemd-journal 2.0.0 or later.

Please use fluent-plugin-systemd 1.1.0.