fluent / fluentd-docker-image

Docker image for Fluentd
https://hub.docker.com/r/fluent/fluentd/
Apache License 2.0
462 stars 354 forks source link

Segmentation fault after updating from version 1.16.3 to 1.16.4 #378

Open HolgerHees opened 4 months ago

HolgerHees commented 4 months ago

Describe the bug

After updating the official docker container from fluent/fluentd:v1.16.3 to fluent/fluentd:v1.16.4 I got a segmentation fault during startup which end in a endless starting loop.

Additionally I have the following gem modules installed

To Reproduce

just update and restart

Expected behavior

should not crash

Your Environment

- Fluentd version: 1.16.4
- TD Agent version:
- Operating system: Docker with opensuse leap 15.5
- Kernel version: 5.14.21-150500.55.52-default

Your Configuration

as it happens in journald, I write my journald configuration part here

<source>
  @type systemd
  tag systemd
  path /var/log/journal
  matches [{ "PRIORITY": [0,1,2,3,4,5,6] }]
  <storage>
    @type local
    persistent false
    path systemd.pos
  </storage>
  <entry>
    fields_strip_underscores true
    fields_lowercase true
    #field_map {"MESSAGE": "log", "_PID": ["process", "pid"], "_CMDLINE": "process", "_COMM": "cmd"}
  </entry>
</source>

Your Error Log

Mär 23 09:14:12 marvin fluentd[1519]: /usr/local/lib/ruby/gems/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal.rb:325: [BUG] Segmentation fault at 0x0000000000000008
Mär 23 09:14:12 marvin fluentd[1519]: ruby 3.2.3 (2024-01-18 revision 52bb2ac0a6) [x86_64-linux]
Mär 23 09:14:12 marvin fluentd[1519]: 
Mär 23 09:14:12 marvin fluentd[1519]: -- Control frame information -----------------------------------------------
Mär 23 09:14:12 marvin fluentd[1519]: c:0013 p:---- s:0064 e:000063 CFUNC  :free
Mär 23 09:14:12 marvin fluentd[1519]: c:0012 p:0012 s:0059 e:000058 METHOD /usr/local/lib/ruby/gems/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal.rb:325
Mär 23 09:14:12 marvin fluentd[1519]: c:0011 p:0042 s:0053 e:000052 METHOD /usr/local/lib/ruby/gems/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal/navigable.rb:13
Mär 23 09:14:12 marvin fluentd[1519]: c:0010 p:0018 s:0047 e:000044 METHOD /usr/local/lib/ruby/gems/3.2.0/gems/fluent-plugin-systemd-1.0.5/lib/fluent/plugin/in_systemd.rb:151
Mär 23 09:14:12 marvin fluentd[1519]: c:0009 p:0013 s:0040 e:000039 METHOD /usr/local/lib/ruby/gems/3.2.0/gems/fluent-plugin-systemd-1.0.5/lib/fluent/plugin/in_systemd.rb:144
Mär 23 09:14:12 marvin fluentd[1519]: c:0008 p:0032 s:0034 e:000033 METHOD /usr/local/lib/ruby/gems/3.2.0/gems/fluent-plugin-systemd-1.0.5/lib/fluent/plugin/in_systemd.rb:121 [FINISH]
Mär 23 09:14:12 marvin fluentd[1519]: c:0007 p:---- s:0030 e:000029 IFUNC 
Mär 23 09:14:12 marvin fluentd[1519]: c:0006 p:0012 s:0027 e:000026 METHOD /usr/local/lib/ruby/gems/3.2.0/gems/fluentd-1.16.4/lib/fluent/plugin_helper/timer.rb:80 [FINISH]
Mär 23 09:14:12 marvin fluentd[1519]: c:0005 p:---- s:0022 e:000021 CFUNC  :run_once
Mär 23 09:14:12 marvin fluentd[1519]: c:0004 p:0034 s:0017 e:000016 METHOD /usr/local/lib/ruby/gems/3.2.0/gems/cool.io-1.8.0/lib/cool.io/loop.rb:88
Mär 23 09:14:12 marvin fluentd[1519]: c:0003 p:0026 s:0012 e:000011 BLOCK  /usr/local/lib/ruby/gems/3.2.0/gems/fluentd-1.16.4/lib/fluent/plugin_helper/event_loop.rb:93
Mär 23 09:14:12 marvin fluentd[1519]: c:0002 p:0050 s:0008 e:000007 BLOCK  /usr/local/lib/ruby/gems/3.2.0/gems/fluentd-1.16.4/lib/fluent/plugin_helper/thread.rb:78 [FINISH]
Mär 23 09:14:12 marvin fluentd[1519]: c:0001 p:---- s:0003 e:000002 DUMMY  [FINISH]
Mär 23 09:14:12 marvin fluentd[1519]: 
Mär 23 09:14:12 marvin fluentd[1519]: -- Ruby level backtrace information

Additional context

No response

kenhys commented 4 months ago

Customize Dockerfile like this:

FROM fluent/fluentd:v1.16.4-debian-amd64-1.0

# Use root account to use apt
USER root

# below RUN includes plugin as examples elasticsearch is not required
# you may customize including plugins as you wish
RUN buildDeps="sudo make gcc g++ libc-dev" \
 && apt-get update \
 && apt-get install -y --no-install-recommends $buildDeps \
 && sudo gem install fluent-plugin-systemd \
 fluent-plugin-record-modifier \
 fluent-plugin-grafana-loki \
 fluent-plugin-rewrite-tag-filter \
 && sudo gem sources --clear-all \
 && rm -rf /var/lib/apt/lists/* \
 && rm -rf /tmp/* /var/tmp/* /usr/lib/ruby/gems/*/cache/*.gem

COPY fluent.conf /fluentd/etc/

Then build image, use custom build may cause SEGV.

 docker run --rm -it -v /var/log/journal:/var/log/journal 378
2024-03-25 02:29:46 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2024-03-25 02:29:46 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2024-03-25 02:29:46 +0000 [info]: gem 'fluentd' version '1.16.4'
2024-03-25 02:29:46 +0000 [info]: gem 'fluent-plugin-grafana-loki' version '1.2.20'
2024-03-25 02:29:46 +0000 [info]: gem 'fluent-plugin-record-modifier' version '2.2.0'
2024-03-25 02:29:46 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0'
2024-03-25 02:29:46 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.5'
2024-03-25 02:29:46 +0000 [info]: using configuration file: <ROOT>
  <source>
    @type systemd
    tag "systemd"
    path "/var/log/journal"
    matches [{"PRIORITY":[0,1,2,3,4,5,6]}]
    <storage>
      @type "local"
      persistent false
      path "systemd.pos"
    </storage>
    <entry>
      fields_strip_underscores true
      fields_lowercase true
    </entry>
  </source>
</ROOT>
2024-03-25 02:29:46 +0000 [info]: starting fluentd-1.16.4 pid=7 ruby="3.2.3"
2024-03-25 02:29:46 +0000 [info]: spawn command to main:  cmdline=["/usr/local/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/local/bundle/bin/fluentd", "--config", "/fluentd/etc/fluent.conf", "--plugin", "/fluentd/plugins", "--under-supervisor"]
2024-03-25 02:29:47 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2024-03-25 02:29:47 +0000 [info]: adding source type="systemd"
2024-03-25 02:29:47 +0000 [info]: #0 starting fluentd worker pid=16 ppid=7 worker=0
2024-03-25 02:29:47 +0000 [info]: #0 fluentd worker is now running worker=0
2024-03-25 02:29:48 +0000 [warn]: #0 no patterns matched tag="systemd"
free(): invalid pointer
2024-03-25 02:29:48 +0000 [error]: Worker 0 exited unexpectedly with signal SIGABRT
kenhys commented 4 months ago

Workaround: disable jemalloc in customized container image.

Set empty LD_PRELOAD="".

docker run --rm -it -e LD_PRELOAD="" -v /var/log/journal:/var/log/journal 378
2024-03-25 02:38:00 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2024-03-25 02:38:00 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2024-03-25 02:38:00 +0000 [info]: gem 'fluentd' version '1.16.4'
2024-03-25 02:38:00 +0000 [info]: gem 'fluent-plugin-grafana-loki' version '1.2.20'
2024-03-25 02:38:00 +0000 [info]: gem 'fluent-plugin-record-modifier' version '2.2.0'
2024-03-25 02:38:00 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0'
2024-03-25 02:38:00 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.5'
2024-03-25 02:38:01 +0000 [info]: using configuration file: <ROOT>
  <source>
    @type systemd
    tag "systemd"
    path "/var/log/journal"
    matches [{"PRIORITY":[0,1,2,3,4,5,6]}]
    <storage>
      @type "local"
      persistent false
      path "systemd.pos"
    </storage>
    <entry>
      fields_strip_underscores true
      fields_lowercase true
    </entry>
  </source>
</ROOT>
2024-03-25 02:38:01 +0000 [info]: starting fluentd-1.16.4 pid=7 ruby="3.2.3"
2024-03-25 02:38:01 +0000 [info]: spawn command to main:  cmdline=["/usr/local/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/local/bundle/bin/fluentd", "--config", "/fluentd/etc/fluent.conf", "--plugin", "/fluentd/plugins", "--under-supervisor"]
2024-03-25 02:38:01 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2024-03-25 02:38:01 +0000 [info]: adding source type="systemd"
2024-03-25 02:38:01 +0000 [info]: #0 starting fluentd worker pid=16 ppid=7 worker=0
2024-03-25 02:38:01 +0000 [info]: #0 fluentd worker is now running worker=0
2024-03-25 02:38:02 +0000 [warn]: #0 no patterns matched tag="systemd"
2024-03-25 02:38:03 +0000 [warn]: #0 no patterns matched tag="systemd"
daipom commented 4 months ago

Ref: https://github.com/fluent/fluent-package-builder/issues/369

kenhys commented 4 months ago

MEMO:

It seems that it was crashed here:

https://github.com/ledbettj/systemd-journal/blob/f3365c1147baeed2032b9c0ae223905d57216ce1/lib/systemd/journal.rb#L323-L327

    def self.read_and_free_outstr(ptr)
      str = ptr.read_string
      LibC.free(ptr)
      str
    end

LibC.free is called via read_and_free_outstr in Journal.cursor.

https://github.com/ledbettj/systemd-journal/blob/f3365c1147baeed2032b9c0ae223905d57216ce1/lib/systemd/journal/navigable.rb#L13

      def cursor
        out_ptr = FFI::MemoryPointer.new(:pointer, 1)
        if (rc = Native.sd_journal_get_cursor(@ptr, out_ptr)) < 0
          raise JournalError, rc
        end

        Journal.read_and_free_outstr(out_ptr.read_pointer)
      end

It was assumed that out_ptr is allocated and should be freed. With jemalloc, this mechanism may not work as expected.

HolgerHees commented 4 months ago

would it make sense to open a bug report on ledbettj systemd-journal project?

LHCGreg commented 2 weeks ago

I got this issue as well when updating from 1.16.3 to 1.17.0. I'm rolling back to 1.16.3 instead of disabling jemalloc because it sounds like a memory bug that's probably still there, it's just that it crashes under jemalloc and not the stock malloc. It would be great if someone familiar with the code could open an issue in systemd-journal if that's where the problem is.

kenhys commented 1 day ago

I've tried it with more recent version of jemalloc to investigate this SEGV.

This problem is still reproduced.

docker run --rm -v /var/log/journal:/var/log/journal fluent-systemd
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
2024-07-22 02:26:06 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2024-07-22 02:26:06 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2024-07-22 02:26:06 +0000 [info]: gem 'fluentd' version '1.17.0'
2024-07-22 02:26:06 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.5'
2024-07-22 02:26:06 +0000 [info]: using configuration file: <ROOT>
  <source>
    @type systemd
    tag "systemd"
    path "/var/log/journal"
    matches [{"PRIORITY":[0,1,2,3,4,5,6]}]
    <storage>
      @type "local"
      persistent false
      path "systemd.pos"
    </storage>
    <entry>
      fields_strip_underscores true
      fields_lowercase true
    </entry>
  </source>
</ROOT>
2024-07-22 02:26:06 +0000 [info]: starting fluentd-1.17.0 pid=2 ruby="3.2.4"
2024-07-22 02:26:06 +0000 [info]: spawn command to main:  cmdline=["/usr/local/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/local/bundle/bin/fluentd", "--config", "/fluentd/etc/fluent.conf", "--plugin", "/fluentd/plugins", "--under-supervisor"]
2024-07-22 02:26:07 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2024-07-22 02:26:07 +0000 [info]: adding source type="systemd"
2024-07-22 02:26:07 +0000 [info]: #0 starting fluentd worker pid=11 ppid=2 worker=0
2024-07-22 02:26:07 +0000 [info]: #0 fluentd worker is now running worker=0
2024-07-22 02:26:08 +0000 [warn]: #0 no patterns matched tag="systemd"
free(): invalid pointer
2024-07-22 02:26:08 +0000 [error]: Worker 0 exited unexpectedly with signal SIGABRT