Closed HolgerHees closed 2 months ago
Customize Dockerfile like this:
FROM fluent/fluentd:v1.16.4-debian-amd64-1.0
# Use root account to use apt
USER root
# The RUN below installs some plugins as examples (elasticsearch is not required);
# customize the plugin list as you wish.
RUN buildDeps="sudo make gcc g++ libc-dev" \
 && apt-get update \
 && apt-get install -y --no-install-recommends $buildDeps \
 && sudo gem install fluent-plugin-systemd \
        fluent-plugin-record-modifier \
        fluent-plugin-grafana-loki \
        fluent-plugin-rewrite-tag-filter \
 && sudo gem sources --clear-all \
 && rm -rf /var/lib/apt/lists/* \
 && rm -rf /tmp/* /var/tmp/* /usr/lib/ruby/gems/*/cache/*.gem
COPY fluent.conf /fluentd/etc/
Then build the image. Running the custom build may crash (the worker below exits with SIGABRT after free(): invalid pointer).
docker run --rm -it -v /var/log/journal:/var/log/journal 378
2024-03-25 02:29:46 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2024-03-25 02:29:46 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2024-03-25 02:29:46 +0000 [info]: gem 'fluentd' version '1.16.4'
2024-03-25 02:29:46 +0000 [info]: gem 'fluent-plugin-grafana-loki' version '1.2.20'
2024-03-25 02:29:46 +0000 [info]: gem 'fluent-plugin-record-modifier' version '2.2.0'
2024-03-25 02:29:46 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0'
2024-03-25 02:29:46 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.5'
2024-03-25 02:29:46 +0000 [info]: using configuration file: <ROOT>
<source>
@type systemd
tag "systemd"
path "/var/log/journal"
matches [{"PRIORITY":[0,1,2,3,4,5,6]}]
<storage>
@type "local"
persistent false
path "systemd.pos"
</storage>
<entry>
fields_strip_underscores true
fields_lowercase true
</entry>
</source>
</ROOT>
2024-03-25 02:29:46 +0000 [info]: starting fluentd-1.16.4 pid=7 ruby="3.2.3"
2024-03-25 02:29:46 +0000 [info]: spawn command to main: cmdline=["/usr/local/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/local/bundle/bin/fluentd", "--config", "/fluentd/etc/fluent.conf", "--plugin", "/fluentd/plugins", "--under-supervisor"]
2024-03-25 02:29:47 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2024-03-25 02:29:47 +0000 [info]: adding source type="systemd"
2024-03-25 02:29:47 +0000 [info]: #0 starting fluentd worker pid=16 ppid=7 worker=0
2024-03-25 02:29:47 +0000 [info]: #0 fluentd worker is now running worker=0
2024-03-25 02:29:48 +0000 [warn]: #0 no patterns matched tag="systemd"
free(): invalid pointer
2024-03-25 02:29:48 +0000 [error]: Worker 0 exited unexpectedly with signal SIGABRT
Workaround: disable jemalloc in the customized container image by setting LD_PRELOAD to an empty string.
docker run --rm -it -e LD_PRELOAD="" -v /var/log/journal:/var/log/journal 378
2024-03-25 02:38:00 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2024-03-25 02:38:00 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2024-03-25 02:38:00 +0000 [info]: gem 'fluentd' version '1.16.4'
2024-03-25 02:38:00 +0000 [info]: gem 'fluent-plugin-grafana-loki' version '1.2.20'
2024-03-25 02:38:00 +0000 [info]: gem 'fluent-plugin-record-modifier' version '2.2.0'
2024-03-25 02:38:00 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0'
2024-03-25 02:38:00 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.5'
2024-03-25 02:38:01 +0000 [info]: using configuration file: <ROOT>
<source>
@type systemd
tag "systemd"
path "/var/log/journal"
matches [{"PRIORITY":[0,1,2,3,4,5,6]}]
<storage>
@type "local"
persistent false
path "systemd.pos"
</storage>
<entry>
fields_strip_underscores true
fields_lowercase true
</entry>
</source>
</ROOT>
2024-03-25 02:38:01 +0000 [info]: starting fluentd-1.16.4 pid=7 ruby="3.2.3"
2024-03-25 02:38:01 +0000 [info]: spawn command to main: cmdline=["/usr/local/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/local/bundle/bin/fluentd", "--config", "/fluentd/etc/fluent.conf", "--plugin", "/fluentd/plugins", "--under-supervisor"]
2024-03-25 02:38:01 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2024-03-25 02:38:01 +0000 [info]: adding source type="systemd"
2024-03-25 02:38:01 +0000 [info]: #0 starting fluentd worker pid=16 ppid=7 worker=0
2024-03-25 02:38:01 +0000 [info]: #0 fluentd worker is now running worker=0
2024-03-25 02:38:02 +0000 [warn]: #0 no patterns matched tag="systemd"
2024-03-25 02:38:03 +0000 [warn]: #0 no patterns matched tag="systemd"
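The workaround above relies on the image preloading jemalloc through LD_PRELOAD. A minimal Ruby sketch for checking that from inside the container, assuming Linux's /proc/self/maps format. The helper name jemalloc_mapped? is hypothetical and not part of fluentd:

```ruby
# Hypothetical helper: returns true when any mapped file in the given
# /proc/<pid>/maps text looks like a jemalloc shared object.
def jemalloc_mapped?(maps_text)
  maps_text.each_line.any? do |line|
    # Fields: address perms offset dev inode [pathname]
    path = line.split(" ", 6)[5].to_s.strip
    File.basename(path).include?("jemalloc")
  end
end

# Only meaningful on Linux, where /proc/self/maps exists.
if File.readable?("/proc/self/maps")
  puts "LD_PRELOAD=#{ENV.fetch('LD_PRELOAD', '').inspect}"
  puts "jemalloc mapped: #{jemalloc_mapped?(File.read('/proc/self/maps'))}"
end
```

Running it once normally and once with -e LD_PRELOAD="" should show whether the workaround actually took effect in a given image.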
MEMO:
It seems that it crashed here:
def self.read_and_free_outstr(ptr)
  str = ptr.read_string
  LibC.free(ptr)
  str
end
LibC.free is called via read_and_free_outstr in Journal#cursor:
def cursor
  out_ptr = FFI::MemoryPointer.new(:pointer, 1)
  if (rc = Native.sd_journal_get_cursor(@ptr, out_ptr)) < 0
    raise JournalError, rc
  end
  Journal.read_and_free_outstr(out_ptr.read_pointer)
end
It was assumed that out_ptr is allocated and should be freed. With jemalloc, this mechanism may not work as expected.
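One way to see why this can go wrong is to compare where free resolves in the running process. The following is only a diagnostic sketch using Ruby's Fiddle library (not the FFI gem that systemd-journal uses). The library name libc.so.6 is an assumption that holds for glibc on Linux:

```ruby
require "fiddle"

# free() as the process resolves it by default (RTLD_DEFAULT). With an
# allocator preloaded through LD_PRELOAD, this is the preloaded free().
default_free = Fiddle::Handle::DEFAULT["free"]

# free() looked up explicitly inside libc. The name libc.so.6 is an
# assumption (glibc on Linux); elsewhere the lookup is simply skipped.
libc_free = begin
  Fiddle.dlopen("libc.so.6")["free"]
rescue Fiddle::DLError
  nil
end

# nil means "could not compare", otherwise true/false.
same = libc_free.nil? ? nil : (default_free == libc_free)

puts format("default free(): 0x%x", default_free)
puts "libc free():    #{libc_free ? format('0x%x', libc_free) : 'unavailable'}"
puts "same function:  #{same.inspect}"
```

If the two addresses differ, memory handed out by the preloaded allocator would be released by a different allocator when libc's free is called explicitly, which would match the free(): invalid pointer abort in the logs.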
Would it make sense to open a bug report on ledbettj's systemd-journal project?
I got this issue as well when updating from 1.16.3 to 1.17.0. I'm rolling back to 1.16.3 instead of disabling jemalloc because it sounds like a memory bug that's probably still there, it's just that it crashes under jemalloc and not the stock malloc. It would be great if someone familiar with the code could open an issue in systemd-journal if that's where the problem is.
I tried a more recent version of jemalloc to investigate this SEGV.
The problem is still reproducible.
docker run --rm -v /var/log/journal:/var/log/journal fluent-systemd
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
2024-07-22 02:26:06 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2024-07-22 02:26:06 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2024-07-22 02:26:06 +0000 [info]: gem 'fluentd' version '1.17.0'
2024-07-22 02:26:06 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.5'
2024-07-22 02:26:06 +0000 [info]: using configuration file: <ROOT>
<source>
@type systemd
tag "systemd"
path "/var/log/journal"
matches [{"PRIORITY":[0,1,2,3,4,5,6]}]
<storage>
@type "local"
persistent false
path "systemd.pos"
</storage>
<entry>
fields_strip_underscores true
fields_lowercase true
</entry>
</source>
</ROOT>
2024-07-22 02:26:06 +0000 [info]: starting fluentd-1.17.0 pid=2 ruby="3.2.4"
2024-07-22 02:26:06 +0000 [info]: spawn command to main: cmdline=["/usr/local/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/local/bundle/bin/fluentd", "--config", "/fluentd/etc/fluent.conf", "--plugin", "/fluentd/plugins", "--under-supervisor"]
2024-07-22 02:26:07 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2024-07-22 02:26:07 +0000 [info]: adding source type="systemd"
2024-07-22 02:26:07 +0000 [info]: #0 starting fluentd worker pid=11 ppid=2 worker=0
2024-07-22 02:26:07 +0000 [info]: #0 fluentd worker is now running worker=0
2024-07-22 02:26:08 +0000 [warn]: #0 no patterns matched tag="systemd"
free(): invalid pointer
2024-07-22 02:26:08 +0000 [error]: Worker 0 exited unexpectedly with signal SIGABRT
I think I found the cause.
Mär 23 09:14:12 marvin fluentd[1519]: c:0013 p:---- s:0064 e:000063 CFUNC :free
Mär 23 09:14:12 marvin fluentd[1519]: c:0012 p:0012 s:0059 e:000058 METHOD /usr/local/lib/ruby/gems/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal.rb:325
Mär 23 09:14:12 marvin fluentd[1519]: c:0011 p:0042 s:0053 e:000052 METHOD /usr/local/lib/ruby/gems/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal/navigable.rb:13
Mär 23 09:14:12 marvin fluentd[1519]: c:0010 p:0018 s:0047 e:000044 METHOD /usr/local/lib/ruby/gems/3.2.0/gems/fluent-plugin-systemd-1.0.5/lib/fluent/plugin/in_systemd.rb:151
The systemd-journal gem calls libc's free() on an FFI::Pointer:
https://github.com/ledbettj/systemd-journal/blob/f3365c1147baeed2032b9c0ae223905d57216ce1/lib/systemd/journal.rb#L320-L327
# some sd_journal_* functions return strings that we're expected to free
# ourselves. This function copies the string from a char* to a ruby string,
# frees the char*, and returns the ruby string.
def self.read_and_free_outstr(ptr)
  str = ptr.read_string
  LibC.free(ptr)
  str
end
When jemalloc is used, the malloc() and free() families are replaced with jemalloc's implementations, so the string returned by libsystemd was allocated by jemalloc and passing it to libc's free() is inappropriate.
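The general rule is that a buffer must be released by the same allocator that produced it. A minimal sketch of that pairing with Ruby's Fiddle library, resolving both functions from the process-wide namespace so they follow whichever allocator is active. This is an illustration only, not the upstream patch; STRDUP, FREE and read_and_free are names invented for the example, and strdup stands in for a C function that returns a string the caller must free:

```ruby
require "fiddle"

# Resolve both symbols via RTLD_DEFAULT instead of a specific libc handle,
# so allocation and release go through the same (possibly preloaded) allocator.
STRDUP = Fiddle::Function.new(
  Fiddle::Handle::DEFAULT["strdup"], [Fiddle::TYPE_VOIDP], Fiddle::TYPE_VOIDP
)
FREE = Fiddle::Function.new(
  Fiddle::Handle::DEFAULT["free"], [Fiddle::TYPE_VOIDP], Fiddle::TYPE_VOID
)

# Copy the C string into a Ruby string first, then release the C buffer.
def read_and_free(ptr)
  str = ptr.to_s
  FREE.call(ptr)
  str
end

c_string = STRDUP.call("s=example-cursor") # heap copy owned by the caller
puts read_and_free(c_string)
```

The same code behaves identically with and without LD_PRELOAD, because neither function is pinned to a particular library.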
There was a pull request that fixes this issue: https://github.com/ledbettj/systemd-journal/pull/62
It looks good, and the gem author also seems positive about the patch. However, the patch author closed it without merging and without giving a reason. We should probably revive it.
It seems that it will not crash anymore.
I've created a PR for upstream.
I'm checking the alternative implementation in https://github.com/ledbettj/systemd-journal/pull/97, but it can't be loaded yet.
irb(main):001:0> require "systemd/journal/shim"
<internal:/usr/local/lib/ruby/3.2.0/rubygems/core_ext/kernel_require.rb>:86:in `require': cannot load such file -- systemd/journal/shim (LoadError)
The .so is installed under:
/usr/local/bundle/gems/systemd-journal-1.4.2.1/lib/shim/shim.so
/usr/local/bundle/extensions/x86_64-linux/3.2.0/systemd-journal-1.4.2.1/shim/shim.so
Instead, requiring shim/shim succeeds.
require "shim/shim"
=> true
It should be:
diff --git a/ext/shim/extconf.rb b/ext/shim/extconf.rb
index 94abd76..a53b749 100644
--- a/ext/shim/extconf.rb
+++ b/ext/shim/extconf.rb
@@ -7,4 +7,4 @@ require "mkmf"
# selectively, or entirely remove this flag.
append_cflags("-fvisibility=hidden")
-create_makefile("shim/shim")
+create_makefile("systemd/journal/shim")
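The load failure follows from where the extension ends up relative to the load path: the name passed to create_makefile decides the subdirectory, and require looks for exactly that relative path. A simplified model of the lookup, using an empty stand-in file instead of a real shared object. The resolvable? helper is hypothetical and ignores RubyGems activation:

```ruby
require "tmpdir"
require "fileutils"

# Simplified model of require: a feature "a/b" is found when "a/b.rb"
# or "a/b.so" exists under one of the load-path directories.
def resolvable?(load_path, feature)
  load_path.any? do |dir|
    %w[.rb .so].any? { |ext| File.file?(File.join(dir, feature + ext)) }
  end
end

Dir.mktmpdir do |lib_dir|
  # Mirror the reported layout: lib/shim/shim.so (an empty stand-in file).
  FileUtils.mkdir_p(File.join(lib_dir, "shim"))
  FileUtils.touch(File.join(lib_dir, "shim", "shim.so"))

  puts resolvable?([lib_dir], "shim/shim")            # true
  puts resolvable?([lib_dir], "systemd/journal/shim") # false
end
```

With the file placed under systemd/journal/ instead, the second lookup would succeed, which is what the extconf.rb change aims for.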
Observing changes...
diff --git a/v1.17/debian/Dockerfile b/v1.17/debian/Dockerfile
index 4a245d1..43849c6 100644
--- a/v1.17/debian/Dockerfile
+++ b/v1.17/debian/Dockerfile
@@ -6,6 +6,8 @@ LABEL maintainer "Fluentd developers <fluentd@googlegroups.com>"
LABEL Description="Fluentd docker image" Vendor="Fluent Organization" Version="1.17.1"
ENV TINI_VERSION=0.18.0
+COPY systemd-journal-1.4.2.1.gem /fluentd/
+
# Do not split this into multiple RUN!
# Docker creates a layer for every RUN-Statement
# therefore an 'apt-get purge' has no effect
@@ -24,6 +26,10 @@ RUN apt-get update \
&& gem install async -v 1.32.1 \
&& gem install async-http -v 0.64.2 \
&& gem install fluentd -v 1.17.1 \
+ && gem install ffi \
+ && gem install --local /fluentd/systemd-journal-1.4.2.1.gem \
+ && gem install fluent-plugin-systemd \
+ && gem install fluent-plugin-watch-objectspace \
&& dpkgArch="$(dpkg --print-architecture | awk -F- '{ print $NF }')" \
&& wget -O /usr/local/bin/tini "https://github.com/krallin/tini/releases/download/v$TINI_VERSION/tini-$dpkgArch" \
&& wget -O /usr/local/bin/tini.asc "https://github.com/krallin/tini/releases/download/v$TINI_VERSION/tini-$dpkgArch.asc" \
@@ -53,7 +59,6 @@ RUN groupadd -r fluent && useradd -r -g fluent fluent \
&& mkdir -p /fluentd/etc /fluentd/plugins \
&& chown -R fluent /fluentd && chgrp -R fluent /fluentd
-
COPY fluent.conf /fluentd/etc/
COPY entrypoint.sh /bin/
The threshold is a bit strict (x1.1), so an error notification was fired. But after a while, the garbage seems to be collected.
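To illustrate why a x1.1 threshold can fire even without a leak, here is a rough sketch of a ratio check against a baseline. It is not the actual logic of fluent-plugin-watch-objectspace; the function name and the sample numbers are made up for the example:

```ruby
# Hypothetical check: flag a sample when it exceeds the baseline by more
# than the given ratio (1.1 means "more than 10% above the baseline").
def exceeds_threshold?(baseline, current, ratio: 1.1)
  current > baseline * ratio
end

baseline = 100_000 # made-up memory size at startup
samples = [104_000, 112_000, 101_000] # made-up sizes; the last is after a GC run

samples.each do |current|
  puts "#{current}: #{exceeds_threshold?(baseline, current) ? 'ALERT' : 'ok'}"
end
```

A temporary peak between garbage collections is enough to trip such a check, so a single alert followed by a return to the baseline does not by itself indicate a leak.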
Run the Docker image with jemalloc: docker run --rm -v /var/log/journal:/var/log/journal -v ./fluent.conf:/fluentd/etc/fluent.conf test-systemd
Run it without jemalloc: docker run --rm -e LD_PRELOAD="" -v /var/log/journal:/var/log/journal -v ./fluent.conf:/fluentd/etc/fluent.conf test-systemd
Observed memory consumption with/without jemalloc (with fix)
NOTE: The number of systemd events processed varies between runs, so strictly speaking the comparison is not fair. I only want to check for leaks.
I checked again with the fixed release (systemd-journal 2.0.0) integrated into the test container, with and without jemalloc.
The attached logs show the same tendency.
fix-with-jemalloc-2.0.0.log.gz fix-without-jemalloc-2.0.0.log.gz
So, it was resolved in systemd-journal 2.0.0.
I've submitted a PR to adopt systemd-journal 2.0.0: https://github.com/fluent-plugins-nursery/fluent-plugin-systemd/pull/111
This issue was fixed in fluent-plugin-systemd 1.1.0 (which uses systemd-journal 2.0.0).
Please use fluent-plugin-systemd 1.1.0.
https://github.com/fluent-plugins-nursery/fluent-plugin-systemd/releases/tag/v1.1.0
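To confirm that an image actually carries the fixed plugin, the installed version can be compared against 1.1.0. A small sketch using RubyGems' version comparison; the constant and helper names are invented for the example:

```ruby
require "rubygems"

# First fluent-plugin-systemd release reported in this thread to contain the fix.
FIXED_PLUGIN_VERSION = Gem::Version.new("1.1.0")

# Hypothetical helper: true when the given version string is 1.1.0 or newer.
def includes_fix?(version_string)
  Gem::Version.new(version_string) >= FIXED_PLUGIN_VERSION
end

puts includes_fix?("1.0.5") # false
puts includes_fix?("1.1.0") # true

# Inside a running fluentd process the activated gem could be inspected like this;
# the entry is nil when the plugin has not been loaded.
spec = Gem.loaded_specs["fluent-plugin-systemd"]
puts includes_fix?(spec.version.to_s) if spec
```

The startup log lines of the form gem 'fluent-plugin-systemd' version '...' give the same information without running any code.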
Describe the bug
After updating the official Docker image from fluent/fluentd:v1.16.3 to fluent/fluentd:v1.16.4, I got a segmentation fault during startup, which ends in an endless restart loop.
Additionally I have the following gem modules installed
To Reproduce
just update and restart
Expected behavior
should not crash
Your Environment
Your Configuration
Your Error Log
Additional context
No response