taoeffect closed this issue 7 years ago
Now it's doing:
[2017/09/18 00:47:31] [ warn] [msgpack2json] unknown msgpack type 876228640
[2017/09/18 00:47:31] [ warn] [msgpack2json] unknown msgpack type 876228640
[2017/09/18 00:47:31] [ warn] [msgpack2json] unknown msgpack type 876228640
[2017/09/18 00:47:31] [ warn] [msgpack2json] unknown msgpack type 876228640
[2017/09/18 00:47:31] [ warn] [msgpack2json] unknown msgpack type 876228640
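(An aside, and only a hunch — the ASCII reading below is an assumption, not something Fluent Bit reports: the "unknown msgpack type" value, converted to hex, is four printable ASCII bytes, which would suggest raw text bytes are being parsed where a msgpack header is expected. The second value comes from a later run shown further down the thread.)

```shell
# Convert the warnings' "type" integers to hex; both decode to printable ASCII.
printf '%08x\n' 876228640    # -> 343a3020, the bytes '4' ':' '0' ' '
printf '%08x\n' 1714435937   # -> 66303761, the bytes 'f' '0' '7' 'a'
```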
It seems to be caused by the output of the mariadb service, which has a Dockerfile of:
FROM localhost:5000/customalpine
RUN apk --update add mariadb mariadb-client && rm -f /var/cache/apk/*
VOLUME /var/lib/mysql
EXPOSE 3306
COPY configs/ /etc/mysql/
COPY scripts/docker-entrypoint.sh /usr/local/bin
ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]
Where docker-entrypoint.sh is:
#!/bin/sh
MYSQL_DATADIR=${MYSQL_DATADIR:-"/var/lib/mysql"}
if [ -d "/run/mysqld" ]; then
    echo "[i] MySQL directory already present, skipping creation"
else
    echo "[i] mysqld not found, creating...."
    mkdir -p /run/mysqld
    chown -R mysql:mysql /run/mysqld
    echo 'Initializing database'
    mkdir -p "$MYSQL_DATADIR/mysql"
    chown -R mysql:mysql "$MYSQL_DATADIR"
    mysql_install_db --user=mysql --datadir="$MYSQL_DATADIR" --rpm
    echo 'Database initialized'
    tfile=$(mktemp)
    if [ ! -f "$tfile" ]; then
        echo "failed to mktemp!"
        exit 1
    fi
    cat << EOF > "$tfile"
DELETE FROM mysql.user WHERE user NOT IN ('mysql.sys', 'mysqlxsys', 'root') OR host NOT IN ('localhost');
SET PASSWORD FOR 'root'@'localhost'=PASSWORD('${MYSQL_ROOT_PASSWORD}');
GRANT ALL ON *.* TO 'root'@'localhost' WITH GRANT OPTION;
DROP DATABASE IF EXISTS test;
FLUSH PRIVILEGES;
EOF
    /usr/bin/mysqld --user=mysql --bootstrap --verbose=1 --datadir="$MYSQL_DATADIR" < "$tfile"
    rm -f "$tfile"
fi
exec /usr/bin/mysqld --user=mysql --console --datadir="$MYSQL_DATADIR"
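One shell pitfall worth flagging in entrypoint scripts like this: a guard written as `[ -f "$tfile" ] || echo "failed to mktemp!" && exit 1` parses as `(test || echo) && exit 1`, so the script exits even when mktemp succeeds. A minimal sketch of the pitfall and the explicit `if` form:

```shell
#!/bin/sh
# `A || B && C` groups as `(A || B) && C`: C runs whenever A succeeds, too.
true || echo "error" && echo "this still runs"   # prints "this still runs"

# The safe form spells out the failure branch:
tfile=$(mktemp) || exit 1
if [ ! -f "$tfile" ]; then
    echo "failed to mktemp!" >&2
    exit 1
fi
echo "tempfile ok"
rm -f "$tfile"
```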
And configs contains configs/my.cnf:
[mysqld]
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
symbolic-links = 0
# Custom config should go here
!includedir /etc/mysql/conf.d/
And configs/conf.d/docker.cnf:
[mysqld]
skip-host-cache
skip-name-resolve
I find it so strange that it wasn't doing this yesterday but now it's doing it every time and I can't get it to stop.
Should have mentioned, this is how the log starts:
[2017/09/18 00:47:09] [ info] [engine] started
[2017/09/18 00:47:09] [ info] [in_fw] binding 0.0.0.0:24224
[2017/09/18 00:47:19] [ warn] [pack] invalid UTF-8 bytes, skipping
[2017/09/18 00:47:19] [ warn] unknown time format bf59ce92
[2017/09/18 00:47:19] [ warn] [msgpack2json] unknown msgpack type 876228640
[2017/09/18 00:47:19] [ warn] [msgpack2json] unknown msgpack type 876228640
Am using alpine linux, and all Docker containers should have these environment variables set:
ENV TERM=xterm LC_ALL=en_US.UTF-8 LANG=en_US.UTF-8
OK, I figured it out! 🎉 EDIT: Nope, I didn't; see the comment after the next one. 😢
It turns out that just having that ENV line in the parent customalpine Dockerfile was the problem! Removing this from the customalpine Dockerfile fixed the problem:
ENV TERM=xterm LC_ALL=en_US.UTF-8 LANG=en_US.UTF-8
Weirdly, having that line in the Dockerfiles for the child log and mariadb services (or not having it!) didn't cause the problem to reappear.
So, I have no idea why this is. It's some very weirdly specific Docker+Alpine+Fluent-bit bug.
However, it still makes me nervous that this is at all possible. The problem could one day re-appear and result in a flooded log file. That problem shouldn't happen regardless.
The way syslog and others handle this is to output a single line to the file along the lines of:
... previous log message repeated 2000 times ...
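(This is not how syslog implements it internally, but the collapsing behavior can be sketched with plain `uniq -c`; real log lines would need their timestamps stripped first, since differing timestamps defeat duplicate detection:)

```shell
# Collapse consecutive identical lines into one counted line, syslog-style.
{ echo "unknown msgpack type 876228640"
  echo "unknown msgpack type 876228640"
  echo "unknown msgpack type 876228640"
  echo "some other message"; } | uniq -c
# Output is one line per run of duplicates, prefixed with its count.
```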
Looks like I celebrated too soon.
Sometimes it works. Sometimes it doesn't. I've no idea what's going on or why.
[2017/09/18 05:08:16] [ info] [engine] started
[2017/09/18 05:08:16] [ info] [in_fw] binding 0.0.0.0:24224
[2017/09/18 05:08:31] [ warn] [pack] invalid UTF-8 bytes, skipping
[2017/09/18 05:08:31] [ warn] [pack] invalid UTF-8 bytes, skipping
[2017/09/18 05:08:31] [ warn] unknown time format bf59ce92
[2017/09/18 05:08:31] [ warn] [msgpack2json] unknown msgpack type 1714435937
[2017/09/18 05:08:31] [ warn] [msgpack2json] unknown msgpack type 1714435937
Here's the very end of the output from the mariadb log:
forums_mariadb_1-sha256:84e8a: [1505711309.000000, {"container_name":"/forums_mariadb_1", "source":"stderr", "log":"2017-09-18 5:08:29 139803331611464 [Note] InnoDB: innodb_empty_free_list_algorithm has been changed to legacy because of small buffer pool size. In order to use backoff, increase buffer pool at least up to 20MB.", "container_id":"caa70f6cbe6e7f5bf8bc48ac272ea371ffc037c3a6f6b02e5c73d0759d1a562a"}]
forums_mariadb_1-sha256:84e8a: [1505711309.000000, {"source":"stderr", "log":"", "container_id":"caa70f6cbe6e7f5bf8bc48ac272ea371ffc037c3a6f6b02e5c73d0759d1a562a", "container_name":"/forums_mariadb_1"}]
forums_mariadb_1-sha256:84e8a: [1505711309.000000, {"container_id":"caa70f6cbe6e7f5bf8bc48ac272ea371ffc037c3a6f6b02e5c73d0759d1a562a", "container_name":"/forums_mariadb_1", "source":"stderr", "log":"2017-09-18 5:08:29 139803331611464 [Note] InnoDB: Using mutexes to ref count buffer pool pages"}
The only anomaly that I see is that the second-to-last message contains an empty "log" value. Perhaps that's what's causing it?
EDIT: I caught something else in another run:
forums_mariadb_1-sha256:b7f51: [1505712064.000000, {"container_id":"1c48013ede6b3941ad66f5e6b5871175905eda0a3cb33b4a95ffd926da5b642c", "container_name":"/forums_mariadb_1", "source":"stderr", "log":"2017-09-18 5:21:04 139901711743816 [Note] InnoDB: Completed initialization of buffer pool"}]
forums_mariadb_1-sha256:b7f51: [1505712064.000000, {"container_id":"1c48013ede6b3941ad66f5e6b5871175905eda0a3cb33b4a95ffd926da5b642c", "container_name":"/forums_mariadb_1", "source":"stderr", "log":"2017-09-18 5:21:04 1\u00a4\u0000s"}]
forums_mariadb_1-sha256:b7f51: [1505712064.000000, {"source":"stderr", "log":"2017-09-18 5:21:04 139901711743816 [Note] InnoDB: 128 rollback segment(s) are active.", "container_id":"1c48013ede6b3941ad66f5e6b5871175905eda0a3cb33b4", 6051919806976817155:0}
There are two anomalies in the logs:
- \u00a4\u0000s" in the 2nd-to-last entry
- 6051919806976817155:0 in the last entry, which is invalid JSON (it caused jq to fail to parse that part of the file), since 6051919806976817155 would have to be a string to serve as a key... but it also doesn't belong there. No idea what it represents.

@taoeffect please provide the output of:
fluent-bit -c yourconf.conf --sosreport
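(On the invalid-JSON anomaly above: JSON requires object keys to be quoted strings, so a bare integer key like the one in the captured entry makes the whole document unparseable. A quick check with jq, using the key value from the log:)

```shell
# A bare integer key is invalid JSON; jq exits non-zero on it.
echo '{6051919806976817155:0}' | jq . >/dev/null 2>&1 || echo "parse error, as expected"

# Quoting the key makes it valid:
echo '{"6051919806976817155":0}' | jq -c .   # -> {"6051919806976817155":0}
```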
I'm having the same problem. I can reproduce it with this config:
[SERVICE]
Flush 5
Daemon Off
Log_Level info
Parsers_File /etc/fluent-bit/parsers.conf
Log_File /var/log/system-log-daemon
[INPUT]
Name kmsg
Tag kernel
[OUTPUT]
Name File
Match *
Path /var/log/messages
Somehow there's an issue in the file output. Everything works if I just use stdout.
So if I replace the output with this one, it works:
[OUTPUT]
Name stdout
Match *
EDIT: Weird: on the same system, with the same fluent-bit process, this configuration works without any issues:
[SERVICE]
Flush 1
Daemon Off
Log_Level info
Parsers_File /etc/fluent-bit/parsers.conf
Log_File /var/log/docker-log-daemon
[INPUT]
Name forward
Listen 127.0.0.1
Port 24224
Chunk_Size 32
Buffer_Size 64
Parser docker
[FILTER]
Name kubernetes
Match *
Merge_JSON_Log On
Dummy_Meta On
[FILTER]
Name record_modifier
Match *
Remove_key kubernetes
[OUTPUT]
Name File
Match *
Path /var/log/docker_messages
@edsiper Under what circumstances do I run that command? And do I run it while fluent-bit is already running? Remember, the error is intermittent and doesn't always happen.
In my case the error occurs every time. Here's the whole output:
If it helps, here's the whole server setup I'm using:
https://github.com/marcbachmann/image-alpine
I could also give access to an instance.
Thanks for the information.
Just a disclaimer: Alpine Linux is not supported, use our official Debian images instead
Is that a "WONTFIX"?
Would be a shame, since it excludes a large portion of potential users. Docker makes little sense with Debian, and a lot of sense with Alpine. :-\
Regarding the issue, I don't think it's Alpine-specific. With the CSV file-format output I sometimes get a segfault right away, but not the huge number of errors seen with the default formatter. With ltsv it's a bit more stable; there are just fewer errors.
About Alpine + Fluent Bit: we had many issues, specifically with:
We invested a lot of time dealing with Alpine; we will reconsider supporting it once the issues mentioned above are fixed.
About the main issue in question: I will troubleshoot.
@taoeffect @marcbachmann
I am trying to reproduce the problem... would you please provide specific Dockerfiles or a full compose file to test?
Dockerfile
FROM alpine:3.6
ENV FLB_MAJOR 0
ENV FLB_MINOR 12
ENV FLB_PATCH 2
ENV FLB_VERSION 0.12.2
RUN apk --no-cache add file build-base ca-certificates cmake && \
wget -O "/tmp/fluent-bit-$FLB_VERSION.tar.gz" "http://fluentbit.io/releases/$FLB_MAJOR.$FLB_MINOR/fluent-bit-$FLB_VERSION.tar.gz" && \
cd /tmp && \
tar zxfv "fluent-bit-$FLB_VERSION.tar.gz" && \
cd "fluent-bit-$FLB_VERSION/build/" && \
cmake -DFLB_DEBUG=On -DFLB_TRACE=On -DFLB_JEMALLOC=On -DFLB_BUFFERING=On -DCMAKE_INSTALL_PREFIX=/ ../ && \
make && make install
log.conf
[SERVICE]
Flush 5
Daemon Off
Log_Level info
Parsers_File /etc/fluent-bit/parsers.conf
[INPUT]
Name kmsg
Tag kernel
[OUTPUT]
Name File
Match *
Path /var/log/messages
docker build -t fluent-bit .
docker run -it --rm --privileged -v $PWD/log.conf:/log.conf fluent-bit fluent-bit -c /log.conf
--privileged in the run command is used to get access to /proc/kmsg.
Thanks for the test case.
After digging on it for an hour I could not find the root cause. It's not just the message on the screen; there are some weird behaviors in the default memory allocator (or maybe in Valgrind on Alpine?):
==14== Thread 1:
==14== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
==14== at 0x4020D39: epoll_pwait (in /lib/ld-musl-x86_64.so.1)
==14== by 0x2EBB52: mk_event_wait (mk_event.c:154)
==14== by 0x13D13C: flb_engine_start (flb_engine.c:466)
==14== by 0x131283: main (fluent-bit.c:729)
==14== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==14==
==14== Invalid read of size 1
==14== at 0x40484A0: __fwritex (in /lib/ld-musl-x86_64.so.1)
==14== by 0x3: ???
==14== Address 0x4ea8b01 is 33 bytes inside a block of size 317 alloc'd
==14== at 0x4C929C5: malloc (vg_replace_malloc.c:299)
==14== by 0x133824: flb_malloc (flb_mem.h:57)
==14== by 0x135BEA: flb_input_flush (flb_input.c:890)
==14== by 0x13E66F: flb_engine_dispatch (flb_engine_dispatch.c:168)
==14== by 0x13BD4F: flb_engine_flush (flb_engine.c:83)
==14== by 0x13D1D0: flb_engine_handle_event (flb_engine.c:257)
==14== by 0x13D1D0: flb_engine_start (flb_engine.c:469)
==14== by 0x131283: main (fluent-bit.c:729)
==14==
==14== Invalid read of size 1
==14== at 0x4050213: ??? (in /lib/ld-musl-x86_64.so.1)
==14== by 0x40484C1: __fwritex (in /lib/ld-musl-x86_64.so.1)
==14== by 0x3: ???
==14== Address 0x4ea8afc is 28 bytes inside a block of size 317 alloc'd
==14== at 0x4C929C5: malloc (vg_replace_malloc.c:299)
==14== by 0x133824: flb_malloc (flb_mem.h:57)
==14== by 0x135BEA: flb_input_flush (flb_input.c:890)
==14== by 0x13E66F: flb_engine_dispatch (flb_engine_dispatch.c:168)
==14== by 0x13BD4F: flb_engine_flush (flb_engine.c:83)
==14== by 0x13D1D0: flb_engine_handle_event (flb_engine.c:257)
==14== by 0x13D1D0: flb_engine_start (flb_engine.c:469)
==14== by 0x131283: main (fluent-bit.c:729)
Since Alpine is not supported, I suggest you use our Fluent Bit image (Debian-based). I understand the current image size is big (140MB), but the next version will be around 80MB.
Fluent Bit 0.12.3 has been released:
The new Docker image is about 79MB (used to be > 100MB):
Since Alpine is not supported I am closing this ticket.
Thanks @edsiper! Am looking forward to trying this out, and will let you know if I run into the same problem again. Appreciate the small image size.
I have no idea what's going on.
In the span of a few seconds fluent-bit generated a 16M log file consisting of:
Am using Docker version 17.06.2-ce, build cec0b72, with fluent-bit version v0.12.2. My dedicated container for logs contains this:
The config file contains this:
The /home/unpriv/logs/fluent-bit.log is the file containing the 16M worth of spammed msgpack2json messages. The docker-compose.yml contains this:
It wasn't doing this before. No idea why it's happening now.