fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0
5.8k stars 1.58k forks source link

SIGSEGV in msgpack_sbuffer_write() #3943

Closed jiscfoo closed 3 years ago

jiscfoo commented 3 years ago

Bug Report

Describe the bug

fluent-bit (1.8.3) crashes in msgpack_sbuffer_write() during a multi-line tail

To Reproduce

Parser:

[PARSER]
    Name   shib-sp-shibd
    Format regex
    Regex ^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?<priority>\S+) (?<component>\S+) \[(?<thread>\d+)\] \[(?<tag>[^\]]+)\]: (?<message>.*)
    Time_Key time
    Time_Format %F %T

Config:

[SERVICE]
    parsers_file        debug-parser.conf

[INPUT]
    Name                tail
    Path                /var/log/shibboleth/shibd.log
    Multiline           on
    Parser_Firstline    shib-sp-shibd
    Path_Key            source
    Tag                 shib-sp.*
    Read_from_Head true

[OUTPUT]
    Name stdout
    match shib-sp.**

Invoke as:

/fluent-bit/bin/fluent-bit -c /a/debug.conf

Expected behavior

Correct ingestion of log events and output to stdout

Screenshots

# /fluent-bit/bin/fluent-bit -c /a/debug.conf -vv
Fluent Bit v1.8.3
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/08/12 11:02:38] [ info] Configuration:
[2021/08/12 11:02:38] [ info]  flush time     | 5.000000 seconds
[2021/08/12 11:02:38] [ info]  grace          | 5 seconds
[2021/08/12 11:02:38] [ info]  daemon         | 0
[2021/08/12 11:02:38] [ info] ___________
[2021/08/12 11:02:38] [ info]  inputs:
[2021/08/12 11:02:38] [ info]      tail
[2021/08/12 11:02:38] [ info] ___________
[2021/08/12 11:02:38] [ info]  filters:
[2021/08/12 11:02:38] [ info] ___________
[2021/08/12 11:02:38] [ info]  outputs:
[2021/08/12 11:02:38] [ info]      stdout.0
[2021/08/12 11:02:38] [ info] ___________
[2021/08/12 11:02:38] [ info]  collectors:
[2021/08/12 11:02:38] [ info] [engine] started (pid=148)
[2021/08/12 11:02:38] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2021/08/12 11:02:38] [debug] [storage] [cio stream] new stream registered: tail.0
[2021/08/12 11:02:38] [ info] [storage] version=1.1.1, initializing...
[2021/08/12 11:02:38] [ info] [storage] in-memory
[2021/08/12 11:02:38] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/08/12 11:02:38] [ info] [cmetrics] version=0.1.6
[2021/08/12 11:02:38] [debug] [input:tail:tail.0] flb_tail_fs_inotify_init() initializing inotify tail input
[2021/08/12 11:02:38] [debug] [input:tail:tail.0] inotify watch fd=23
[2021/08/12 11:02:38] [debug] [input:tail:tail.0] scanning path /var/log/shibboleth/shibd.log
[2021/08/12 11:02:38] [debug] [input:tail:tail.0] inode=226355814 with offset=0 appended as /var/log/shibboleth/shibd.log
[2021/08/12 11:02:38] [debug] [input:tail:tail.0] scan_glob add(): /var/log/shibboleth/shibd.log, inode 226355814
[2021/08/12 11:02:38] [debug] [input:tail:tail.0] 1 new files found on path '/var/log/shibboleth/shibd.log'
[2021/08/12 11:02:38] [debug] [stdout:stdout.0] created event channels: read=25 write=26
[2021/08/12 11:02:38] [debug] [router] match rule tail.0:stdout.0
[2021/08/12 11:02:38] [ info] [sp] stream processor started
[2021/08/12 11:02:38] [engine] caught signal (SIGSEGV)
#0  0x7f03eb235a1d      in  ???() at 4/multiarch/memmove-vec-unaligned-erms.S:532
#1  0x557e960c830c      in  msgpack_sbuffer_write() at lib/msgpack-c/include/msgpack/sbuffer.h:88
#2  0x557e960c80de      in  msgpack_pack_str_body() at lib/msgpack-c/include/msgpack/pack_template.h:784
#3  0x557e960c8d0c      in  flb_tail_mult_append_raw() at plugins/in_tail/tail_multiline.c:238
#4  0x557e960c9005      in  flb_tail_mult_process_content() at plugins/in_tail/tail_multiline.c:348
#5  0x557e960cb55e      in  process_content() at plugins/in_tail/tail_file.c:427
#6  0x557e960cd764      in  flb_tail_file_chunk() at plugins/in_tail/tail_file.c:1182
#7  0x557e960c56c3      in  in_tail_collect_static() at plugins/in_tail/tail.c:140
#8  0x557e96071c98      in  flb_input_collector_fd() at src/flb_input.c:1075
#9  0x557e96082969      in  flb_engine_handle_event() at src/flb_engine.c:377
#10 0x557e96082969      in  flb_engine_start() at src/flb_engine.c:639
#11 0x557e9606750c      in  flb_lib_worker() at src/flb_lib.c:628
#12 0x7f03ebaf1fa2      in  start_thread() at 4T1/glibc-2.28/nptl/pthread_create.c:486
#13 0x7f03eb1d24ce      in  ???() at sysv/linux/x86_64/clone.S:95
#14 0xffffffffffffffff  in  ???() at ???:0
Aborted

 Your Environment

Additional context

Segfault stops log processing. Only way to proceed is to bin the pos database we'd been using to skip over the input it's choking on and restart the service.

Input file that causes the crash can be supplied to developers but not made public.

jiscfoo commented 3 years ago

Possibly related to #3877

jiscfoo commented 3 years ago

gdb provides a more detailed stack trace:

#0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:532
#1  0x00005616c2b0a30d in msgpack_sbuffer_write (data=0x7fb16202e388, buf=0x7fb16206ce25 "",
    len=18446744073709551615) at /tmp/fluent-bit/lib/msgpack-c/include/msgpack/sbuffer.h:88
#2  0x00005616c2b0a0df in msgpack_pack_str_body (x=0x7fb16202e3a0, b=0x7fb16206ce25, l=18446744073709551615)
    at /tmp/fluent-bit/lib/msgpack-c/include/msgpack/pack_template.h:784
#3  0x00005616c2b0ad0d in flb_tail_mult_append_raw (buf=0x7fb16206ce25 "", size=-1, file=0x7fb16202e300,
    config=0x7fb1620411c0) at /tmp/fluent-bit/plugins/in_tail/tail_multiline.c:238
#4  0x00005616c2b0b006 in flb_tail_mult_process_content (now=1628761684, buf=0x7fb16206ce25 "",
    len=18446744073709551615, file=0x7fb16202e300, ctx=0x7fb1620411c0, processed_bytes=25893)
    at /tmp/fluent-bit/plugins/in_tail/tail_multiline.c:348
#5  0x00005616c2b0d55f in process_content (file=0x7fb16202e300, bytes=0x7fb162df4778)
    at /tmp/fluent-bit/plugins/in_tail/tail_file.c:427
#6  0x00005616c2b0f765 in flb_tail_file_chunk (file=0x7fb16202e300)
    at /tmp/fluent-bit/plugins/in_tail/tail_file.c:1182
#7  0x00005616c2b076c4 in in_tail_collect_static (ins=0x7fb163248f00, config=0x7fb163232840,
    in_context=0x7fb1620411c0) at /tmp/fluent-bit/plugins/in_tail/tail.c:140
#8  0x00005616c2ab3c99 in flb_input_collector_fd (fd=19, config=0x7fb163232840)
    at /tmp/fluent-bit/src/flb_input.c:1075
#9  0x00005616c2ac496a in flb_engine_handle_event (config=0x7fb163232840, mask=1, fd=19)
    at /tmp/fluent-bit/src/flb_engine.c:377
#10 flb_engine_start (config=0x7fb163232840) at /tmp/fluent-bit/src/flb_engine.c:639
#11 0x00005616c2aa950d in flb_lib_worker (data=0x7fb16320b340) at /tmp/fluent-bit/src/flb_lib.c:628
#12 0x00007fb164d0ffa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#13 0x00007fb1643f04cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Of note in there is the len=18446744073709551615 which ends up being passed into sbuffer.h:88:

memcpy(sbuf->data + sbuf->size, buf, len);
jiscfoo commented 3 years ago

Looking at frame 5 in the stack (process_content (file=0x7fa52b62e300, bytes=0x7fa52c3f4778) at /tmp/fluent-bit/plugins/in_tail/tail_file.c:427)

line_len is passed into flb_tail_mult_process_content() which is calculated at https://github.com/fluent/fluent-bit/blob/v1.8.3/plugins/in_tail/tail_file.c#L370

line_len = len - crlf;

Within gdb, at point of crash, len is 0 (zero) and crlf is 1, so 0 - 1 is a very big number which isn't being handled correctly.

If I enable skip_empty_lines then it does not seem to crash.

jiscfoo commented 3 years ago

Possibly more subtle than this… the multiline input it chokes on contains ^M characters. I've isolated down to (cat -v output, so ^Ms are inline):

2021-08-10 09:12:14 DEBUG XMLTooling.SOAPTransport.CURL [252] [default]: nothing free in pool, returning new connection handle
2021-08-10 09:12:14 DEBUG XMLTooling.SOAPTransport.CURL [252] [default]: sending SOAP message to http://mdq.ukfederation.org.uk/entities/https%3A%2F%2Fredacted%2Fidp%2Fshibboleth
2021-08-10 09:12:14 DEBUG XMLTooling.libcurl [252] [default]:   Trying 52.169.71.108:80...

2021-08-10 09:12:14 DEBUG XMLTooling.libcurl [252] [default]: Connected to mdq.ukfederation.org.uk (52.169.71.108) port 80 (#0)

2021-08-10 09:12:14 DEBUG XMLTooling.libcurl [252] [default]: GET /entities/https%3A%2F%2Fredacted%2Fidp%2Fshibboleth HTTP/1.1^M
Host: mdq.ukfederation.org.uk^M
User-Agent: shibboleth/3.2.3 OpenSAML/3.2.0 XMLTooling/3.2.0 XML-Security-C/2.0.2 Xerces-C/3.2.3 log4shib/2.0.0 libcurl/7.77.0 OpenSSL/1.0.2k-fips^M
Accept-Encoding: deflate, gzip^M
Content-Type: text/xml^M
Accept: application/samlmetadata+xml^M
Xerces-C: 3.2.3^M
XML-Security-C: 2.0.2^M
XMLTooling-C: 3.2.0^M
OpenSAML-C: 3.2.0^M
shibboleth: 3.2.3^M
^M

2021-08-10 09:12:14 DEBUG XMLTooling.libcurl [252] [default]: Mark bundle as not supporting multiuse

Raw file and base64 encoded file attached.

nokute78 commented 3 years ago

@jiscfoo Thank you for information!

The log ^M+blank line causes SIGSEGV. I think multi-line is not related this issue.

bbb^M

I sent a patch #3951.