Open rmsaad opened 6 months ago
FYI @edsiper @leonardo-albertovich @nokute78 this likely affects any threaded input plugin on this platform, not just out_stackdriver. The segfault occurs in the generic output thread loop: https://github.com/fluent/fluent-bit/blob/07475e71ea4b9e8cdcd34154de5f89540b916171/src/flb_output_thread.c#L329
I haven't actually run it, but the only way a segfault makes sense in this stacktrace is if &dns_ctx is a bad address.
(gdb) break /src/fluent-bit/src/flb_output_thread.c:329
Breakpoint 1 at 0xe9718: file /src/fluent-bit/src/flb_output_thread.c, line 329.
(gdb) define print_sp
Type commands for definition of "print_sp".
End with a line saying just "end".
>x/40x $sp
>print &dns_ctx
>step
>print dns_ctx
>continue
>end
(gdb) run -c /etc/fluent/fluent-bit.conf
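I created a gdb command that prints the stack memory and &dns_ctx, then steps into flb_net_dns_lookup_context_cleanup() and prints dns_ctx. The stack memory looks wrong right before the segfault: in the final dump, several words that were constant in the earlier dumps have changed (for example, the words at 0x749f993c through 0x749f9948 and the two pointers at 0x749f9994), and the dns_ctx argument arrives as 0x17cbc827 instead of 0x749f9994.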
Thread 4 "flb-out-stackdr" hit Breakpoint 1, output_thread (data=0x760f7b80) at /src/fluent-bit/src/flb_output_thread.c:329
329 /src/fluent-bit/src/flb_output_thread.c: No such file or directory.
(gdb)
0x749f9920: 0x00890664 0x76170200 0x00000000 0x00000000
0x749f9930: 0x00000000 0x00000000 0x00000000 0x00000008
0x749f9940: 0x00000000 0x00000008 0x00000000 0x755d0000
0x749f9950: 0x761701c0 0x00000000 0x00000000 0xdeadbeef
0x749f9960: 0x760f7bdc 0x00000000 0x00000000 0x00000000
0x749f9970: 0x00000000 0x00000000 0x00000000 0x00000000
0x749f9980: 0x00000000 0x00000000 0x00000000 0x00000000
0x749f9990: 0x00000000 0x749f9994 0x749f9994 0x749f999c
0x749f99a0: 0x749f999c 0x00000023 0x00008000 0x00000001
0x749f99b0: 0x00000002 0x00000000 0x00000000 0x00000000
$47 = (struct flb_net_dns *) 0x749f9994
flb_net_dns_lookup_context_cleanup (dns_ctx=0x749f9994) at /src/fluent-bit/src/flb_network.c:613
613 /src/fluent-bit/src/flb_network.c: No such file or directory.
$48 = (struct flb_net_dns *) 0x749f9994
Thread 4 "flb-out-stackdr" hit Breakpoint 1, output_thread (data=0x760f7b80) at /src/fluent-bit/src/flb_output_thread.c:329
329 /src/fluent-bit/src/flb_output_thread.c: No such file or directory.
(gdb)
0x749f9920: 0x00890664 0x76170200 0x00000000 0x00000000
0x749f9930: 0x00000000 0x00000000 0x00000000 0x00000008
0x749f9940: 0x00000000 0x00000008 0x00000000 0x755d0000
0x749f9950: 0x761701c0 0x00000000 0x00000000 0xdeadbeef
0x749f9960: 0x760f7bdc 0x00000000 0x00000000 0x00000000
0x749f9970: 0x00000000 0x00000000 0x00000000 0x00000000
0x749f9980: 0x00000000 0x00000000 0x00000000 0x00000000
0x749f9990: 0x00000000 0x749f9994 0x749f9994 0x749f999c
0x749f99a0: 0x749f999c 0x00000023 0x00008000 0x00000001
0x749f99b0: 0x00000002 0x00000000 0x00000000 0x00000000
$49 = (struct flb_net_dns *) 0x749f9994
flb_net_dns_lookup_context_cleanup (dns_ctx=0x749f9994) at /src/fluent-bit/src/flb_network.c:613
613 /src/fluent-bit/src/flb_network.c: No such file or directory.
$50 = (struct flb_net_dns *) 0x749f9994
Thread 4 "flb-out-stackdr" hit Breakpoint 1, output_thread (data=0x760f7b80) at /src/fluent-bit/src/flb_output_thread.c:329
329 /src/fluent-bit/src/flb_output_thread.c: No such file or directory.
(gdb)
0x749f9920: 0x00890664 0x76170200 0x00000000 0x00000000
0x749f9930: 0x00000000 0x00000000 0x00000000 0x00000008
0x749f9940: 0x00000000 0x00000008 0x00000000 0x755d0000
0x749f9950: 0x761701c0 0x00000000 0x00000000 0xdeadbeef
0x749f9960: 0x760f7bdc 0x00000000 0x00000000 0x00000000
0x749f9970: 0x00000000 0x00000000 0x00000000 0x00000000
0x749f9980: 0x00000000 0x00000000 0x00000000 0x00000000
0x749f9990: 0x00000000 0x749f9994 0x749f9994 0x749f999c
0x749f99a0: 0x749f999c 0x00000023 0x00008000 0x00000001
0x749f99b0: 0x00000002 0x00000000 0x00000000 0x00000000
$51 = (struct flb_net_dns *) 0x749f9994
flb_net_dns_lookup_context_cleanup (dns_ctx=0x749f9994) at /src/fluent-bit/src/flb_network.c:613
613 /src/fluent-bit/src/flb_network.c: No such file or directory.
$52 = (struct flb_net_dns *) 0x749f9994
Thread 4 "flb-out-stackdr" hit Breakpoint 1, output_thread (data=0x760f7b80) at /src/fluent-bit/src/flb_output_thread.c:329
329 /src/fluent-bit/src/flb_output_thread.c: No such file or directory.
(gdb)
0x749f9920: 0x00890664 0x76170200 0x00000000 0x00000000
0x749f9930: 0x00000000 0x00000000 0x00000000 0x7552f000
0x749f9940: 0x760c1560 0x761b6000 0x760f7bec 0x755d0000
0x749f9950: 0x761701c0 0x00000000 0x00000000 0xdeadbeef
0x749f9960: 0x760f7bdc 0x00000000 0x760c1560 0x00000000
0x749f9970: 0x00000000 0x00006100 0x00000000 0x00000000
0x749f9980: 0x00000000 0x00000000 0x00000000 0x00000000
0x749f9990: 0x00000000 0x754e7074 0x754e7074 0x749f999c
0x749f99a0: 0x749f999c 0x00000023 0x00008000 0x00000001
0x749f99b0: 0x00000002 0x00000000 0x00000000 0x00000000
$53 = (struct flb_net_dns *) 0x749f9994
flb_net_dns_lookup_context_cleanup (dns_ctx=0x17cbc827) at /src/fluent-bit/src/flb_network.c:613
613 /src/fluent-bit/src/flb_network.c: No such file or directory.
$54 = (struct flb_net_dns *) 0x17cbc827
Thread 4 "flb-out-stackdr" received signal SIGSEGV, Segmentation fault.
0x004eefd8 in flb_net_dns_lookup_context_cleanup (dns_ctx=0x17cbc827) at /src/fluent-bit/src/flb_network.c:613
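One way to read the dumps above: in the first three, the words at 0x749f9994 and 0x749f9998 both hold 0x749f9994, and the words at 0x749f999c and 0x749f99a0 both hold 0x749f999c. That is the pattern an empty circular doubly linked list head leaves in memory, since both of its pointers point back at the head itself. The following is a minimal sketch of that pattern, assuming the DNS context consists of two such list heads placed back to back. The struct and function names here are hypothetical stand-ins and are not copied from the fluent-bit sources.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a circular doubly linked list head. */
struct list_head {
    struct list_head *prev;
    struct list_head *next;
};

/* Hypothetical stand-in for the DNS context: two list heads back to back
 * (an assumption made for illustration, not the real definition). */
struct dns_ctx_sketch {
    struct list_head lookups;
    struct list_head lookups_drop;
};

/* An empty circular list head points at itself in both directions. */
void list_init(struct list_head *head)
{
    head->prev = head;
    head->next = head;
}

/* Append a node at the tail of the list anchored at head. */
void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Returns 1 when the head looks like an untouched empty list. */
int head_is_self_referential(const struct list_head *head)
{
    return head->prev == head && head->next == head;
}

/* Initialize both list heads of the sketch context. */
void dns_ctx_sketch_init(struct dns_ctx_sketch *ctx)
{
    list_init(&ctx->lookups);
    list_init(&ctx->lookups_drop);
    assert(head_is_self_referential(&ctx->lookups));
    assert(head_is_self_referential(&ctx->lookups_drop));
}

/* Print the four pointer-sized words of the context, similar in spirit
 * to what x/40x shows for the same stack region. */
void dns_ctx_sketch_dump(const struct dns_ctx_sketch *ctx)
{
    printf("&ctx              = %p\n", (const void *) ctx);
    printf("lookups.prev      = %p\n", (void *) ctx->lookups.prev);
    printf("lookups.next      = %p\n", (void *) ctx->lookups.next);
    printf("lookups_drop.prev = %p\n", (void *) ctx->lookups_drop.prev);
    printf("lookups_drop.next = %p\n", (void *) ctx->lookups_drop.next);
}
```

Calling dns_ctx_sketch_init() on a stack variable and then dns_ctx_sketch_dump() should print the variable's own address for the first two words and the address of the second head for the last two. Under the same assumption, the final dump (0x754e7074 in both words at 0x749f9994) would be consistent with a list holding exactly one entry, which is what list_add_tail() produces on an empty head.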
I believe I'm experiencing this same segfault on ARM with all 3.x versions, with both the http and file output plugins, but interestingly not with the stdout output plugin. I also agree it's not specific to stackdriver, and that it seems to happen after the output plugins are processed: I can add two file output plugins, and both files will be written before the process crashes with SIGSEGV.
I'm not as comfortable building debug versions of fluent-bit to get stacktraces here, but I can say that the issue exists on all 3.x versions and not on 2.2.3.
Bug Report
Describe the bug
The stackdriver output plugin has been broken for arm32v7 release builds (i.e. the Docker images) since v3.0.0.
I have done some digging, and this does not seem to be caused by any recently introduced bug. Instead, it seems that before this commit: https://github.com/fluent/fluent-bit/commit/71746b35718e856a5f8615f95f35d450a142e8cd setting FLB_RELEASE=On wouldn't build a release binary unless FLB_DEBUG was also explicitly turned off, so the Docker images always included a debug build of fluent-bit until v3.0.0.
To Reproduce
In the core dump output below the stack is corrupted and causes dns_ctx to get the illegal address: 0x17cbb6dd.
Expected behavior
The stackdriver output plugin should work on arm32v7 release builds, or at least in the Docker images. I tested, and this isn't a problem on arm64 or x86.
Your Environment
[INPUT]
    Name cpu
    Tag gateway_cpu
    Interval_Sec 20

[FILTER]
    Name modify
    Match *
    Add labels.gateway_env development

[FILTER]
    Name nest
    Match *
    Operation nest
    Wildcard labels.*
    Nest_under logging.googleapis.com/labels
    Remove_prefix labels.

[OUTPUT]
    Name stackdriver
    Match *
    resource generic_node
    namespace ${DEV_CODE}
    node_id ${DEV_ID}
    location northamerica-northeast1-c
    severity_key level