Open rmsaad opened 4 months ago
FYI @edsiper @leonardo-albertovich @nokute78 this likely affects any threaded output plugin on this platform, not just out_stackdriver. The segfault occurs in the generic output thread loop: https://github.com/fluent/fluent-bit/blob/07475e71ea4b9e8cdcd34154de5f89540b916171/src/flb_output_thread.c#L329
Haven't actually run it, but the only way a segfault makes sense in this stacktrace is if &dns_ctx is a bad address.
(gdb) break /src/fluent-bit/src/flb_output_thread.c:329
Breakpoint 1 at 0xe9718: file /src/fluent-bit/src/flb_output_thread.c, line 329.
(gdb) define print_sp
Type commands for definition of "print_sp".
End with a line saying just "end".
>x/40x $sp
>print &dns_ctx
>step
>print dns_ctx
>continue
>end
(gdb) run -c /etc/fluent/fluent-bit.conf
I defined a gdb command that prints the stack memory and &dns_ctx, steps into flb_net_dns_lookup_context_cleanup(), and then prints dns_ctx. The stack memory looks different right before the segfault: several words change, and the callee receives dns_ctx=0x17cbc827 instead of 0x749f9994.
Thread 4 "flb-out-stackdr" hit Breakpoint 1, output_thread (data=0x760f7b80) at /src/fluent-bit/src/flb_output_thread.c:329
329 /src/fluent-bit/src/flb_output_thread.c: No such file or directory.
(gdb)
0x749f9920: 0x00890664 0x76170200 0x00000000 0x00000000
0x749f9930: 0x00000000 0x00000000 0x00000000 0x00000008
0x749f9940: 0x00000000 0x00000008 0x00000000 0x755d0000
0x749f9950: 0x761701c0 0x00000000 0x00000000 0xdeadbeef
0x749f9960: 0x760f7bdc 0x00000000 0x00000000 0x00000000
0x749f9970: 0x00000000 0x00000000 0x00000000 0x00000000
0x749f9980: 0x00000000 0x00000000 0x00000000 0x00000000
0x749f9990: 0x00000000 0x749f9994 0x749f9994 0x749f999c
0x749f99a0: 0x749f999c 0x00000023 0x00008000 0x00000001
0x749f99b0: 0x00000002 0x00000000 0x00000000 0x00000000
$47 = (struct flb_net_dns *) 0x749f9994
flb_net_dns_lookup_context_cleanup (dns_ctx=0x749f9994) at /src/fluent-bit/src/flb_network.c:613
613 /src/fluent-bit/src/flb_network.c: No such file or directory.
$48 = (struct flb_net_dns *) 0x749f9994
Thread 4 "flb-out-stackdr" hit Breakpoint 1, output_thread (data=0x760f7b80) at /src/fluent-bit/src/flb_output_thread.c:329
329 /src/fluent-bit/src/flb_output_thread.c: No such file or directory.
(gdb)
0x749f9920: 0x00890664 0x76170200 0x00000000 0x00000000
0x749f9930: 0x00000000 0x00000000 0x00000000 0x00000008
0x749f9940: 0x00000000 0x00000008 0x00000000 0x755d0000
0x749f9950: 0x761701c0 0x00000000 0x00000000 0xdeadbeef
0x749f9960: 0x760f7bdc 0x00000000 0x00000000 0x00000000
0x749f9970: 0x00000000 0x00000000 0x00000000 0x00000000
0x749f9980: 0x00000000 0x00000000 0x00000000 0x00000000
0x749f9990: 0x00000000 0x749f9994 0x749f9994 0x749f999c
0x749f99a0: 0x749f999c 0x00000023 0x00008000 0x00000001
0x749f99b0: 0x00000002 0x00000000 0x00000000 0x00000000
$49 = (struct flb_net_dns *) 0x749f9994
flb_net_dns_lookup_context_cleanup (dns_ctx=0x749f9994) at /src/fluent-bit/src/flb_network.c:613
613 /src/fluent-bit/src/flb_network.c: No such file or directory.
$50 = (struct flb_net_dns *) 0x749f9994
Thread 4 "flb-out-stackdr" hit Breakpoint 1, output_thread (data=0x760f7b80) at /src/fluent-bit/src/flb_output_thread.c:329
329 /src/fluent-bit/src/flb_output_thread.c: No such file or directory.
(gdb)
0x749f9920: 0x00890664 0x76170200 0x00000000 0x00000000
0x749f9930: 0x00000000 0x00000000 0x00000000 0x00000008
0x749f9940: 0x00000000 0x00000008 0x00000000 0x755d0000
0x749f9950: 0x761701c0 0x00000000 0x00000000 0xdeadbeef
0x749f9960: 0x760f7bdc 0x00000000 0x00000000 0x00000000
0x749f9970: 0x00000000 0x00000000 0x00000000 0x00000000
0x749f9980: 0x00000000 0x00000000 0x00000000 0x00000000
0x749f9990: 0x00000000 0x749f9994 0x749f9994 0x749f999c
0x749f99a0: 0x749f999c 0x00000023 0x00008000 0x00000001
0x749f99b0: 0x00000002 0x00000000 0x00000000 0x00000000
$51 = (struct flb_net_dns *) 0x749f9994
flb_net_dns_lookup_context_cleanup (dns_ctx=0x749f9994) at /src/fluent-bit/src/flb_network.c:613
613 /src/fluent-bit/src/flb_network.c: No such file or directory.
$52 = (struct flb_net_dns *) 0x749f9994
Thread 4 "flb-out-stackdr" hit Breakpoint 1, output_thread (data=0x760f7b80) at /src/fluent-bit/src/flb_output_thread.c:329
329 /src/fluent-bit/src/flb_output_thread.c: No such file or directory.
(gdb)
0x749f9920: 0x00890664 0x76170200 0x00000000 0x00000000
0x749f9930: 0x00000000 0x00000000 0x00000000 0x7552f000
0x749f9940: 0x760c1560 0x761b6000 0x760f7bec 0x755d0000
0x749f9950: 0x761701c0 0x00000000 0x00000000 0xdeadbeef
0x749f9960: 0x760f7bdc 0x00000000 0x760c1560 0x00000000
0x749f9970: 0x00000000 0x00006100 0x00000000 0x00000000
0x749f9980: 0x00000000 0x00000000 0x00000000 0x00000000
0x749f9990: 0x00000000 0x754e7074 0x754e7074 0x749f999c
0x749f99a0: 0x749f999c 0x00000023 0x00008000 0x00000001
0x749f99b0: 0x00000002 0x00000000 0x00000000 0x00000000
$53 = (struct flb_net_dns *) 0x749f9994
flb_net_dns_lookup_context_cleanup (dns_ctx=0x17cbc827) at /src/fluent-bit/src/flb_network.c:613
613 /src/fluent-bit/src/flb_network.c: No such file or directory.
$54 = (struct flb_net_dns *) 0x17cbc827
Thread 4 "flb-out-stackdr" received signal SIGSEGV, Segmentation fault.
0x004eefd8 in flb_net_dns_lookup_context_cleanup (dns_ctx=0x17cbc827) at /src/fluent-bit/src/flb_network.c:613
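One way to read the dumps above is sketched here. The two words at 0x749f9994 both hold 0x749f9994, and the two words at 0x749f999c both hold 0x749f999c, which is what an empty intrusive list head looks like when it points at itself. In the last dump the first pair holds 0x754e7074 twice, which is what a head with exactly one entry looks like. The sketch assumes the context is made of two such list heads back to back; `list_head`, `dns_ctx_sketch` and the helper names are hypothetical stand-ins, not the actual fluent-bit definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an intrusive doubly linked list head. */
struct list_head {
    struct list_head *prev;
    struct list_head *next;
};

/* Assumed layout: two list heads back to back (not the real struct). */
struct dns_ctx_sketch {
    struct list_head lookups;
    struct list_head lookups_drop;
};

/* An empty head points at itself in both directions. */
static void list_init(struct list_head *head)
{
    head->prev = head;
    head->next = head;
}

/* Append node at the tail of the list anchored at head. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

static int list_is_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Returns 1 when a freshly initialized context has two self-referential
 * heads, the pattern seen in the first three dumps. */
int empty_ctx_points_to_itself(void)
{
    struct dns_ctx_sketch ctx;

    list_init(&ctx.lookups);
    list_init(&ctx.lookups_drop);

    return ctx.lookups.prev == &ctx.lookups &&
           ctx.lookups.next == &ctx.lookups &&
           ctx.lookups_drop.prev == &ctx.lookups_drop &&
           ctx.lookups_drop.next == &ctx.lookups_drop;
}

/* Returns 1 when, after one insertion, both words of the first head hold
 * the node address while the second head is still empty, the pattern
 * seen in the last dump. */
int one_entry_ctx_points_to_node(void)
{
    struct dns_ctx_sketch ctx;
    struct list_head node;

    list_init(&ctx.lookups);
    list_init(&ctx.lookups_drop);
    list_add_tail(&node, &ctx.lookups);

    return ctx.lookups.prev == &node &&
           ctx.lookups.next == &node &&
           list_is_empty(&ctx.lookups_drop);
}

/* Small self-check that can be called from a test or from main(). */
void check_list_shapes(void)
{
    assert(empty_ctx_points_to_itself());
    assert(one_entry_ctx_points_to_node());
}
```

If that reading is right, the memory at &dns_ctx still looks like a valid context with one pending entry in the last iteration, and what changes is the pointer value the callee actually receives.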
Bug Report
Describe the bug
The stackdriver output plugin has been broken for arm32v7 release builds (i.e. the Docker images) since v3.0.0.
I did some digging, and this does not seem to be caused by any recently introduced bug. Instead, it seems that before this commit: https://github.com/fluent/fluent-bit/commit/71746b35718e856a5f8615f95f35d450a142e8cd setting FLB_RELEASE=On wouldn't build a release binary unless FLB_DEBUG was also explicitly turned off, so the Docker images always included a debug build of fluent-bit until v3.0.0.
To Reproduce
In the core dump output below, the stack is corrupted, which causes dns_ctx to receive the illegal address 0x17cbb6dd.
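A quick sanity check on the reported pointer values can be sketched as follows. It assumes a 32-bit target where a pointer to a stack-allocated context should be 4-byte aligned and should fall inside the dumped stack window; the function names are hypothetical, and the constants are copied from this report (the 40-word window from `x/40x $sp` starting at 0x749f9920).

```c
#include <assert.h>
#include <stdint.h>

/* 1 when addr lies inside [sp, sp + window_bytes). */
int in_stack_window(uint32_t addr, uint32_t sp, uint32_t window_bytes)
{
    return addr >= sp && (addr - sp) < window_bytes;
}

/* 1 when addr is aligned to a 4-byte word. */
int is_word_aligned(uint32_t addr)
{
    return (addr & 0x3u) == 0;
}

/* 1 when addr could plausibly be the address of a stack-local context. */
int plausible_stack_ctx(uint32_t addr, uint32_t sp, uint32_t window_bytes)
{
    return in_stack_window(addr, sp, window_bytes) && is_word_aligned(addr);
}

/* Applies the checks to the values quoted in this report. */
void check_reported_addresses(void)
{
    const uint32_t sp = 0x749f9920u;   /* first address of the x/40x dump */
    const uint32_t window = 40u * 4u;  /* 40 words of 4 bytes each */

    assert(plausible_stack_ctx(0x749f9994u, sp, window));   /* &dns_ctx */
    assert(!plausible_stack_ctx(0x17cbc827u, sp, window));  /* gdb session */
    assert(!plausible_stack_ctx(0x17cbb6ddu, sp, window));  /* core dump */
    assert(!is_word_aligned(0x17cbc827u));
    assert(!is_word_aligned(0x17cbb6ddu));
}
```

Both bad values are odd and far from the thread's stack, which fits the argument being clobbered on its way to the callee rather than the context itself being moved.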
Expected behavior
The stackdriver output plugin should work on arm32v7 release builds, or at least the Docker images should work. I tested, and this isn't a problem on arm64 or x86.
Your Environment
[INPUT]
    Name         cpu
    Tag          gateway_cpu
    Interval_Sec 20

[FILTER]
    Name  modify
    Match *
    Add   labels.gateway_env development

[FILTER]
    Name          nest
    Match         *
    Operation     nest
    Wildcard      labels.*
    Nest_under    logging.googleapis.com/labels
    Remove_prefix labels.

[OUTPUT]
    Name         stackdriver
    Match        *
    resource     generic_node
    namespace    ${DEV_CODE}
    node_id      ${DEV_ID}
    location     northamerica-northeast1-c
    severity_key level