Closed slewiskelly closed 3 years ago
Another similar, but slightly different stack trace:
[2020/09/25 02:40:42] [error] error parsing local_resource_id for type k8s_container
[2020/09/25 02:40:42] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[engine] caught signal (SIGSEGV)
#0 0x555f1aee7b28 in do_hash() at lib/onigmo/st.c:310
#1 0x555f1aee7b28 in onig_st_lookup() at lib/onigmo/st.c:1054
#2 0x555f1aec2628 in onig_st_lookup_strend() at lib/onigmo/regparse.c:426
#3 0x555f1aece0ea in name_find() at lib/onigmo/regparse.c:547
#4 0x555f1aece0ea in name_add() at lib/onigmo/regparse.c:781
#5 0x555f1aece0ea in parse_enclose() at lib/onigmo/regparse.c:5053
#6 0x555f1aece0ea in parse_exp() at lib/onigmo/regparse.c:6534
#7 0x555f1aecf806 in parse_branch() at lib/onigmo/regparse.c:6905
#8 0x555f1aecf8d3 in parse_subexp() at lib/onigmo/regparse.c:6938
#9 0x555f1aecfadc in parse_regexp() at lib/onigmo/regparse.c:6987
#10 0x555f1aecfadc in onig_parse_make_tree() at lib/onigmo/regparse.c:7032
#11 0x555f1aeda66e in onig_compile() at lib/onigmo/regcomp.c:5754
#12 0x555f1aedb232 in onig_new() at lib/onigmo/regcomp.c:5982
#13 0x555f1adfb5df in str_to_regex() at src/flb_regex.c:78
#14 0x555f1adfb65a in flb_regex_create() at src/flb_regex.c:108
#15 0x555f1ae5ea6f in is_tag_match_regex() at plugins/out_stackdriver/stackdriver.c:708
#16 0x555f1ae5ff5c in stackdriver_format() at plugins/out_stackdriver/stackdriver.c:1320
#17 0x555f1ae6137d in cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:1719
#18 0x555f1ade74eb in output_pre_cb_flush() at include/fluent-bit/flb_output.h:449
#19 0x555f1b210286 in co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#20 0xffffffffffffffff in ???() at ???:0
I had no reason to think an upgrade to 1.5.7 would improve things, but the following are stack traces captured from some Pods post-upgrade:
Fluent Bit v1.5.7
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2020/09/28 03:21:06] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:21:06] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:21:08] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:21:08] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:21:09] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:21:09] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[engine] caught signal (SIGSEGV)
#0 0x5594d429aebc in atomic_load_p() at lib/jemalloc-5.2.1/include/jemalloc/internal/atomic.h:62
#1 0x5594d429aebc in rtree_leaf_elm_bits_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:175
#2 0x5594d429aebc in rtree_szind_slab_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:500
#3 0x5594d429aebc in ifree() at lib/jemalloc-5.2.1/src/jemalloc.c:2570
#4 0x5594d429aebc in je_free_default() at lib/jemalloc-5.2.1/src/jemalloc.c:2790
#5 0x5594d430dd48 in flb_free() at include/fluent-bit/flb_mem.h:122
#6 0x5594d430ee15 in flb_sds_destroy() at src/flb_sds.c:393
#7 0x5594d4332e1a in flb_kv_item_destroy() at src/flb_kv.c:83
#8 0x5594d4332e9c in flb_kv_release() at src/flb_kv.c:102
#9 0x5594d43cc22e in http_headers_destroy() at src/flb_http_client.c:929
#10 0x5594d43cc8a1 in flb_http_client_destroy() at src/flb_http_client.c:1176
#11 0x5594d439882b in cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:1783
#12 0x5594d431e50a in output_pre_cb_flush() at include/fluent-bit/flb_output.h:449
#13 0x5594d4752346 in co_init() at lib/monkey/deps/flb_libco/amd64.c:117
Fluent Bit v1.5.7
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2020/09/28 03:23:59] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:23:59] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:00] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:00] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - Bad input parameters to function
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:24:01] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:24:01] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:24:03] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:03] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[engine] caught signal (SIGSEGV)
#0 0x5584d4029ebc in atomic_load_p() at lib/jemalloc-5.2.1/include/jemalloc/internal/atomic.h:62
#1 0x5584d4029ebc in rtree_leaf_elm_bits_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:175
#2 0x5584d4029ebc in rtree_szind_slab_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:500
#3 0x5584d4029ebc in ifree() at lib/jemalloc-5.2.1/src/jemalloc.c:2570
#4 0x5584d4029ebc in je_free_default() at lib/jemalloc-5.2.1/src/jemalloc.c:2790
#5 0x5584d409cd48 in flb_free() at include/fluent-bit/flb_mem.h:122
#6 0x5584d409de15 in flb_sds_destroy() at src/flb_sds.c:393
#7 0x5584d40c1e1a in flb_kv_item_destroy() at src/flb_kv.c:83
#8 0x5584d40c1e9c in flb_kv_release() at src/flb_kv.c:102
#9 0x5584d415b22e in http_headers_destroy() at src/flb_http_client.c:929
#10 0x5584d415b8a1 in flb_http_client_destroy() at src/flb_http_client.c:1176
#11 0x5584d412782b in cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:1783
#12 0x5584d40ad50a in output_pre_cb_flush() at include/fluent-bit/flb_output.h:449
#13 0x5584d44e1346 in co_init() at lib/monkey/deps/flb_libco/amd64.c:117
Fluent Bit v1.5.7
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2020/09/28 03:24:10] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:10] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:11] [error] [tls] SSL error: NET - Connection was reset by peer
[2020/09/28 03:24:11] [error] [src/flb_http_client.c:1085 errno=25] Inappropriate ioctl for device
[2020/09/28 03:24:12] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:12] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:14] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:14] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[engine] caught signal (SIGSEGV)
#0 0x559721724ebc in atomic_load_p() at lib/jemalloc-5.2.1/include/jemalloc/internal/atomic.h:62
#1 0x559721724ebc in rtree_leaf_elm_bits_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:175
#2 0x559721724ebc in rtree_szind_slab_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:500
#3 0x559721724ebc in ifree() at lib/jemalloc-5.2.1/src/jemalloc.c:2570
#4 0x559721724ebc in je_free_default() at lib/jemalloc-5.2.1/src/jemalloc.c:2790
#5 0x559721797d48 in flb_free() at include/fluent-bit/flb_mem.h:122
#6 0x559721798e15 in flb_sds_destroy() at src/flb_sds.c:393
#7 0x5597217bce1a in flb_kv_item_destroy() at src/flb_kv.c:83
#8 0x5597217bce9c in flb_kv_release() at src/flb_kv.c:102
#9 0x55972185622e in http_headers_destroy() at src/flb_http_client.c:929
#10 0x5597218568a1 in flb_http_client_destroy() at src/flb_http_client.c:1176
#11 0x55972182282b in cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:1783
#12 0x5597217a850a in output_pre_cb_flush() at include/fluent-bit/flb_output.h:449
#13 0x559721bdc346 in co_init() at lib/monkey/deps/flb_libco/amd64.c:117
Fluent Bit v1.5.7
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2020/09/28 03:23:10] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:23:10] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:23:12] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:23:12] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[engine] caught signal (SIGSEGV)
#0 0x564e46805ebc in atomic_load_p() at lib/jemalloc-5.2.1/include/jemalloc/internal/atomic.h:62
#1 0x564e46805ebc in rtree_leaf_elm_bits_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:175
#2 0x564e46805ebc in rtree_szind_slab_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:500
#3 0x564e46805ebc in ifree() at lib/jemalloc-5.2.1/src/jemalloc.c:2570
#4 0x564e46805ebc in je_free_default() at lib/jemalloc-5.2.1/src/jemalloc.c:2790
#5 0x564e46878d48 in flb_free() at include/fluent-bit/flb_mem.h:122
#6 0x564e46879e15 in flb_sds_destroy() at src/flb_sds.c:393
#7 0x564e4689de1a in flb_kv_item_destroy() at src/flb_kv.c:83
#8 0x564e4689de9c in flb_kv_release() at src/flb_kv.c:102
#9 0x564e4693722e in http_headers_destroy() at src/flb_http_client.c:929
#10 0x564e469378a1 in flb_http_client_destroy() at src/flb_http_client.c:1176
#11 0x564e4690382b in cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:1783
#12 0x564e4688950a in output_pre_cb_flush() at include/fluent-bit/flb_output.h:449
#13 0x564e46cbd346 in co_init() at lib/monkey/deps/flb_libco/amd64.c:117
Fluent Bit v1.5.7
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2020/09/28 03:23:06] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:23:06] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:23:07] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:23:07] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 SSL - A fatal alert message was received from our peer
[2020/09/28 03:23:07] [error] [filter:kubernetes:kubernetes.1] upstream connection error
[2020/09/28 03:23:09] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:23:09] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:23:10] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:23:10] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:23:13] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:23:13] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:23:15] [error] error parsing local_resource_id for type k8s_container
[engine] caught signal (SIGSEGV)
[2020/09/28 03:23:15] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
#0 0x564c99d9c2b6 in mk_list_size() at lib/monkey/include/monkey/mk_core/mk_list.h:117
#1 0x564c99d9d231 in flb_engine_dispatch() at src/flb_engine_dispatch.c:284
#2 0x564c99d9a903 in flb_engine_flush() at src/flb_engine.c:85
#3 0x564c99d9bd48 in flb_engine_handle_event() at src/flb_engine.c:292
#4 0x564c99d9bd48 in flb_engine_start() at src/flb_engine.c:559
#5 0x564c99d102f4 in flb_main() at src/fluent-bit.c:1035
#6 0x564c99d10342 in main() at src/fluent-bit.c:1048
#7 0x7f2aa1bd509a in ???() at ???:0
#8 0x564c99d0dfd9 in ???() at ???:0
#9 0xffffffffffffffff in ???() at ???:0
Fluent Bit v1.5.7
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2020/09/28 03:24:53] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:53] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:54] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:54] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:56] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:56] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:57] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:57] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:59] [error] [tls] SSL error: NET - Connection was reset by peer
[2020/09/28 03:24:59] [error] [src/flb_http_client.c:1085 errno=25] Inappropriate ioctl for device
[2020/09/28 03:25:00] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:25:00] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:25:03] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:25:03] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:25:05] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:25:05] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:25:07] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:25:07] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:25:08] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:25:08] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[engine] caught signal (SIGSEGV)
#0 0x5592996db9d5 in cio_chunk_is_up() at lib/chunkio/src/cio_chunk.c:405
#1 0x559299450304 in flb_input_chunk_total_size() at src/flb_input_chunk.c:271
#2 0x55929945051c in flb_input_chunk_set_up_down() at src/flb_input_chunk.c:355
#3 0x5592994389ab in flb_task_retry_create() at src/flb_task.c:171
#4 0x559299435ff3 in flb_engine_manager() at src/flb_engine.c:200
#5 0x559299436d8f in flb_engine_handle_event() at src/flb_engine.c:300
#6 0x559299436d8f in flb_engine_start() at src/flb_engine.c:559
#7 0x5592993ab2f4 in flb_main() at src/fluent-bit.c:1035
#8 0x5592993ab342 in main() at src/fluent-bit.c:1048
#9 0x7fbb33de709a in ???() at ???:0
#10 0x5592993a8fd9 in ???() at ???:0
#11 0xffffffffffffffff in ???() at ???:0
Fluent Bit v1.5.7
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2020/09/28 03:24:42] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:42] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:42] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:24:43] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:43] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:44] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:44] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:45] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:45] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:46] [error] [io_tls] flb_io_tls.c:356 SSL - A fatal alert message was received from our peer
[2020/09/28 03:24:46] [error] [filter:kubernetes:kubernetes.1] upstream connection error
[2020/09/28 03:24:46] [error] [io_tls] flb_io_tls.c:356 SSL - A fatal alert message was received from our peer
[2020/09/28 03:24:46] [error] [filter:kubernetes:kubernetes.1] upstream connection error
[2020/09/28 03:24:46] [error] [io_tls] flb_io_tls.c:356 SSL - A fatal alert message was received from our peer
[2020/09/28 03:24:46] [error] [filter:kubernetes:kubernetes.1] upstream connection error
[2020/09/28 03:24:46] [error] [io_tls] flb_io_tls.c:356 SSL - A fatal alert message was received from our peer
[2020/09/28 03:24:46] [error] [filter:kubernetes:kubernetes.1] upstream connection error
[2020/09/28 03:24:46] [error] [io_tls] flb_io_tls.c:356 SSL - A fatal alert message was received from our peer
[2020/09/28 03:24:46] [error] [filter:kubernetes:kubernetes.1] upstream connection error
[2020/09/28 03:24:46] [error] [io_tls] flb_io_tls.c:356 SSL - A fatal alert message was received from our peer
[2020/09/28 03:24:46] [error] [filter:kubernetes:kubernetes.1] upstream connection error
[2020/09/28 03:24:46] [error] [io_tls] flb_io_tls.c:356 SSL - A fatal alert message was received from our peer
[2020/09/28 03:24:46] [error] [filter:kubernetes:kubernetes.1] upstream connection error
[2020/09/28 03:24:46] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:46] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[engine] caught signal (SIGSEGV)
#0 0x5608dd207ebc in atomic_load_p() at lib/jemalloc-5.2.1/include/jemalloc/internal/atomic.h:62
#1 0x5608dd207ebc in rtree_leaf_elm_bits_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:175
#2 0x5608dd207ebc in rtree_szind_slab_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:500
#3 0x5608dd207ebc in ifree() at lib/jemalloc-5.2.1/src/jemalloc.c:2570
#4 0x5608dd207ebc in je_free_default() at lib/jemalloc-5.2.1/src/jemalloc.c:2790
#5 0x5608dd27ad48 in flb_free() at include/fluent-bit/flb_mem.h:122
#6 0x5608dd27be15 in flb_sds_destroy() at src/flb_sds.c:393
#7 0x5608dd29fe1a in flb_kv_item_destroy() at src/flb_kv.c:83
#8 0x5608dd29fe9c in flb_kv_release() at src/flb_kv.c:102
#9 0x5608dd33922e in http_headers_destroy() at src/flb_http_client.c:929
#10 0x5608dd3398a1 in flb_http_client_destroy() at src/flb_http_client.c:1176
#11 0x5608dd30582b in cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:1783
#12 0x5608dd28b50a in output_pre_cb_flush() at include/fluent-bit/flb_output.h:449
#13 0x5608dd6bf346 in co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#14 0xffffffffffffffff in ???() at ???:0
I updated my Fluent Bit config; the diff is as follows:
$ diff old new
< Tag k8s_container.<namespace_name>.<pod_name>.<container_name>
< Tag_Regex (?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-
< [FILTER]
< Name parser
< Match k8s_container.*
< Key_Name log
< Reserve_Data True
21a15
> Tag kube.*
24,27c18
< Match k8s_container.*
< Kube_Tag_Prefix k8s_container.
< Regex_Parser k8s-custom-tag
< Kube_URL https://kubernetes.default.svc.cluster.local:443
---
> Match kube.*
77c68
< Name k8s-custom-tag
---
> Name kube-custom
79c70
< Regex (?<namespace_name>[^_]+)\.(?<pod_name>[^_]+)\.(?<container_name>.+)
---
> Regex (?<tag>[^.]+)?\.?(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$
The configuration in its entirety is now:
[SERVICE]
Parsers_File parsers.conf
Flush 1
HTTP_Server On
storage.metrics On
[INPUT]
Name tail
DB /var/run/flb/pos-files/flb_kube.db
Mem_Buf_Limit 5M
Refresh_Interval 1
Skip_Long_Lines On
Path /var/log/containers/*.log
Exclude_Path /var/log/containers/*_kube-system_*.log,/var/log/containers/*_istio-system_*.log,/var/log/containers/*_knative-serving_*.log,/var/log/containers/*_gke-system_*.log,/var/log/containers/*_config-management-system_*.log
Parser docker
Tag kube.*
[FILTER]
Name kubernetes
Match kube.*
Annotations Off
Keep_Log Off
Merge_Log On
K8S-Logging.Exclude On
[FILTER]
Name nest
Match *
Operation lift
Nested_under kubernetes
Add_prefix k8s.
[FILTER]
Name nest
Match *
Operation lift
Nested_under k8s.labels
Add_prefix k8s-pod/
[FILTER]
Name nest
Match *
Operation nest
Nest_under k8s.labels
Wildcard k8s-pod/*
[FILTER]
Name modify
Match *
Hard_rename k8s.labels labels
[FILTER]
Name modify
Match *
Remove_wildcard k8s.
[FILTER]
Name modify
Match *
Hard_rename log message
[OUTPUT]
Name stackdriver
Match *
k8s_cluster_name ${CLUSTER}
k8s_cluster_location ${ZONE}
labels_key labels
resource k8s_container
severity_key level
---
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L%z
[PARSER]
Name kube-custom
Format regex
Regex (?<tag>[^.]+)?\.?(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$
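For context, the `Tag kube.*` line in the tail input relies on tag expansion: the `*` is replaced with the path of the monitored file, with slashes turned into dots. Below is a minimal Python sketch of that expansion, assuming the leading slash is simply dropped. The helper name `expand_tag` and the sample file name are hypothetical, and this is not Fluent Bit's actual code.

```python
def expand_tag(tag_template: str, path: str) -> str:
    """Illustrative only: mimic tail's tag expansion for a template containing '*'."""
    # Assumption: the absolute path loses its leading slash and the
    # remaining slashes become dots before the asterisk is replaced.
    dotted_path = path.lstrip("/").replace("/", ".")
    return tag_template.replace("*", dotted_path, 1)


if __name__ == "__main__":
    # Hypothetical container log file following the
    # <pod_name>_<namespace_name>_<container_name>-<docker_id>.log layout.
    docker_id = "0" * 64
    sample_path = f"/var/log/containers/my-pod_default_app-{docker_id}.log"
    print(expand_tag("kube.*", sample_path))
```

With the configuration above, every record read from `/var/log/containers/*.log` would carry a tag of this shape, which is what the `Match kube.*` line in the kubernetes filter selects on.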
I will continue to monitor, but I've noticed an immediate improvement in the health of the Pods.
FYI: re-opening since @JeffLuoo will take a look at this
Hi @slewiskelly, thanks for the detailed issue description. I have some questions regarding this issue.
Is the configuration for 1.5.7 the same as the one for 1.5.6? And when you checked the logs, did you notice a log line like this:
local_resource_id not found, tag xxxxx is assigned for local_resource_id
@JeffLuoo, thanks for taking a look.
Is the configuration for 1.5.7 the same as the one for 1.5.6?
The configuration was the same between versions, until I updated the configuration described in https://github.com/fluent/fluent-bit/issues/2580#issuecomment-699906025.
When you checked the logs, did you notice a log line like this:
local_resource_id not found, tag xxxxx is assigned for local_resource_id
I can't find any historical logs with that specific message.
For the most part, the only logs observed before crashes occurred were:
[error] error parsing local_resource_id for type k8s_container
[error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
@slewiskelly I see. Thanks for the update! I just checked the code and found that the log message:
local_resource_id not found, tag xxxxx is assigned for local_resource_id
will only show up if the log level of Fluent Bit is set to "debug". The message means that your JSON record has no field with the key:
logging.googleapis.com/local_resource_id
so the plugin uses the tag of the log as the value of local_resource_id. local_resource_id is just the name of the variable I used to hold the metadata that ends up in the final log entry for the k8s_container resource type.
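A minimal Python sketch of the fallback described above, not the plugin's actual C code. It assumes the `k8s_container.<namespace_name>.<pod_name>.<container_name>` layout shown in the `Tag` line of the earlier config diff. The function names `resolve_local_resource_id` and `parse_k8s_container_id` are hypothetical.

```python
LOCAL_RESOURCE_ID_KEY = "logging.googleapis.com/local_resource_id"


def resolve_local_resource_id(record: dict, tag: str) -> str:
    """Use the record's local_resource_id field if present, else fall back to the tag."""
    value = record.get(LOCAL_RESOURCE_ID_KEY)
    return value if value else tag


def parse_k8s_container_id(local_resource_id: str) -> dict:
    """Split 'k8s_container.<namespace_name>.<pod_name>.<container_name>' into labels.

    Assumption: the first three dots are the separators, so any further
    dots end up inside container_name.
    """
    parts = local_resource_id.split(".", 3)
    if len(parts) != 4 or parts[0] != "k8s_container" or not all(parts[1:]):
        raise ValueError(f"error parsing local_resource_id: {local_resource_id!r}")
    _, namespace_name, pod_name, container_name = parts
    return {
        "namespace_name": namespace_name,
        "pod_name": pod_name,
        "container_name": container_name,
    }


if __name__ == "__main__":
    # The record has no local_resource_id field, so the tag is used instead.
    record = {"message": "hello"}
    tag = "k8s_container.default.my-pod.app"
    print(parse_k8s_container_id(resolve_local_resource_id(record, tag)))
```

In this sketch, a value that does not follow the expected layout raises an error instead of being passed on, which is roughly the situation the `error parsing local_resource_id for type k8s_container` message reports.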
According to the error message:
[error] error parsing local_resource_id for type k8s_container
the error can be narrowed down to this function: https://github.com/fluent/fluent-bit/blob/b4129df6eb8f88e1caeb6216f68134caea69c361/plugins/out_stackdriver/stackdriver.c#L348
I will add some information to the error message (at https://github.com/fluent/fluent-bit/blob/b4129df6eb8f88e1caeb6216f68134caea69c361/plugins/out_stackdriver/stackdriver.c#L385) to include the local_resource_id that is passed in to this function. This will help us debug and see whether the local_resource_id passed in is valid or not. We might also need @slewiskelly to reproduce the error so that we can see the value of local_resource_id, since I tested it locally but still could not reproduce the error. Thank you!
cc @erain: Hi Yu, have you seen this kind of error before? I don't have access to deploy Fluent Bit in a GKE environment. Thank you!
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.
Just did an install on CentOS 8 and I'm encountering this issue, and seeing a similar segfault in stackdriver_format. Turned on debug log level and the only reasonably relevant line I can see is:
[2021/08/10 21:41:35] [debug] [output:stackdriver:stackdriver.1] [logging.googleapis.com/monitored_resource] not found in the payload
Stack trace (relevant parts, e.g. no threads in epoll_wait):
Stack trace of thread 1410520:
#0 0x00007fcc7c23b65d __lll_lock_wait (libpthread.so.0)
#1 0x00007fcc7c234a44 __pthread_mutex_lock (libpthread.so.0)
#2 0x00007fcc7ab350a3 dl_iterate_phdr (libc.so.6)
#3 0x00007fcc7add6175 _Unwind_Find_FDE (libgcc_s.so.1)
#4 0x00007fcc7add2713 uw_frame_state_for (libgcc_s.so.1)
#5 0x00007fcc7add38f0 uw_init_context_1 (libgcc_s.so.1)
#6 0x00007fcc7add472c _Unwind_Backtrace (libgcc_s.so.1)
#7 0x0000000000434517 backtrace_full (fluent-bit)
#8 0x00000000004320bc flb_signal_handler (fluent-bit)
#9 0x00007fcc7aa35400 __restore_rt (libc.so.6)
#10 0x00000000004b26da stackdriver_format (fluent-bit)
#11 0x00000000004b4a09 cb_stackdriver_flush (fluent-bit)
#12 0x0000000000448cb8 output_pre_cb_flush (fluent-bit)
#13 0x0000000000690e87 co_init (fluent-bit)
Fluent Bit is installed from Google Cloud's Ops Agent (I don't think this is Google-specific, though):
# /opt/google-cloud-ops-agent/subagents/fluent-bit/bin/fluent-bit -V
Fluent Bit v1.7.8
My bad, this was the result of using resource gce_instance, where I should have been using resource generic_node.
Did anyone find any fix for this?
We faced the same error in our environments (on Kubernetes, Fluent Bit v1.8.15).
Bug Report
Describe the bug
When deploying Fluent Bit into our Kubernetes cluster, a portion of Pods crash with the following errors:
To Reproduce
It's difficult to provide reproducible steps given the nature of the environment.
Given some advice on how to better troubleshoot, I will be able to provide more specific information than what I have provided in the "additional context" section.
Expected behavior
Fluent Bit should not crash, and/or should display more information at the appropriate log level (error or above).
Your Environment
fluent/fluent-bit:1.5.6
Additional context
Fluent Bit is deployed in a multi-tenant environment with a variety of log formats (though mostly JSON formatted).
I've tested Fluent Bit on Kubernetes to Stackdriver quite extensively with JSON formatted log files, and have not observed these issues. Only when deploying to a heterogeneous environment do I observe the failures.
When the Pods are restarted, only a portion of them crash (for an indeterminate amount of time). However, it seems they do eventually recover.
I have collected some debug logs, but I can't make any correlation after a cursory look over them. I can share them, but I will first have to ensure there is no sensitive information included.