fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

Segmentation fault with Stackdriver output plugin #2580

Closed slewiskelly closed 3 years ago

slewiskelly commented 4 years ago

Bug Report

Describe the bug

When deploying Fluent Bit into our Kubernetes cluster, a portion of Pods crash with the following errors:

[2020/09/24 07:31:52] [error] error parsing local_resource_id for type k8s_container
[2020/09/24 07:31:52] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[engine] caught signal (SIGSEGV)
#0  0x5594dc2e1ebc      in  atomic_load_p() at lib/jemalloc-5.2.1/include/jemalloc/internal/atomic.h:62
#1  0x5594dc2e1ebc      in  rtree_leaf_elm_bits_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:175
#2  0x5594dc2e1ebc      in  rtree_szind_slab_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:500
#3  0x5594dc2e1ebc      in  ifree() at lib/jemalloc-5.2.1/src/jemalloc.c:2570
#4  0x5594dc2e1ebc      in  je_free_default() at lib/jemalloc-5.2.1/src/jemalloc.c:2790
#5  0x5594dc354d48      in  flb_free() at include/fluent-bit/flb_mem.h:122
#6  0x5594dc355e15      in  flb_sds_destroy() at src/flb_sds.c:393
#7  0x5594dc379df4      in  flb_kv_item_destroy() at src/flb_kv.c:83
#8  0x5594dc379e76      in  flb_kv_release() at src/flb_kv.c:102
#9  0x5594dc412e9f      in  http_headers_destroy() at src/flb_http_client.c:929
#10 0x5594dc413512      in  flb_http_client_destroy() at src/flb_http_client.c:1176
#11 0x5594dc3df78a      in  cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:1783
#12 0x5594dc3654eb      in  output_pre_cb_flush() at include/fluent-bit/flb_output.h:449
#13 0x5594dc78e286      in  co_init() at lib/monkey/deps/flb_libco/amd64.c:117

[2020/09/24 07:30:35] [error] error parsing local_resource_id for type k8s_container
[2020/09/24 07:30:35] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[engine] caught signal (SIGSEGV)
#0  0x564d018f4ebc      in  atomic_load_p() at lib/jemalloc-5.2.1/include/jemalloc/internal/atomic.h:62
#1  0x564d018f4ebc      in  rtree_leaf_elm_bits_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:175
#2  0x564d018f4ebc      in  rtree_szind_slab_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:500
#3  0x564d018f4ebc      in  ifree() at lib/jemalloc-5.2.1/src/jemalloc.c:2570
#4  0x564d018f4ebc      in  je_free_default() at lib/jemalloc-5.2.1/src/jemalloc.c:2790
#5  0x564d01967d48      in  flb_free() at include/fluent-bit/flb_mem.h:122
#6  0x564d01968e15      in  flb_sds_destroy() at src/flb_sds.c:393
#7  0x564d01976574      in  flb_slist_destroy() at src/flb_slist.c:327
#8  0x564d019ef68f      in  process_local_resource_id() at plugins/out_stackdriver/stackdriver.c:590
#9  0x564d019f0f94      in  stackdriver_format() at plugins/out_stackdriver/stackdriver.c:1324
#10 0x564d019f237d      in  cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:1719
#11 0x564d019784eb      in  output_pre_cb_flush() at include/fluent-bit/flb_output.h:449
#12 0x564d01da1286      in  co_init() at lib/monkey/deps/flb_libco/amd64.c:117

To Reproduce

It's difficult to provide reproducible steps given the nature of the environment.

If given some advice on how to troubleshoot this better, I can provide more specific information than what is in the "Additional context" section.

Expected behavior

Fluent Bit should not crash, and/or should display more information at the appropriate log level (error or above).

Your Environment

[SERVICE]
    Parsers_File            parsers.conf
    Flush                   1
    HTTP_Server             On
    storage.metrics         On
[INPUT]
    Name                    tail
    DB                      /var/run/flb/pos-files/flb_kube.db
    Mem_Buf_Limit           5M
    Refresh_Interval        1
    Skip_Long_Lines         On
    Path                    /var/log/containers/*.log
    Exclude_Path            /var/log/containers/*_kube-system_*.log,/var/log/containers/*_istio-system_*.log,/var/log/containers/*_knative-serving_*.log,/var/log/containers/*_gke-system_*.log,/var/log/containers/*_config-management-system_*.log
    Tag                     k8s_container.<namespace_name>.<pod_name>.<container_name>
    Tag_Regex               (?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-
[FILTER]
    Name                    parser
    Match                   k8s_container.*
    Key_Name                log
    Reserve_Data            True
    Parser                  docker
[FILTER]
    Name                    kubernetes
    Match                   k8s_container.*
    Kube_Tag_Prefix         k8s_container.
    Regex_Parser            k8s-custom-tag
    Kube_URL                https://kubernetes.default.svc.cluster.local:443
    Annotations             Off
    Keep_Log                Off
    Merge_Log               On
    K8S-Logging.Exclude     On
[FILTER]
    Name                    nest
    Match                   *
    Operation               lift
    Nested_under            kubernetes
    Add_prefix              k8s.
[FILTER]
    Name                    nest
    Match                   *
    Operation               lift
    Nested_under            k8s.labels
    Add_prefix              k8s-pod/
[FILTER]
    Name                    nest
    Match                   *
    Operation               nest
    Nest_under              k8s.labels
    Wildcard                k8s-pod/*
[FILTER]
    Name                    modify
    Match                   *
    Hard_rename             k8s.labels labels
[FILTER]
    Name                    modify
    Match                   *
    Remove_wildcard         k8s.
[FILTER]
    Name                    modify
    Match                   *
    Hard_rename             log message
[OUTPUT]
    Name                    stackdriver
    Match                   *
    k8s_cluster_name        ${CLUSTER}
    k8s_cluster_location    ${ZONE}
    labels_key              labels
    resource                k8s_container
    severity_key            level
---
[PARSER]
    Name                    docker
    Format                  json
    Time_Key                time
    Time_Format             %Y-%m-%dT%H:%M:%S.%L%z
[PARSER]
    Name                    k8s-custom-tag
    Format                  regex
    Regex                   (?<namespace_name>[^_]+)\.(?<pod_name>[^_]+)\.(?<container_name>.+)
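
To make the tag layout in the [INPUT] section concrete, here is a minimal Python sketch of how a container log path could be turned into the `k8s_container.<namespace_name>.<pod_name>.<container_name>` tag. It is an illustration only, not Fluent Bit's implementation: the file paths are hypothetical, Python's `re` stands in for the Onigmo engine (so named groups are written `(?P<name>...)`), and `expand_tag` is a made-up helper.

```python
import re
from typing import Optional

# Tag_Regex from the [INPUT] section, with Onigmo's (?<name>...) rewritten
# as Python's (?P<name>...). Python's re is only a stand-in for Onigmo.
TAG_REGEX = re.compile(
    r"(?P<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?"
    r"(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)"
    r"_(?P<namespace_name>[^_]+)_(?P<container_name>.+)-"
)

# Tag template from the [INPUT] section.
TAG_TEMPLATE = "k8s_container.<namespace_name>.<pod_name>.<container_name>"


def expand_tag(path: str) -> Optional[str]:
    """Hypothetical helper: fill TAG_TEMPLATE from the regex captures."""
    match = TAG_REGEX.search(path)
    if match is None:
        return None
    tag = TAG_TEMPLATE
    for name in ("namespace_name", "pod_name", "container_name"):
        tag = tag.replace(f"<{name}>", match.group(name))
    return tag


# Hypothetical paths: <pod>_<namespace>_<container>-<64-char id>.log
plain = "/var/log/containers/web-7d9f_default_nginx-" + "a" * 64 + ".log"
dotted = "/var/log/containers/web.v2_default_nginx-" + "a" * 64 + ".log"

print(expand_tag(plain))
print(expand_tag(dotted))
```

In this sketch the first path yields a tag with four dot-separated fields, while the second yields five, because the `pod_name` group accepts dots. Whether such a tag is what triggers the `error parsing local_resource_id` message in this report is not established here.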

Additional context

Fluent Bit is deployed in a multi-tenant environment with a variety of log formats (though mostly JSON formatted).

I've tested Fluent Bit on Kubernetes to Stackdriver quite extensively with JSON formatted log files and have not observed these issues. Only when deploying to a heterogeneous environment do I observe the failures.

When the Pods are restarted, only a portion of them crash (for an indeterminate amount of time). However, it seems they do eventually recover.

I have collected some debug logs, but I can't make any correlation after a cursory look over them. I can share them, but I will first have to ensure there is no sensitive information included.

slewiskelly commented 4 years ago

Another similar, but slightly different stack trace:

[2020/09/25 02:40:42] [error] error parsing local_resource_id for type k8s_container
[2020/09/25 02:40:42] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[engine] caught signal (SIGSEGV)
#0  0x555f1aee7b28      in  do_hash() at lib/onigmo/st.c:310
#1  0x555f1aee7b28      in  onig_st_lookup() at lib/onigmo/st.c:1054
#2  0x555f1aec2628      in  onig_st_lookup_strend() at lib/onigmo/regparse.c:426
#3  0x555f1aece0ea      in  name_find() at lib/onigmo/regparse.c:547
#4  0x555f1aece0ea      in  name_add() at lib/onigmo/regparse.c:781
#5  0x555f1aece0ea      in  parse_enclose() at lib/onigmo/regparse.c:5053
#6  0x555f1aece0ea      in  parse_exp() at lib/onigmo/regparse.c:6534
#7  0x555f1aecf806      in  parse_branch() at lib/onigmo/regparse.c:6905
#8  0x555f1aecf8d3      in  parse_subexp() at lib/onigmo/regparse.c:6938
#9  0x555f1aecfadc      in  parse_regexp() at lib/onigmo/regparse.c:6987
#10 0x555f1aecfadc      in  onig_parse_make_tree() at lib/onigmo/regparse.c:7032
#11 0x555f1aeda66e      in  onig_compile() at lib/onigmo/regcomp.c:5754
#12 0x555f1aedb232      in  onig_new() at lib/onigmo/regcomp.c:5982
#13 0x555f1adfb5df      in  str_to_regex() at src/flb_regex.c:78
#14 0x555f1adfb65a      in  flb_regex_create() at src/flb_regex.c:108
#15 0x555f1ae5ea6f      in  is_tag_match_regex() at plugins/out_stackdriver/stackdriver.c:708
#16 0x555f1ae5ff5c      in  stackdriver_format() at plugins/out_stackdriver/stackdriver.c:1320
#17 0x555f1ae6137d      in  cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:1719
#18 0x555f1ade74eb      in  output_pre_cb_flush() at include/fluent-bit/flb_output.h:449
#19 0x555f1b210286      in  co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#20 0xffffffffffffffff  in  ???() at ???:0

slewiskelly commented 4 years ago

I had no reason to think an upgrade to 1.5.7 would improve things, but the following are stack traces captured from some Pods post-upgrade:

Fluent Bit v1.5.7
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2020/09/28 03:21:06] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:21:06] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:21:08] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:21:08] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:21:09] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:21:09] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[engine] caught signal (SIGSEGV)
#0  0x5594d429aebc      in  atomic_load_p() at lib/jemalloc-5.2.1/include/jemalloc/internal/atomic.h:62
#1  0x5594d429aebc      in  rtree_leaf_elm_bits_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:175
#2  0x5594d429aebc      in  rtree_szind_slab_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:500
#3  0x5594d429aebc      in  ifree() at lib/jemalloc-5.2.1/src/jemalloc.c:2570
#4  0x5594d429aebc      in  je_free_default() at lib/jemalloc-5.2.1/src/jemalloc.c:2790
#5  0x5594d430dd48      in  flb_free() at include/fluent-bit/flb_mem.h:122
#6  0x5594d430ee15      in  flb_sds_destroy() at src/flb_sds.c:393
#7  0x5594d4332e1a      in  flb_kv_item_destroy() at src/flb_kv.c:83
#8  0x5594d4332e9c      in  flb_kv_release() at src/flb_kv.c:102
#9  0x5594d43cc22e      in  http_headers_destroy() at src/flb_http_client.c:929
#10 0x5594d43cc8a1      in  flb_http_client_destroy() at src/flb_http_client.c:1176
#11 0x5594d439882b      in  cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:1783
#12 0x5594d431e50a      in  output_pre_cb_flush() at include/fluent-bit/flb_output.h:449
#13 0x5594d4752346      in  co_init() at lib/monkey/deps/flb_libco/amd64.c:117

Fluent Bit v1.5.7
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2020/09/28 03:23:59] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:23:59] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:00] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:00] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - Bad input parameters to function
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:24:00] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:24:01] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:24:01] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:24:03] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:03] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[engine] caught signal (SIGSEGV)
#0  0x5584d4029ebc      in  atomic_load_p() at lib/jemalloc-5.2.1/include/jemalloc/internal/atomic.h:62
#1  0x5584d4029ebc      in  rtree_leaf_elm_bits_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:175
#2  0x5584d4029ebc      in  rtree_szind_slab_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:500
#3  0x5584d4029ebc      in  ifree() at lib/jemalloc-5.2.1/src/jemalloc.c:2570
#4  0x5584d4029ebc      in  je_free_default() at lib/jemalloc-5.2.1/src/jemalloc.c:2790
#5  0x5584d409cd48      in  flb_free() at include/fluent-bit/flb_mem.h:122
#6  0x5584d409de15      in  flb_sds_destroy() at src/flb_sds.c:393
#7  0x5584d40c1e1a      in  flb_kv_item_destroy() at src/flb_kv.c:83
#8  0x5584d40c1e9c      in  flb_kv_release() at src/flb_kv.c:102
#9  0x5584d415b22e      in  http_headers_destroy() at src/flb_http_client.c:929
#10 0x5584d415b8a1      in  flb_http_client_destroy() at src/flb_http_client.c:1176
#11 0x5584d412782b      in  cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:1783
#12 0x5584d40ad50a      in  output_pre_cb_flush() at include/fluent-bit/flb_output.h:449
#13 0x5584d44e1346      in  co_init() at lib/monkey/deps/flb_libco/amd64.c:117

Fluent Bit v1.5.7
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2020/09/28 03:24:10] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:10] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:11] [error] [tls] SSL error: NET - Connection was reset by peer
[2020/09/28 03:24:11] [error] [src/flb_http_client.c:1085 errno=25] Inappropriate ioctl for device
[2020/09/28 03:24:12] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:12] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:14] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:14] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[engine] caught signal (SIGSEGV)
#0  0x559721724ebc      in  atomic_load_p() at lib/jemalloc-5.2.1/include/jemalloc/internal/atomic.h:62
#1  0x559721724ebc      in  rtree_leaf_elm_bits_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:175
#2  0x559721724ebc      in  rtree_szind_slab_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:500
#3  0x559721724ebc      in  ifree() at lib/jemalloc-5.2.1/src/jemalloc.c:2570
#4  0x559721724ebc      in  je_free_default() at lib/jemalloc-5.2.1/src/jemalloc.c:2790
#5  0x559721797d48      in  flb_free() at include/fluent-bit/flb_mem.h:122
#6  0x559721798e15      in  flb_sds_destroy() at src/flb_sds.c:393
#7  0x5597217bce1a      in  flb_kv_item_destroy() at src/flb_kv.c:83
#8  0x5597217bce9c      in  flb_kv_release() at src/flb_kv.c:102
#9  0x55972185622e      in  http_headers_destroy() at src/flb_http_client.c:929
#10 0x5597218568a1      in  flb_http_client_destroy() at src/flb_http_client.c:1176
#11 0x55972182282b      in  cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:1783
#12 0x5597217a850a      in  output_pre_cb_flush() at include/fluent-bit/flb_output.h:449
#13 0x559721bdc346      in  co_init() at lib/monkey/deps/flb_libco/amd64.c:117

Fluent Bit v1.5.7
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2020/09/28 03:23:10] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:23:10] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:23:12] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:23:12] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[engine] caught signal (SIGSEGV)
#0  0x564e46805ebc      in  atomic_load_p() at lib/jemalloc-5.2.1/include/jemalloc/internal/atomic.h:62
#1  0x564e46805ebc      in  rtree_leaf_elm_bits_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:175
#2  0x564e46805ebc      in  rtree_szind_slab_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:500
#3  0x564e46805ebc      in  ifree() at lib/jemalloc-5.2.1/src/jemalloc.c:2570
#4  0x564e46805ebc      in  je_free_default() at lib/jemalloc-5.2.1/src/jemalloc.c:2790
#5  0x564e46878d48      in  flb_free() at include/fluent-bit/flb_mem.h:122
#6  0x564e46879e15      in  flb_sds_destroy() at src/flb_sds.c:393
#7  0x564e4689de1a      in  flb_kv_item_destroy() at src/flb_kv.c:83
#8  0x564e4689de9c      in  flb_kv_release() at src/flb_kv.c:102
#9  0x564e4693722e      in  http_headers_destroy() at src/flb_http_client.c:929
#10 0x564e469378a1      in  flb_http_client_destroy() at src/flb_http_client.c:1176
#11 0x564e4690382b      in  cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:1783
#12 0x564e4688950a      in  output_pre_cb_flush() at include/fluent-bit/flb_output.h:449
#13 0x564e46cbd346      in  co_init() at lib/monkey/deps/flb_libco/amd64.c:117

Fluent Bit v1.5.7
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2020/09/28 03:23:06] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:23:06] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:23:07] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:23:07] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 ECP - The signature is not valid
[2020/09/28 03:23:07] [error] [io_tls] flb_io_tls.c:356 SSL - A fatal alert message was received from our peer
[2020/09/28 03:23:07] [error] [filter:kubernetes:kubernetes.1] upstream connection error
[2020/09/28 03:23:09] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:23:09] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:23:10] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:23:10] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:23:13] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:23:13] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:23:15] [error] error parsing local_resource_id for type k8s_container
[engine] caught signal (SIGSEGV)
[2020/09/28 03:23:15] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
#0  0x564c99d9c2b6      in  mk_list_size() at lib/monkey/include/monkey/mk_core/mk_list.h:117
#1  0x564c99d9d231      in  flb_engine_dispatch() at src/flb_engine_dispatch.c:284
#2  0x564c99d9a903      in  flb_engine_flush() at src/flb_engine.c:85
#3  0x564c99d9bd48      in  flb_engine_handle_event() at src/flb_engine.c:292
#4  0x564c99d9bd48      in  flb_engine_start() at src/flb_engine.c:559
#5  0x564c99d102f4      in  flb_main() at src/fluent-bit.c:1035
#6  0x564c99d10342      in  main() at src/fluent-bit.c:1048
#7  0x7f2aa1bd509a      in  ???() at ???:0
#8  0x564c99d0dfd9      in  ???() at ???:0
#9  0xffffffffffffffff  in  ???() at ???:0

Fluent Bit v1.5.7
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2020/09/28 03:24:53] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:53] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:54] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:54] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:56] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:56] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:57] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:57] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:59] [error] [tls] SSL error: NET - Connection was reset by peer
[2020/09/28 03:24:59] [error] [src/flb_http_client.c:1085 errno=25] Inappropriate ioctl for device
[2020/09/28 03:25:00] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:25:00] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:25:03] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:25:03] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:25:05] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:25:05] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:25:07] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:25:07] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:25:08] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:25:08] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[engine] caught signal (SIGSEGV)
#0  0x5592996db9d5      in  cio_chunk_is_up() at lib/chunkio/src/cio_chunk.c:405
#1  0x559299450304      in  flb_input_chunk_total_size() at src/flb_input_chunk.c:271
#2  0x55929945051c      in  flb_input_chunk_set_up_down() at src/flb_input_chunk.c:355
#3  0x5592994389ab      in  flb_task_retry_create() at src/flb_task.c:171
#4  0x559299435ff3      in  flb_engine_manager() at src/flb_engine.c:200
#5  0x559299436d8f      in  flb_engine_handle_event() at src/flb_engine.c:300
#6  0x559299436d8f      in  flb_engine_start() at src/flb_engine.c:559
#7  0x5592993ab2f4      in  flb_main() at src/fluent-bit.c:1035
#8  0x5592993ab342      in  main() at src/fluent-bit.c:1048
#9  0x7fbb33de709a      in  ???() at ???:0
#10 0x5592993a8fd9      in  ???() at ???:0
#11 0xffffffffffffffff  in  ???() at ???:0

Fluent Bit v1.5.7
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2020/09/28 03:24:42] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:42] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:42] [error] [io_tls] flb_io_tls.c:356 ECP - Invalid private or public key
[2020/09/28 03:24:43] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:43] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:44] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:44] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:45] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:45] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[2020/09/28 03:24:46] [error] [io_tls] flb_io_tls.c:356 SSL - A fatal alert message was received from our peer
[2020/09/28 03:24:46] [error] [filter:kubernetes:kubernetes.1] upstream connection error
[2020/09/28 03:24:46] [error] [io_tls] flb_io_tls.c:356 SSL - A fatal alert message was received from our peer
[2020/09/28 03:24:46] [error] [filter:kubernetes:kubernetes.1] upstream connection error
[2020/09/28 03:24:46] [error] [io_tls] flb_io_tls.c:356 SSL - A fatal alert message was received from our peer
[2020/09/28 03:24:46] [error] [filter:kubernetes:kubernetes.1] upstream connection error
[2020/09/28 03:24:46] [error] [io_tls] flb_io_tls.c:356 SSL - A fatal alert message was received from our peer
[2020/09/28 03:24:46] [error] [filter:kubernetes:kubernetes.1] upstream connection error
[2020/09/28 03:24:46] [error] [io_tls] flb_io_tls.c:356 SSL - A fatal alert message was received from our peer
[2020/09/28 03:24:46] [error] [filter:kubernetes:kubernetes.1] upstream connection error
[2020/09/28 03:24:46] [error] [io_tls] flb_io_tls.c:356 SSL - A fatal alert message was received from our peer
[2020/09/28 03:24:46] [error] [filter:kubernetes:kubernetes.1] upstream connection error
[2020/09/28 03:24:46] [error] [io_tls] flb_io_tls.c:356 SSL - A fatal alert message was received from our peer
[2020/09/28 03:24:46] [error] [filter:kubernetes:kubernetes.1] upstream connection error
[2020/09/28 03:24:46] [error] error parsing local_resource_id for type k8s_container
[2020/09/28 03:24:46] [error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type
[engine] caught signal (SIGSEGV)
#0  0x5608dd207ebc      in  atomic_load_p() at lib/jemalloc-5.2.1/include/jemalloc/internal/atomic.h:62
#1  0x5608dd207ebc      in  rtree_leaf_elm_bits_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:175
#2  0x5608dd207ebc      in  rtree_szind_slab_read() at lib/jemalloc-5.2.1/include/jemalloc/internal/rtree.h:500
#3  0x5608dd207ebc      in  ifree() at lib/jemalloc-5.2.1/src/jemalloc.c:2570
#4  0x5608dd207ebc      in  je_free_default() at lib/jemalloc-5.2.1/src/jemalloc.c:2790
#5  0x5608dd27ad48      in  flb_free() at include/fluent-bit/flb_mem.h:122
#6  0x5608dd27be15      in  flb_sds_destroy() at src/flb_sds.c:393
#7  0x5608dd29fe1a      in  flb_kv_item_destroy() at src/flb_kv.c:83
#8  0x5608dd29fe9c      in  flb_kv_release() at src/flb_kv.c:102
#9  0x5608dd33922e      in  http_headers_destroy() at src/flb_http_client.c:929
#10 0x5608dd3398a1      in  flb_http_client_destroy() at src/flb_http_client.c:1176
#11 0x5608dd30582b      in  cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:1783
#12 0x5608dd28b50a      in  output_pre_cb_flush() at include/fluent-bit/flb_output.h:449
#13 0x5608dd6bf346      in  co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#14 0xffffffffffffffff  in  ???() at ???:0

slewiskelly commented 4 years ago

I updated my Fluent Bit config, the diff is as follows:

$ diff old new
<     Tag                     k8s_container.<namespace_name>.<pod_name>.<container_name>
<     Tag_Regex               (?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-
< [FILTER]
<     Name                    parser
<     Match                   k8s_container.*
<     Key_Name                log
<     Reserve_Data            True
21a15
>     Tag                     kube.*
24,27c18
<     Match                   k8s_container.*
<     Kube_Tag_Prefix         k8s_container.
<     Regex_Parser            k8s-custom-tag
<     Kube_URL                https://kubernetes.default.svc.cluster.local:443
---
>     Match                   kube.*
77c68
<     Name                    k8s-custom-tag
---
>     Name                    kube-custom
79c70
<     Regex                   (?<namespace_name>[^_]+)\.(?<pod_name>[^_]+)\.(?<container_name>.+)
---
>     Regex                   (?<tag>[^.]+)?\.?(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$

The configuration in its entirety is now:

[SERVICE]
    Parsers_File            parsers.conf
    Flush                   1
    HTTP_Server             On
    storage.metrics         On
[INPUT]
    Name                    tail
    DB                      /var/run/flb/pos-files/flb_kube.db
    Mem_Buf_Limit           5M
    Refresh_Interval        1
    Skip_Long_Lines         On
    Path                    /var/log/containers/*.log
    Exclude_Path            /var/log/containers/*_kube-system_*.log,/var/log/containers/*_istio-system_*.log,/var/log/containers/*_knative-serving_*.log,/var/log/containers/*_gke-system_*.log,/var/log/containers/*_config-management-system_*.log
    Parser                  docker
    Tag                     kube.*
[FILTER]
    Name                    kubernetes
    Match                   kube.*
    Annotations             Off
    Keep_Log                Off
    Merge_Log               On
    K8S-Logging.Exclude     On
[FILTER]
    Name                    nest
    Match                   *
    Operation               lift
    Nested_under            kubernetes
    Add_prefix              k8s.
[FILTER]
    Name                    nest
    Match                   *
    Operation               lift
    Nested_under            k8s.labels
    Add_prefix              k8s-pod/
[FILTER]
    Name                    nest
    Match                   *
    Operation               nest
    Nest_under              k8s.labels
    Wildcard                k8s-pod/*
[FILTER]
    Name                    modify
    Match                   *
    Hard_rename             k8s.labels labels
[FILTER]
    Name                    modify
    Match                   *
    Remove_wildcard         k8s.
[FILTER]
    Name                    modify
    Match                   *
    Hard_rename             log message
[OUTPUT]
    Name                    stackdriver
    Match                   *
    k8s_cluster_name        ${CLUSTER}
    k8s_cluster_location    ${ZONE}
    labels_key              labels
    resource                k8s_container
    severity_key            level
---
[PARSER]
    Name                    docker
    Format                  json
    Time_Key                time
    Time_Format             %Y-%m-%dT%H:%M:%S.%L%z
[PARSER]
    Name                    kube-custom
    Format                  regex
    Regex                   (?<tag>[^.]+)?\.?(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$

I will continue to monitor, but I've noticed an immediate improvement in the health of the Pods.

edsiper commented 4 years ago

FYI: re-opening since @JeffLuoo will take a look at this

JeffLuoo commented 4 years ago

Hi @slewiskelly, thanks for the detailed issue description. I have some questions regarding this issue.

Is the configuration for 1.5.7 the same as the one for 1.5.6? And when you checked the logs, did you notice a log line like:

local_resource_id not found, tag xxxxx is assigned for local_resource_id

slewiskelly commented 4 years ago

@JeffLuoo, thanks for taking a look.

I am wondering is the configuration of 1.5.7 same as the one for the 1.5.6? And,

The configuration was the same between versions, until I updated the configuration described in https://github.com/fluent/fluent-bit/issues/2580#issuecomment-699906025.

When you checked the logs, did you notice the log like

local_resource_id not found, tag xxxxx is assigned for local_resource_id

I can't find any historical logs with that specific message.

For the most part, the only logs observed before crashes occurred were:

[error] error parsing local_resource_id for type k8s_container
[error] [output:stackdriver:stackdriver.0] fail to extract resource labels for k8s_container resource type

JeffLuoo commented 4 years ago

@slewiskelly I see. Thanks for the update! I just checked the code and found that the log message:

local_resource_id not found, tag xxxxx is assigned for local_resource_id

will only show up if the log level of Fluent Bit is set to "debug". This message means that your JSON record has no field with the key:

logging.googleapis.com/local_resource_id

so the plugin uses the tag of the log as the value of local_resource_id. local_resource_id is just the name of the variable I use to hold the metadata value in the final log for the k8s_container resource type.
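To make the fallback concrete, here is a rough sketch in Python (not the plugin's actual C code). It assumes the expected shape is `k8s_container.<namespace>.<pod>.<container>`, which is what the original `Tag_Regex` and `Kube_Tag_Prefix` in this issue produced; the function names are hypothetical and exist only for illustration.

```python
from typing import Optional

# Assumed prefix, taken from Kube_Tag_Prefix in the original configuration.
PREFIX = "k8s_container."
# Record key named in the comment above.
RECORD_KEY = "logging.googleapis.com/local_resource_id"


def resolve_local_resource_id(record: dict, tag: str) -> str:
    """Use the record field when present, otherwise fall back to the tag."""
    return record.get(RECORD_KEY, tag)


def parse_local_resource_id(value: str) -> Optional[dict]:
    """Split 'k8s_container.<namespace>.<pod>.<container>' into its parts.

    Returns None when the value does not have that shape, which is the
    situation the "error parsing local_resource_id" message points at.
    """
    if not value.startswith(PREFIX):
        return None
    # Split at most twice so the container part may itself contain dots.
    parts = value[len(PREFIX):].split(".", 2)
    if len(parts) != 3 or not all(parts):
        return None
    namespace, pod, container = parts
    return {
        "namespace_name": namespace,
        "pod_name": pod,
        "container_name": container,
    }


if __name__ == "__main__":
    # A tag in the assumed shape parses; a default tail-style tag does not.
    for tag in ("k8s_container.default.my-pod.app",
                "kube.var.log.containers.my-pod_default_app.log"):
        print(tag, "->", parse_local_resource_id(resolve_local_resource_id({}, tag)))
```

Under these assumptions, a record without the explicit key and with a tag that does not follow the expected shape cannot yield namespace, pod and container labels, so it is worth checking both the tag and the record when this error shows up.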

According to the error message:

[error] error parsing local_resource_id for type k8s_container

the error can be narrowed down to this function: https://github.com/fluent/fluent-bit/blob/b4129df6eb8f88e1caeb6216f68134caea69c361/plugins/out_stackdriver/stackdriver.c#L348

I will add some information to the error message (at https://github.com/fluent/fluent-bit/blob/b4129df6eb8f88e1caeb6216f68134caea69c361/plugins/out_stackdriver/stackdriver.c#L385) to include the local_resource_id that is passed in to this function. This will help us debug and see whether the local_resource_id passed in is valid or not. We might also need @slewiskelly to reproduce the error so we can see the value of local_resource_id, since I tested it locally and still could not trigger the error. Thank you!

cc @erain: Hi Yu, have you seen this kind of error before? I don't have access to deploy Fluent Bit in a GKE environment. Thank you!

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 3 years ago

This issue was closed because it has been stalled for 5 days with no activity.

rosmo commented 3 years ago

Just did an install on CentOS 8 and I'm encountering this issue, and seeing a similar segfault in stackdriver_format. Turned on debug log level and the only reasonably relevant line I can see is:

[2021/08/10 21:41:35] [debug] [output:stackdriver:stackdriver.1] [logging.googleapis.com/monitored_resource] not found in the payload

Stack trace (relevant parts only, i.e. excluding threads in epoll_wait):

Stack trace of thread 1410520:
#0  0x00007fcc7c23b65d __lll_lock_wait (libpthread.so.0)
#1  0x00007fcc7c234a44 __pthread_mutex_lock (libpthread.so.0)
#2  0x00007fcc7ab350a3 dl_iterate_phdr (libc.so.6)
#3  0x00007fcc7add6175 _Unwind_Find_FDE (libgcc_s.so.1)
#4  0x00007fcc7add2713 uw_frame_state_for (libgcc_s.so.1)
#5  0x00007fcc7add38f0 uw_init_context_1 (libgcc_s.so.1)
#6  0x00007fcc7add472c _Unwind_Backtrace (libgcc_s.so.1)
#7  0x0000000000434517 backtrace_full (fluent-bit)
#8  0x00000000004320bc flb_signal_handler (fluent-bit)
#9  0x00007fcc7aa35400 __restore_rt (libc.so.6)
#10 0x00000000004b26da stackdriver_format (fluent-bit)
#11 0x00000000004b4a09 cb_stackdriver_flush (fluent-bit)
#12 0x0000000000448cb8 output_pre_cb_flush (fluent-bit)
#13 0x0000000000690e87 co_init (fluent-bit)

Fluent-bit is installed from Google Cloud's Ops Agent (I don't think this is Google-specific though):

# /opt/google-cloud-ops-agent/subagents/fluent-bit/bin/fluent-bit -V
Fluent Bit v1.7.8

rosmo commented 3 years ago

My bad, this was the result of using resource gce_instance, where I should have been using resource generic_node.

saidfarah commented 2 years ago

Did anyone find any fix for this?

kyontan commented 2 years ago

We faced the same error in our environment (on Kubernetes, fluent-bit v1.8.15).

Some crash logs (debug log enabled)

```
[2022/05/17 15:41:03] [debug] [output:stackdriver:stackdriver.0] task_id=0 assigned to thread #1
[2022/05/17 15:41:03] [debug] [output:stackdriver:stackdriver.0] task_id=1 assigned to thread #0
[2022/05/17 15:41:03] [debug] [upstream] KA connection #95 to logging.googleapis.com:443 has been assigned (recycled)
[2022/05/17 15:41:03] [debug] [upstream] KA connection #94 to logging.googleapis.com:443 has been assigned (recycled)
[2022/05/17 15:41:03] [debug] [output:stackdriver:stackdriver.0] local_resource_id not found, tag [kube.applications.var.log.containers.xxxxx is assigned for local_resource_id
[2022/05/17 15:41:03] [debug] [output:stackdriver:stackdriver.0] local_resource_id not found, tag [kube.applications.var.log.containers.xxxxx is assigned for local_resource_id
[2022/05/17 15:41:03] [debug] [output:stackdriver:stackdriver.0] [logging.googleapis.com/monitored_resource] not found in the payload
[2022/05/17 15:41:03] [debug] [output:stackdriver:stackdriver.0] [logging.googleapis.com/monitored_resource] not found in the payload
[2022/05/17 15:41:03] [debug] [http_client] not using http_proxy for header
[2022/05/17 15:41:03] [debug] [http_client] not using http_proxy for header
[2022/05/17 15:41:04] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2022/05/17 15:41:04] [engine] caught signal (SIGSEGV)
[2022/05/17 15:41:04] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2022/05/17 15:41:04] [engine] caught signal (SIGSEGV)
#0 0x559f19beb542 in __mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:88
#1 0x559f19beb56d in mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:93
#2 0x559f19beb7ee in flb_kv_item_destroy() at src/flb_kv.c:90
#3 0x559f19beb843 in flb_kv_release() at src/flb_kv.c:102
#4 0x559f19bf1efa in http_headers_destroy() at src/flb_http_client.c:1002
#5 0x559f19bf2953 in flb_http_client_destroy() at src/flb_http_client.c:1328
#6 0x559f19c7676d in cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:2323
#7 0x559f19bbec0e in output_pre_cb_flush() at include/fluent-bit/flb_output.h:517
#8 0x559f1a0c2506 in co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#9 0xffffffffffffffff in ???() at ???:0
Stream closed EOF for fluent-bit/fluent-bit-47xwz (fluent-bit)

[2022/05/17 15:41:48] [debug] [task] destroy task=0x7f9edde39a80 (task_id=2)
[2022/05/17 15:41:48] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2022/05/17 15:41:48] [debug] [upstream] KA connection #107 to logging.googleapis.com:443 is now available
[2022/05/17 15:41:48] [debug] [out coro] cb_destroy coro_id=25
[2022/05/17 15:41:48] [debug] [task] destroy task=0x7f9edde3a660 (task_id=0)
[2022/05/17 15:41:48] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2022/05/17 15:41:48] [debug] [upstream] KA connection #105 to logging.googleapis.com:443 is now available
[2022/05/17 15:41:48] [debug] [out coro] cb_destroy coro_id=28
[2022/05/17 15:41:48] [debug] [task] destroy task=0x7f9edde394e0 (task_id=6)
[2022/05/17 15:41:48] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2022/05/17 15:41:48] [debug] [upstream] KA connection #106 to logging.googleapis.com:443 is now available
[2022/05/17 15:41:48] [debug] [out coro] cb_destroy coro_id=29
[2022/05/17 15:41:48] [debug] [task] destroy task=0x7f9edde39760 (task_id=8)
#0 0x55afb6296503 in ???() at lib/monkey/deps/flb_libco/libco.c:0
Stream closed EOF for fluent-bit/fluent-bit-hq5bj (fluent-bit)

[2022/05/17 15:43:29] [debug] [output:stackdriver:stackdriver.0] local_resource_id not found, tag [kube.applications.var.log.containers.xxxxx is assigned for local_resource_id
[2022/05/17 15:43:29] [debug] [output:stackdriver:stackdriver.0] [logging.googleapis.com/monitored_resource] not found in the payload
[2022/05/17 15:43:29] [debug] [output:stackdriver:stackdriver.0] local_resource_id not found, tag [kube.applications.var.log.containers.xxxxx is assigned for local_resource_id
[2022/05/17 15:43:29] [debug] [output:stackdriver:stackdriver.0] [logging.googleapis.com/monitored_resource] not found in the payload
[2022/05/17 15:43:29] [debug] [http_client] not using http_proxy for header
[2022/05/17 15:43:29] [debug] [http_client] not using http_proxy for header
[2022/05/17 15:43:29] [debug] [upstream] KA connection #110 to logging.googleapis.com:443 has been assigned (recycled)
[2022/05/17 15:43:29] [debug] [upstream] KA connection #103 to logging.googleapis.com:443 has been assigned (recycled)
[2022/05/17 15:43:29] [debug] [output:stackdriver:stackdriver.0] local_resource_id not found, tag [kube.applications.var.log.containers.xxxxx is assigned for local_resource_id
[2022/05/17 15:43:29] [debug] [output:stackdriver:stackdriver.0] local_resource_id not found, tag [kube.applications.var.log.containers.xxxxx is assigned for local_resource_id
[2022/05/17 15:43:29] [debug] [output:stackdriver:stackdriver.0] [logging.googleapis.com/monitored_resource] not found in the payload
[2022/05/17 15:43:29] [debug] [output:stackdriver:stackdriver.0] [logging.googleapis.com/monitored_resource] not found in the payload
[2022/05/17 15:43:29] [ warn] [msgpack2json] unknown msgpack type -2082191991
[2022/05/17 15:43:29] [engine] caught signal (SIGSEGV)
Stream closed EOF for fluent-bit/fluent-bit-mglgj (fluent-bit)

[2022/05/17 15:44:20] [debug] [output:stackdriver:stackdriver.0] local_resource_id not found, tag [kube.applications.var.log.containers.xxxxx is assigned for local_resource_id
[2022/05/17 15:44:20] [debug] [output:stackdriver:stackdriver.0] local_resource_id not found, tag [kube.applications.var.log.containers.xxxxx is assigned for local_resource_id
[2022/05/17 15:44:20] [debug] [output:stackdriver:stackdriver.0] [logging.googleapis.com/monitored_resource] not found in the payload
[2022/05/17 15:44:20] [debug] [output:stackdriver:stackdriver.0] [logging.googleapis.com/monitored_resource] not found in the payload
[2022/05/17 15:44:20] [debug] [http_client] not using http_proxy for header
[2022/05/17 15:44:20] [debug] [http_client] not using http_proxy for header
[2022/05/17 15:44:20] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2022/05/17 15:44:20] [debug] [upstream] KA connection #92 to logging.googleapis.com:443 is now available
[2022/05/17 15:44:20] [debug] [out coro] cb_destroy coro_id=24
[2022/05/17 15:44:20] [debug] [task] destroy task=0x7f3c62a39120 (task_id=0)
[2022/05/17 15:44:20] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2022/05/17 15:44:20] [engine] caught signal (SIGSEGV)
#0 0x55c1b3c80542 in __mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:88
#1 0x55c1b3c8056d in mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:93
#2 0x55c1b3c807ee in flb_kv_item_destroy() at src/flb_kv.c:90
#3 0x55c1b3c80843 in flb_kv_release() at src/flb_kv.c:102
#4 0x55c1b3c86efa in http_headers_destroy() at src/flb_http_client.c:1002
#5 0x55c1b3c87953 in flb_http_client_destroy() at src/flb_http_client.c:1328
#6 0x55c1b3d0b76d in cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:2323
#7 0x55c1b3c53c0e in output_pre_cb_flush() at include/fluent-bit/flb_output.h:517
#8 0x55c1b4157506 in co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#9 0xffffffffffffffff in ???() at ???:0
Stream closed EOF for fluent-bit/fluent-bit-q6bc2 (fluent-bit)

[2022/05/17 15:45:53] [debug] [retry] new retry created for task_id=9 attempts=1
[2022/05/17 15:45:53] [ warn] [engine] failed to flush chunk '1-1652802350.488423802.flb', retry in 9 seconds: task_id=9, input=tail.0 > output=stackdriver.0 (out_id=0)
[2022/05/17 15:45:54] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2022/05/17 15:45:54] [engine] caught signal (SIGSEGV)
[2022/05/17 15:45:54] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2022/05/17 15:45:54] [debug] [upstream] KA connection #142 to logging.googleapis.com:443 is now available
[2022/05/17 15:45:54] [debug] [out coro] cb_destroy coro_id=256
[2022/05/17 15:45:54] [debug] [task] destroy task=0x7f4d0631fbe0 (task_id=10)
[2022/05/17 15:45:54] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2022/05/17 15:45:54] [debug] [upstream] KA connection #147 to logging.googleapis.com:443 is now available
[2022/05/17 15:45:54] [debug] [out coro] cb_destroy coro_id=255
[2022/05/17 15:45:54] [debug] [task] destroy task=0x7f4d0631faa0 (task_id=8)
[2022/05/17 15:45:54] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2022/05/17 15:45:54] [engine] caught signal (SIGSEGV)
[2022/05/17 15:45:54] [debug] [input:tail:tail.0] inode=370431827 events: IN_MODIFY
[2022/05/17 15:45:54] [debug] [input chunk] update output instances with new chunk size diff=919
[2022/05/17 15:45:54] [debug] [filter:kubernetes:kubernetes.0] could not merge JSON, root_type=2
[2022/05/17 15:45:54] [debug] [input chunk] update output instances with new chunk size diff=927
[2022/05/17 15:45:54] [debug] [filter:kubernetes:kubernetes.0] could not merge JSON, root_type=2
[2022/05/17 15:45:54] [debug] [input chunk] update output instances with new chunk size diff=928
[2022/05/17 15:45:54] [debug] [input:tail:tail.0] inode=370431827 events: IN_MODIFY
#0 0x55dca7e37542 in __mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:88
#1 0x55dca7e3756d in mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:93
#2 0x55dca7e377ee in flb_kv_item_destroy() at src/flb_kv.c:90
#3 0x55dca7e37843 in flb_kv_release() at src/flb_kv.c:102
#4 0x55dca7e3defa in http_headers_destroy() at src/flb_http_client.c:1002
#5 0x55dca7e3e953 in flb_http_client_destroy() at src/flb_http_client.c:1328
#6 0x55dca7ec276d in cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:2323
#7 0x55dca7e0ac0e in output_pre_cb_flush() at include/fluent-bit/flb_output.h:517
#8 0x55dca830e506 in co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#9 0xffffffffffffffff in ???() at ???:0
Stream closed EOF for fluent-bit/fluent-bit-sdszh (fluent-bit)

[2022/05/17 15:46:10] [debug] [out coro] cb_destroy coro_id=33
[2022/05/17 15:46:10] [debug] [task] destroy task=0x7f61f423be20 (task_id=3)
[2022/05/17 15:46:10] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2022/05/17 15:46:10] [debug] [upstream] KA connection #140 to logging.googleapis.com:443 is now available
[2022/05/17 15:46:10] [debug] [out coro] cb_destroy coro_id=32
[2022/05/17 15:46:10] [debug] [task] destroy task=0x7f61f423a980 (task_id=2)
[2022/05/17 15:46:10] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2022/05/17 15:46:10] [engine] caught signal (SIGSEGV)
#0 0x55650b731542 in __mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:88
#1 0x55650b73156d in mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:93
#2 0x55650b7317ee in flb_kv_item_destroy() at src/flb_kv.c:90
#3 0x55650b731843 in flb_kv_release() at src/flb_kv.c:102
#4 0x55650b737efa in http_headers_destroy() at src/flb_http_client.c:1002
#5 0x55650b738953 in flb_http_client_destroy() at src/flb_http_client.c:1328
#6 0x55650b7bc76d in cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:2323
#7 0x55650b704c0e in output_pre_cb_flush() at include/fluent-bit/flb_output.h:517
#8 0x55650bc08506 in co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#9 0xffffffffffffffff in ???() at ???:0
Stream closed EOF for fluent-bit/fluent-bit-vmq2l (fluent-bit)

[2022/05/17 15:42:32] [ info] [input:tail:tail.0] inotify_fs_add(): inode=373424683 watch_fd=26 name=/var/log/containers/xxx.log
[2022/05/17 15:42:32] [debug] [input:tail:tail.0] inode=316133766 file=/var/log/containers/xxx.log promote to TAIL_EVENT
[2022/05/17 15:42:32] [ info] [input:tail:tail.0] inotify_fs_add(): inode=316133766 watch_fd=27 name=/var/log/containers/xxx.log
[2022/05/17 15:42:32] [debug] [input:tail:tail.0] inode=245660736 file=/var/log/containers/xxx.log promote to TAIL_EVENT
[2022/05/17 15:42:32] [ info] [input:tail:tail.0] inotify_fs_add(): inode=245660736 watch_fd=28 name=/var/log/containers/xxx.log
[2022/05/17 15:42:32] [debug] [input:tail:tail.0] inode=279119756 file=/var/log/containers/xxx.log promote to TAIL_EVENT
[2022/05/17 15:42:32] [ info] [input:tail:tail.0] inotify_fs_add(): inode=279119756 watch_fd=29 name=/var/log/containers/xxx.log
[2022/05/17 15:42:32] [debug] [input:tail:tail.0] [static files] processed 0b, done
[2022/05/17 15:42:32] [debug] [task] destroy task=0x7f172de3ab60 (task_id=5)
[2022/05/17 15:42:32] [debug] [task] destroy task=0x7f172de3a340 (task_id=6)
#0 0x558a4b4fc1c4 in flb_output_return() at include/fluent-bit/flb_output.h:633
#1 0x558a4b4fc29b in flb_output_return_do() at include/fluent-bit/flb_output.h:688
#2 0x558a4b502783 in cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:2327
#3 0x558a4b44ac0e in output_pre_cb_flush() at include/fluent-bit/flb_output.h:517
#4 0x558a4b94e506 in co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#5 0xffffffffffffffff in ???() at ???:0
Stream closed EOF for fluent-bit/fluent-bit-vpb47 (fluent-bit)

[2022/05/17 15:46:53] [debug] [output:stackdriver:stackdriver.0] task_id=1 assigned to thread #1
[2022/05/17 15:46:53] [debug] [upstream] KA connection #97 to logging.googleapis.com:443 has been assigned (recycled)
[2022/05/17 15:46:53] [debug] [upstream] KA connection #94 to logging.googleapis.com:443 has been assigned (recycled)
[2022/05/17 15:46:53] [debug] [output:stackdriver:stackdriver.0] local_resource_id not found, tag [kube.applications.var.log.containers.xxxxx is assigned for local_resource_id
[2022/05/17 15:46:53] [debug] [output:stackdriver:stackdriver.0] local_resource_id not found, tag [kube.applications.var.log.containers.xxxxx is assigned for local_resource_id
[2022/05/17 15:46:53] [debug] [output:stackdriver:stackdriver.0] [logging.googleapis.com/monitored_resource] not found in the payload
[2022/05/17 15:46:53] [debug] [output:stackdriver:stackdriver.0] [logging.googleapis.com/monitored_resource] not found in the payload
[2022/05/17 15:46:53] [debug] [http_client] not using http_proxy for header
[2022/05/17 15:46:53] [debug] [http_client] not using http_proxy for header
[2022/05/17 15:46:54] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2022/05/17 15:46:54] [debug] [upstream] KA connection #94 to logging.googleapis.com:443 is now available
[2022/05/17 15:46:54] [debug] [out coro] cb_destroy coro_id=24
[2022/05/17 15:46:54] [debug] [task] destroy task=0x7ff472039b20 (task_id=1)
[2022/05/17 15:46:54] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2022/05/17 15:46:54] [debug] [upstream] KA connection #97 to logging.googleapis.com:443 is now available
[2022/05/17 15:46:54] [engine] caught signal (SIGSEGV)
#0 0x55df7afa01c4 in flb_output_return() at include/fluent-bit/flb_output.h:633
#1 0x55df7afa029b in flb_output_return_do() at include/fluent-bit/flb_output.h:688
#2 0x55df7afa6783 in cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:2327
#3 0x55df7aeeec0e in output_pre_cb_flush() at include/fluent-bit/flb_output.h:517
#4 0x55df7b3f2506 in co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#5 0xffffffffffffffff in ???() at ???:0
Stream closed EOF for fluent-bit/fluent-bit-9vx5b (fluent-bit)

[2022/05/17 15:47:46] [engine] caught signal (SIGSEGV)
[2022/05/17 15:47:46] [debug] [upstream] KA connection #109 to logging.googleapis.com:443 has been assigned (recycled)
[2022/05/17 15:47:46] [debug] [output:stackdriver:stackdriver.0] local_resource_id not found, tag [kube.applications.var.log.containers.xxxxx is assigned for local_resource_id
[2022/05/17 15:47:46] [debug] [output:stackdriver:stackdriver.0] [logging.googleapis.com/monitored_resource] not found in the payload
[2022/05/17 15:47:46] [debug] [http_client] not using http_proxy for header
[2022/05/17 15:47:47] [debug] [output:stackdriver:stackdriver.0] HTTP Status=200
[2022/05/17 15:47:47] [debug] [upstream] KA connection #109 to logging.googleapis.com:443 is now available
[2022/05/17 15:47:47] [debug] [out coro] cb_destroy coro_id=6
[2022/05/17 15:47:47] [debug] [task] destroy task=0x7fcef723a0c0 (task_id=2)
#0 0x561badd8f066 in flb_sds_len() at include/fluent-bit/flb_sds.h:51
#1 0x561badd90c4e in http_header_push() at src/flb_http_client.c:930
#2 0x561badd90e7f in http_headers_compose() at src/flb_http_client.c:990
#3 0x561badd91221 in flb_http_do() at src/flb_http_client.c:1128
#4 0x561bade15373 in cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:2265
#5 0x561badd5dc0e in output_pre_cb_flush() at include/fluent-bit/flb_output.h:517
#6 0x561bae261506 in co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#7 0xffffffffffffffff in ???() at ???:0
Stream closed EOF for fluent-bit/fluent-bit-6zmjj (fluent-bit)

assigned (recycled)
[2022/05/17 15:44:23] [debug] [upstream] KA connection #103 to logging.googleapis.com:443 has been assigned (recycled)
[2022/05/17 15:44:23] [debug] [output:stackdriver:stackdriver.0] local_resource_id not found, tag [kube.applications.var.log.containers.xxxxx is assigned for local_resource_id
[2022/05/17 15:44:23] [debug] [output:stackdriver:stackdriver.0] [logging.googleapis.com/monitored_resource] not found in the payload
[2022/05/17 15:44:23] [debug] [output:stackdriver:stackdriver.0] local_resource_id not found, tag [kube.applications.var.log.containers.xxxxx is assigned for local_resource_id
[2022/05/17 15:44:23] [debug] [output:stackdriver:stackdriver.0] [logging.googleapis.com/monitored_resource] not found in the payload
[2022/05/17 15:44:23] [debug] [http_client] not using http_proxy for header
[2022/05/17 15:44:23] [debug] [http_client] not using http_proxy for header
[2022/05/17 15:44:23] [engine] caught signal (SIGSEGV)
#0 0x7f3857f32633 in ???() at ???:0
#1 0x7f3857f34d4b in ???() at ???:0
#2 0x7f3857f35ae4 in ???() at ???:0
#3 0x7f3857f48872 in ???() at ???:0
#4 0x55a4a1982771 in tls_net_write() at src/tls/openssl.c:436
#5 0x55a4a1982dd8 in flb_tls_net_write_async() at src/tls/flb_tls.c:266
#6 0x55a4a198de05 in flb_io_net_write() at src/flb_io.c:361
#7 0x55a4a199031b in flb_http_do() at src/flb_http_client.c:1160
#8 0x55a4a1a14373 in cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:2265
#9 0x55a4a195cc0e in output_pre_cb_flush() at include/fluent-bit/flb_output.h:517
#10 0x55a4a1e60506 in co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#11 0xffffffffffffffff in ???() at ???:0
Stream closed EOF for fluent-bit/fluent-bit-5ks4f (fluent-bit)
```