fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

[output:s3:s3.0] PutObject request failed #8946

Closed. Opened by vi4vikas; closed 6 days ago.

vi4vikas commented 3 months ago

Bug Report

I am intermittently seeing the following errors in my Fluent Bit logs, and as a result some log files fail to upload to the S3 bucket. Full error log:

[2024/06/12 09:38:19] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?
[2024/06/12 09:38:19] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?
[2024/06/12 09:38:19] [error] [output:s3:s3.0] PutObject request failed
[2024/06/12 09:38:19] [error] [output:s3:s3.0] Could not send chunk with tag kube-system
[2024/06/12 09:38:20] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?
[2024/06/12 09:38:20] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?
[2024/06/12 09:38:20] [error] [output:s3:s3.0] PutObject request failed
[2024/06/12 09:38:50] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?
[2024/06/12 09:38:50] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?
[2024/06/12 09:38:50] [error] [output:s3:s3.0] PutObject request failed
[2024/06/12 09:39:10] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?
[2024/06/12 09:39:10] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?
[2024/06/12 09:39:10] [error] [output:s3:s3.0] PutObject request failed
[2024/06/12 09:39:30] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?
[2024/06/12 09:39:30] [error] [tls] error: error:00000006:lib(0):func(0):EVP lib
[2024/06/12 09:39:30] [error] [/src/fluent-bit/src/flb_http_client.c:1231 errno=32] Broken pipe
[2024/06/12 09:39:30] [error] [output:s3:s3.0] PutObject request failed
[2024/06/12 09:40:10] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?
[2024/06/12 09:40:10] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?
[2024/06/12 09:40:10] [error] [output:s3:s3.0] PutObject request failed
[2024/06/12 09:40:30] [error] [/src/fluent-bit/src/tls/openssl.c:495 errno=32] Broken pipe
[2024/06/12 09:40:30] [error] [tls] syscall error: error:00000005:lib(0):func(0):DH lib
[2024/06/12 09:40:30] [error] [/src/fluent-bit/src/flb_http_client.c:1241 errno=32] Broken pipe
[2024/06/12 09:41:20] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?
[2024/06/12 09:41:20] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?
[2024/06/12 09:41:20] [error] [output:s3:s3.0] PutObject request failed
[2024/06/12 09:42:00] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?
[2024/06/12 09:42:00] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?
[2024/06/12 09:42:00] [error] [output:s3:s3.0] PutObject request failed
[2024/06/12 09:43:00] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?
[2024/06/12 09:43:10] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?
[2024/06/12 09:43:10] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?
[2024/06/12 09:43:10] [error] [output:s3:s3.0] PutObject request failed
[2024/06/12 09:43:59] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?
[2024/06/12 09:43:59] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?
[2024/06/12 09:43:59] [error] [output:s3:s3.0] PutObject request failed
[2024/06/12 09:43:59] [error] [output:s3:s3.0] Could not send chunk with tag 105250-borrow
[2024/06/12 09:43:59] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?
[2024/06/12 09:43:59] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?
[2024/06/12 09:43:59] [error] [output:s3:s3.0] PutObject request failed
[2024/06/12 09:43:59] [error] [output:s3:s3.0] Could not send chunk with tag logging
[2024/06/12 09:44:00] [error] [tls] error: error:00000006:lib(0):func(0):EVP lib
[2024/06/12 09:44:00] [error] [/src/fluent-bit/src/flb_http_client.c:1241 errno=104] Connection reset by peer
[2024/06/12 09:44:00] [error] [tls] error: error:00000001:lib(0):func(0):reason(1)
[2024/06/12 09:44:00] [error] [output:s3:s3.0] PutObject request failed
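
To put a number on "intermittently", the failures in a log like the one above can be bucketed per minute. This is only a minimal sketch: the regular expression assumes the `[timestamp] [level] [component] message` layout visible in the log, and `failures_per_minute` is a hypothetical helper name, not part of Fluent Bit.

```python
import re
from collections import Counter

# Assumed line layout, taken from the log above:
# [2024/06/12 09:38:19] [error] [output:s3:s3.0] PutObject request failed
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] "
    r"\[(?P<level>\w+)\] "
    r"\[(?P<component>[^\]]+)\] ?"
    r"(?P<message>.*)$"
)


def failures_per_minute(lines, needle="PutObject request failed"):
    """Count lines whose message contains `needle`, keyed by 'YYYY/MM/DD HH:MM'."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and needle in match.group("message"):
            # Drop the seconds so that entries group by minute.
            counts[match.group("ts")[:16]] += 1
    return counts


# A few lines copied from the log above.
sample = [
    "[2024/06/12 09:38:19] [error] [http_client] broken connection to s3.eu-west-1.amazonaws.com:443 ?",
    "[2024/06/12 09:38:19] [error] [output:s3:s3.0] PutObject request failed",
    "[2024/06/12 09:38:20] [error] [output:s3:s3.0] PutObject request failed",
    "[2024/06/12 09:39:30] [error] [/src/fluent-bit/src/flb_http_client.c:1231 errno=32] Broken pipe",
    "[2024/06/12 09:39:30] [error] [output:s3:s3.0] PutObject request failed",
]

for minute, count in sorted(failures_per_minute(sample).items()):
    print(minute, count)
```

Passing a different `needle`, for example `"Could not send chunk"`, counts the chunks that were given up on rather than individual failed requests.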

I am using Fluent Bit version 2.2.0, configured as follows:

Fluent-Bit configurations:

  config:
    service: |
      [SERVICE]
        Flush                               10
        Log_Level                           info
        Daemon                              off
        HTTP_Server                         On
        HTTP_Listen                         0.0.0.0
        HTTP_PORT                           2020
        Health_Check                        On
        storage.path                        /var/log/fluent-bit-buffer
        storage.sync                        full
        storage.metrics                     on
        storage.delete_irrecoverable_chunks on
        storage.max_chunks_up               1000
        storage.backlog.mem_limit           300Mi

    ## https://docs.fluentbit.io/manual/pipeline/inputs
    inputs: |
      [INPUT]
        Name              tail
        Tag               kube.*
        storage.type      filesystem
        Path              /var/log/containers/*.log
        multiline.parser  cri, docker
        DB                /var/log/flb_kube.db
        DB.locking        true
        Buffer_Chunk_Size 1MB
        Buffer_Max_Size   1MB
        Skip_Long_Lines   On
        Refresh_Interval  5

    ## https://docs.fluentbit.io/manual/pipeline/filters
    filters: |
      [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /<path_to_those_files>
        Kube_Token_File     /<path_to_those_files>
        Merge_Log           On
        Merge_Log_Key       log4j
        Cache_Use_Docker_Id On
        Keep_Log            On
        K8S-Logging.Parser  On
        K8S-Logging.Exclude On
        Labels              On
        Annotations         Off
        Buffer_Size         0

      [FILTER]
        Name                 rewrite_tag
        Match                kube.*
        Rule                 $kubernetes['namespace_name'] ^.*$ $kubernetes['namespace_name'] false
        Emitter_Name         ns_emitter
        Emitter_Storage.type filesystem

    ## https://docs.fluentbit.io/manual/pipeline/outputs
    outputs: |
      [OUTPUT]
        Name                         s3
        Match                        *
        bucket                       <bucket_name>
        region                       eu-west-1
        upload_timeout               1m
        total_file_size              5M
        s3_key_format                /account_name/%Y/%m/%d/%H/$TAG/$UUID.gz
        s3_key_format_tag_delimiters .-
        canned_acl                   bucket-owner-full-control
        use_put_object               true
        compression                  gzip
        store_dir_limit_size         10G
        retry_limit                  5

Resources allocated to Fluent-Bit pods:

  resources:
    limits:
      memory: 300Mi
    requests:
      cpu: 400m
      memory: 300Mi

I recently raised retry_limit to 5; it was previously 1, the default.

Edit: The update of retry_limit didn't help.

Edit: I cannot attach a snapshot of the metrics here, but the improvement seen is around 90%.

The load is not uniform, but each Fluent Bit pod opens around 1200 files per hour (according to fluentbit_input_files_opened_total{}).
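
A per-hour figure like this can be derived from two scrapes of the counter. The sketch below assumes the metrics arrive in the Prometheus text exposition format; `counter_total` and `hourly_rate` are hypothetical helper names, and the sample values are made up for illustration.

```python
import re

# One sample line: name, optional {labels}, value, optional timestamp.
# Simplification: label values containing '}' are not handled.
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)"
    r"(?:\s+-?\d+)?\s*$"
)


def counter_total(exposition_text, metric_name):
    """Sum a counter across all of its label sets in one scrape."""
    total = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match and match.group("name") == metric_name:
            total += float(match.group("value"))
    return total


def hourly_rate(earlier_total, later_total, elapsed_seconds):
    """Scale the counter increase between two scrapes to a per-hour rate."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (later_total - earlier_total) * 3600.0 / elapsed_seconds


# Hypothetical scrape output; the label names and values are illustrative only.
scrape = """\
# TYPE fluentbit_input_files_opened_total counter
fluentbit_input_files_opened_total{name="tail.0"} 100
fluentbit_input_files_opened_total{name="tail.1"} 20
"""

print(counter_total(scrape, "fluentbit_input_files_opened_total"))
```

Counters restart from zero when a pod restarts, so a negative result from `hourly_rate` indicates a reset between the two scrapes rather than a real decrease.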

The infrastructure is not broken: I can see most of the logs. However, a few logs go missing while Fluent Bit is hitting this issue. I saw the same issue raised in the past, but none of those reports seems to have reached a solution or a fix. Is there something I need to configure, or is this an ongoing issue?

Please let me know if any more information is needed to resolve this.

patrick-stephens commented 3 months ago

Can you try with the latest version 3.0.7?

vi4vikas commented 3 months ago

Sure, I can! However, I couldn't find any fix for the S3 plugin in the later versions.

patrick-stephens commented 3 months ago

Looking at the stack, though, it may be TLS-related. It's also always best to try the latest version anyway to confirm.

github-actions[bot] commented 1 week ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] commented 6 days ago

This issue was closed because it has been stalled for 5 days with no activity.