aws / aws-for-fluent-bit

The source of the amazon/aws-for-fluent-bit container image
Apache License 2.0
459 stars 134 forks source link

CloudWatch Logs recycling connections and failing to log intermittently #292

Open sjacksonBP opened 2 years ago

sjacksonBP commented 2 years ago

cloudwatch_logs constantly recycling connections then will stop logging,

Usually fluent-bit will log for a few minutes then I'll stop receiving logs in cloudwatch logs, the service status will say td-agent-bit is still running, restarting docker containers or td-agent-bit does not seem to help.

We are logging java microservices on ECS containers that are hosted on EC2.



    # Flush
    # =====
    # set an interval of seconds before to flush records to a destination
    flush        5

    # Daemon
    # ======
    # instruct Fluent Bit to run in foreground or background mode.
    daemon       Off

    # Log_Level
    # =========
    # Set the verbosity level of the service, values can be:
    #
    # - error
    # - warning
    # - info
    # - debug
    # - trace
    #
    # by default 'info' is set, that means it includes 'error' and 'warning'.
    log_level   debug

    # Parsers File
    # ============
    # specify an optional 'Parsers' configuration file
    parsers_file parsers.conf

    # Plugins File
    # ============
    # specify an optional 'Plugins' configuration file to load external plugins.
    plugins_file plugins.conf

    # HTTP Server
    # ===========
    # Enable/Disable the built-in HTTP Server for metrics
    #http_server  Off
    #http_listen  0.0.0.0
    #http_port    2020

    # Storage
    # =======
    # Fluent Bit can use memory and filesystem buffering based mechanisms
    #
    # - https://docs.fluentbit.io/manual/administration/buffering-and-storage
    #
    # storage metrics
    # ---------------
    # publish storage pipeline metrics in '/api/v1/storage'. The metrics are
    # exported only if the 'http_server' option is enabled.
    #
    storage.metrics on

    # storage.path
    # ------------
    # absolute file system path to store filesystem data buffers (chunks).
    #
    # storage.path /tmp/storage

    # storage.sync
    # ------------
    # configure the synchronization mode used to store the data into the
    # filesystem. It can take the values normal or full.
    #
    # storage.sync normal

    # storage.checksum
    # ----------------
    # enable the data integrity check when writing and reading data from the
    # filesystem. The storage layer uses the CRC32 algorithm.
    #
    # storage.checksum off

    # storage.backlog.mem_limit
    # -------------------------
    # if storage.path is set, Fluent Bit will look for data chunks that were
    # not delivered and are still in the storage layer, these are called
    # backlog data. This option configure a hint of maximum value of memory
    # to use when processing these records.
    #
    # storage.backlog.mem_limit 5M

[INPUT]
    name                       forward
    listen                        0.0.0.0
    port                          24224

[OUTPUT]
    name                        cloudwatch_logs
    match                       *
    log_group_name      microservices/ecs/fluent-bit/
    log_stream_prefix     service-
    region                       us-east-1
    log_retention_days   1
    auto_create_group    on

---

td-agent-bit[19151]: [2022/02/03 17:03:55] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Creating log group microservices/ecs/fluent-bit/
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [http_client] not using http_proxy for header
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [aws_credentials] Requesting credentials from the EC2 provider..
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [http_client] server logs.us-east-1.amazonaws.com:443 will close connection #27
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [aws_client] logs.us-east-1.amazonaws.com: http_do=0, HTTP Status: 400
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] CreateLogGroup http status=400
 td-agent-bit[19151]: [2022/02/03 17:03:55] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Log Group microservices/ecs/fluent-bit/ already exists
 td-agent-bit[19151]: [2022/02/03 17:03:55] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Setting retention policy on log group microservices/ecs/fluent-bit/ to 1d
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [http_client] not using http_proxy for header
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [aws_credentials] Requesting credentials from the EC2 provider..
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [upstream] KA connection #27 to logs.us-east-1.amazonaws.com:443 is now available
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] PutRetentionPolicy http status=200
 td-agent-bit[19151]: [2022/02/03 17:03:55] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Set retention policy to 1
 td-agent-bit[19151]: [2022/02/03 17:03:55] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Creating log stream service-stg-serv-ms-behv in log group microservices/ecs/fluent-bit/
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [upstream] KA connection #27 to logs.us-east-1.amazonaws.com:443 has been assigned (recycled)
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [http_client] not using http_proxy for header
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [aws_credentials] Requesting credentials from the EC2 provider..
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [http_client] server logs.us-east-1.amazonaws.com:443 will close connection #27
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [aws_client] logs.us-east-1.amazonaws.com: http_do=0, HTTP Status: 400
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] CreateLogStream http status=400
 td-agent-bit[19151]: [2022/02/03 17:03:55] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Log Stream service-stg-serv-ms-behv already exists
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sending 1 events
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sending log events to log stream service-stg-serv-ms-behv
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [http_client] not using http_proxy for header
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [aws_credentials] Requesting credentials from the EC2 provider..
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [http_client] server logs.us-east-1.amazonaws.com:443 will close connection #27
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [aws_client] logs.us-east-1.amazonaws.com: http_do=0, HTTP Status: 400
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] PutLogEvents http status=400
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sequence token was invalid, will retry
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sending 1 events
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sending log events to log stream service-stg-serv-ms-behv
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [http_client] not using http_proxy for header
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [aws_credentials] Requesting credentials from the EC2 provider..
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [upstream] KA connection #27 to logs.us-east-1.amazonaws.com:443 is now available
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] PutLogEvents http status=200
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sent events to service-stg-serv-ms-behv
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sent 1 events to CloudWatch
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [out coro] cb_destroy coro_id=0
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [task] destroy task=0x7fa52873c620 (task_id=0)
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [upstream] KA connection #27 to logs.us-east-1.amazonaws.com:443 has been disconnected by the remote service
 td-agent-bit[19151]: [2022/02/03 17:03:55] [debug] [socket] could not validate socket status for #27 (don't worry)
 td-agent-bit[19151]: [2022/02/03 17:04:04] [debug] [input chunk] update output instances with new chunk size diff=469

td-agent-bit[19151]: [2022/02/03 17:10:26] [debug] [input chunk] update output instances with new chunk size diff=656
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [task] created task=0x7fa52873d020 id=0 OK
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sending 108 events
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sending log events to log stream service-stg-serv-ms-behv
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [upstream] KA connection #28 to logs.us-east-1.amazonaws.com:443 has been assigned (recycled)
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [http_client] not using http_proxy for header
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [aws_credentials] Requesting credentials from the EC2 provider..
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [upstream] KA connection #28 to logs.us-east-1.amazonaws.com:443 is now available
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] PutLogEvents http status=200
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sent events to service-stg-serv-ms-behv
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sent 108 events to CloudWatch
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [out coro] cb_destroy coro_id=17
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [task] destroy task=0x7fa52873d020 (task_id=0)
td-agent-bit[19151]: [2022/02/03 17:10:38] [debug] [upstream] KA connection #28 to logs.us-east-1.amazonaws.com:443 has been disconnected by the remote service
td-agent-bit[19151]: [2022/02/03 17:10:38] [debug] [socket] could not validate socket status for #28 (don't worry)
td-agent-bit[19151]: [2022/02/03 17:11:13] [debug] [input chunk] update output instances with new chunk size diff=335
td-agent-bit[19151]: [2022/02/03 17:11:15] [debug] [task] created task=0x7fa52873cf80 id=0 OK
td-agent-bit[19151]: [2022/02/03 17:11:15] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sending 1 events
td-agent-bit[19151]: [2022/02/03 17:11:15] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sending log events to log stream service-stg-serv-ms-behv
td-agent-bit[19151]: [2022/02/03 17:11:15] [debug] [http_client] not using http_proxy for header
td-agent-bit[19151]: [2022/02/03 17:11:15] [debug] [aws_credentials] Requesting credentials from the EC2 provider..
td-agent-bit[19151]: [2022/02/03 17:11:15] [debug] [upstream] KA connection #28 to logs.us-east-1.amazonaws.com:443 is now available
td-agent-bit[19151]: [2022/02/03 17:11:15] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] PutLogEvents http status=200
td-agent-bit[19151]: [2022/02/03 17:11:15] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sent events to service-stg-serv-ms-behv
td-agent-bit[19151]: [2022/02/03 17:11:15] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sent 1 events to CloudWatch
td-agent-bit[19151]: [2022/02/03 17:11:15] [debug] [out coro] cb_destroy coro_id=18
td-agent-bit[19151]: [2022/02/03 17:11:15] [debug] [task] destroy task=0x7fa52873cf80 (task_id=0)
td-agent-bit[19151]: [2022/02/03 17:11:23] [debug] [upstream] KA connection #28 to logs.us-east-1.amazonaws.com:443 has been disconnected by the remote service
td-agent-bit[19151]: [2022/02/03 17:11:23] [debug] [socket] could not validate socket status for #28 (don't worry)Feb 03 17:10:26 ip-10-2-50-126.ec2.internal td-agent-bit[19151]: [2022/02/03 17:10:26] [debug] [input chunk] update output instances with new chunk size diff=656
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [task] created task=0x7fa52873d020 id=0 OK
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sending 108 events
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sending log events to log stream service-stg-serv-ms-behv
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [upstream] KA connection #28 to logs.us-east-1.amazonaws.com:443 has been assigned (recycled)
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [http_client] not using http_proxy for header
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [aws_credentials] Requesting credentials from the EC2 provider..
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [upstream] KA connection #28 to logs.us-east-1.amazonaws.com:443 is now available
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] PutLogEvents http status=200
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sent events to service-stg-serv-ms-behv
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sent 108 events to CloudWatch
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [out coro] cb_destroy coro_id=17
td-agent-bit[19151]: [2022/02/03 17:10:30] [debug] [task] destroy task=0x7fa52873d020 (task_id=0)
td-agent-bit[19151]: [2022/02/03 17:10:38] [debug] [upstream] KA connection #28 to logs.us-east-1.amazonaws.com:443 has been disconnected by the remote service
td-agent-bit[19151]: [2022/02/03 17:10:38] [debug] [socket] could not validate socket status for #28 (don't worry)
td-agent-bit[19151]: [2022/02/03 17:11:13] [debug] [input chunk] update output instances with new chunk size diff=335
td-agent-bit[19151]: [2022/02/03 17:11:15] [debug] [task] created task=0x7fa52873cf80 id=0 OK
td-agent-bit[19151]: [2022/02/03 17:11:15] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sending 1 events
td-agent-bit[19151]: [2022/02/03 17:11:15] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sending log events to log stream service-stg-serv-ms-behv
td-agent-bit[19151]: [2022/02/03 17:11:15] [debug] [http_client] not using http_proxy for header
td-agent-bit[19151]: [2022/02/03 17:11:15] [debug] [aws_credentials] Requesting credentials from the EC2 provider..
td-agent-bit[19151]: [2022/02/03 17:11:15] [debug] [upstream] KA connection #28 to logs.us-east-1.amazonaws.com:443 is now available
td-agent-bit[19151]: [2022/02/03 17:11:15] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] PutLogEvents http status=200
td-agent-bit[19151]: [2022/02/03 17:11:15] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sent events to service-stg-serv-ms-behv
td-agent-bit[19151]: [2022/02/03 17:11:15] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sent 1 events to CloudWatch
td-agent-bit[19151]: [2022/02/03 17:11:15] [debug] [out coro] cb_destroy coro_id=18
td-agent-bit[19151]: [2022/02/03 17:11:15] [debug] [task] destroy task=0x7fa52873cf80 (task_id=0)
td-agent-bit[19151]: [2022/02/03 17:11:23] [debug] [upstream] KA connection #28 to logs.us-east-1.amazonaws.com:443 has been disconnected by the remote service
td-agent-bit[19151]: [2022/02/03 17:11:23] [debug] [socket] could not validate socket status for #28 (don't worry)```
PettitWesley commented 2 years ago

cloudwatch_logs constantly recycling connections then will stop logging,

So the log output you showed here looks pretty normal... its normal to get all sorts of these connection messages if you enable debug logging. Do you see any errors?

Usually fluent-bit will log for a few minutes then I'll stop receiving logs in cloudwatch logs, the service status will say td-agent-bit is still running, restarting docker containers or td-agent-bit does not seem to help.

You need to prove to us that this is happening.

I'd recommend doing one of the following to show that Fluent Bit is still ingesting records but not sending them anywhere:

  1. Enable Fluent Bit stdout output so you can see the records its collecting and should be sending to CW: https://docs.fluentbit.io/manual/pipeline/outputs/standard-output
  2. Enable the monitoring interface and curl it to see what FLuent Bit is ingesting and sending: https://docs.fluentbit.io/manual/administration/monitoring#http-server