aws / aws-for-fluent-bit

The source of the amazon/aws-for-fluent-bit container image
Apache License 2.0

Dropped Logs #701

Closed: woolcapgithub closed this issue 11 months ago

woolcapgithub commented 1 year ago
### Describe the question/issue

If we log a string (which goes into the `message` field of the log record) of more than 10,000 chars, we do not see the message. If we log an array (which goes into the `context` field of the log record) of somewhere around 12,000 chars, we do not see the message.

### Configuration

CloudFormation script for FireLens:

    - Name: log_router
      Environment:
        - Name: FLB_LOG_LEVEL
          Value: info
        - Name: aws_fluent_bit_init_s3_1
          Value:
            Fn::Join:
              - ""
              - - Fn::ImportValue: !Sub ${EFSStackName}-S3FirelensConfigBucketArn
                - /faas_parser.conf
        - Name: aws_fluent_bit_init_s3_2
          Value:
            Fn::Join:
              - ""
              - - Fn::ImportValue: !Sub ${EFSStackName}-S3FirelensConfigBucketArn
                - /custom-fluent-bit.conf
      FirelensConfiguration:
        Type: fluentbit
      MemoryReservation: 100
      LogConfiguration:
        LogDriver: awslogs
        Options:
          awslogs-group: !Ref FirelensLogGroup
          awslogs-region: !Ref AWS::Region
          awslogs-stream-prefix: nginx
    - Essential: true
      Image: !Sub ${AWS::AccountId}.dkr.ecr.${AWS::Region}.${AWS::URLSuffix}/faas/nginx:latest
      LogConfiguration:
        LogDriver: awsfirelens
        Options:
          auto_create_group: true
          log-driver-buffer-limit: 536870911
          log_group_name: !Ref NginxLogGroup
          log_stream_prefix: nginx
          log_stream_template: application.$extra['app_name'].$extra['tenant_name']
          Name: cloudwatch_logs
          region: !Ref AWS::Region
      Name: nginx
      PortMappings:
        - ContainerPort: 80
          Protocol: tcp
      MountPoints:
        - SourceVolume: apps-dir
          ContainerPath: "/usr/share/nginx/www"
        - SourceVolume: logs-dir
          ContainerPath: "/var/log/intacct"
        - SourceVolume: tmp-dir
          ContainerPath: "/tmp"

Fluent Bit Log Output

No output with log messages

Fluent Bit Version Info

Latest

Cluster Details

Using an ECS setup with nginx tasks that send traffic to app tasks, which run code from EFS. Logs are pushed to CloudWatch so as not to fill up EFS.

Application Details

Firelens runs on our ECS tasks and places our application logs into CloudWatch.

Below is the configuration code.

custom-fluent-bit.conf:

    [FILTER]
        Name         parser
        Match        *
        Key_Name     log
        Parser       phpfpm_with_time
        Reserve_Data True

    [FILTER]
        Name         parser
        Match        *
        Key_Name     log
        Parser       phpfpm_without_time
        Reserve_Data True

    [FILTER]
        Name       record_modifier
        Match      *
        Remove_key prefix

    [FILTER]
        Name         parser
        Match        *
        Key_Name     raw_message
        Parser       json
        Reserve_Data True

    [FILTER]
        Name         parser
        Match        *
        Key_Name     log
        Parser       json
        Reserve_Data True

faas_parser.conf:

    [PARSER]
        Name        phpfpm_with_time
        Format      regex
        Regex       ^\[(?<app_timestamp>.*)UTC\]\s(?<log>.*)$
        Time_Key    app_timestamp
        Time_Format %d-%b-%Y %H:%M:%S

    [PARSER]
        Name   phpfpm_without_time
        Format regex
        Regex  ^(?<prefix>NOTICE: PHP message:)\s(?<raw_message>.*)$

    [PARSER]
        Name   json
        Format json
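
For illustration, this parser chain appears aimed at php-fpm lines of roughly the following shape (a hypothetical sample; the capture-group names above were garbled in the original paste and are inferred from the keys the filters reference):

    [09-Jun-2024 12:00:00 UTC] NOTICE: PHP message: {"message":"...","extra":{"app_name":"faas","tenant_name":"acme"}}

phpfpm_with_time splits off the timestamp, phpfpm_without_time captures the `NOTICE: PHP message:` literal into `prefix` (which the record_modifier filter then removes) and the remainder into `raw_message`, and the `json` parser expands `raw_message` into structured fields.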

Steps to reproduce issue

Here is the code I am using to test.

    // Build a 20,000-character random alphanumeric string.
    $str = '';
    $length = 20000;
    $charset = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    $count = strlen($charset);
    while ($length--) {
        $str .= $charset[mt_rand(0, $count - 1)];
    }

    // Log the payload as a plain string (the message field)...
    dbg("SEE BELOW FOR LARGE STRING");
    dbg($str);
    dbg("SEE ABOVE FOR LARGE STRING");

    // ...and nested in an array (the context field).
    dbg("SEE BELOW FOR LARGE ARRAY");
    dbg(array("RANDOM" => array("ACTUALSTRING" => $str)));
    dbg("SEE ABOVE FOR LARGE ARRAY");

Related Issues

PettitWesley commented 1 year ago

I know that container runtimes, including Docker, will split log lines from stdout at some size, usually 16KB. I'm not sure if that's somehow related here or not. I'm also wondering if there may be some limit related to PHP here. @woolcapgithub, if you run your code on the command line, does it spit out all that text to stdout?

We'll try to reproduce this on an instance and get back to you.

@DrewZhang13 please try to repro this if you have time: https://github.com/aws/aws-for-fluent-bit/blob/mainline/troubleshooting/debugging.md#tutorial-replicate-an-ecs-firelens-task-setup-locally

We have this for sending large log lines: https://github.com/aws/aws-for-fluent-bit/tree/mainline/troubleshooting/tools/big-rand-logger
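
If the 16KB runtime split does turn out to be the cause, the usual mitigation is Fluent Bit's multiline filter in partial_message mode, which reassembles the split pieces. A minimal sketch (the `*` match pattern is illustrative):

    [FILTER]
        name                  multiline
        match                 *
        multiline.key_content log
        mode                  partial_message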

jpc-bakertilly commented 1 year ago

Thanks, Wesley. BTW, that dbg() above just ends up calling error_log via Monolog, which is what gets things to CloudWatch.

jpc-bakertilly commented 1 year ago

The test would not go to stdout of the console session, as the logs are directed to the container's stdout, which is under /proc somewhere.

If I try error_log normally from the CLI, I can output many more chars than the limit we are seeing, so I don't think it's an error_log limit by itself.

FWIW, I did try this code locally in a running container, and I can see the large payloads in my shell output and also in the container STDOUT. I tried 20,000 chars; this size fails with FireLens.

So I think we've eliminated the PHP error_log limit.

PettitWesley commented 1 year ago

CloudWatch has a max message size of 256 KiB. Some of the other FLB plugins like tail and TCP have a max event size they can ingest; forward sets a huge default value for buffer_max_size: https://docs.fluentbit.io/manual/pipeline/inputs/forward

So none of those limits should be the cause here.
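
For completeness, those forward-input buffers can also be set explicitly. A minimal sketch using the same socket path as the configs in this thread (the sizes shown are illustrative, not recommendations):

    [INPUT]
        Name              forward
        unix_path         /var/run/fluent.sock
        Buffer_Chunk_Size 1M
        Buffer_Max_Size   16M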

jpc-bakertilly commented 1 year ago

@PettitWesley Let me know if you are waiting on us for something here. Thanks!

PettitWesley commented 1 year ago

@DrewZhang13 is taking a look at this

Abhinab-AY commented 1 year ago

@DrewZhang13 did we ever figure this out? What might be the reason behind it? We are seeing 13,000+ chars in a log line and it's failing for us.

woolcapgithub commented 1 year ago

We figured it out. We had to update the config file for FireLens to allow larger log entries.

    [INPUT]
        Name          forward
        unix_path     /var/run/fluent.sock
        Mem_Buf_Limit 510MB

Abhinab-AY commented 1 year ago

I do not use an [INPUT] section in my Fluent Bit config. It's reading stdout from the PHP application containers on the ECS service. How do I achieve this?

    [SERVICE]
        Daemon       Off
        Flush        1
        Log_Level    warn
        Parsers_File custom-parsers.conf
        Streams_File stream_processing.conf

    [FILTER]
        Name   modify
        Match  *
        Add    system_name application
        Rename log log_processed

    [FILTER]
        Name         parser
        Match        *
        Key_Name     log_processed
        Parser       json_parser
        Preserve_Key On
        Reserve_Data True

    [OUTPUT]
        Name                s3
        Match               logs.cloudwatch
        region              ${AWS_REGION}
        bucket              ${BUCKET_NAME}
        s3_key_format       /${AWS_ACCOUNT_ID}/${AWS_REGION}/${CLUSTER}/%Y/%m/%d/%H/$UUID.json
        canned_acl          bucket-owner-full-control
        auto_retry_requests true
        upload_timeout      ${UPLOAD_TIMEOUT}
        total_file_size     ${TOTAL_FILE_SIZE}
        use_put_object      On

    [OUTPUT]
        Name           datadog
        Match          logs.datadog
        Host           http-intake.logs.datadoghq.eu
        TLS            on
        compress       gzip
        apikey         ${DD_API_KEY}
        dd_service     ${DD_SERVICE}
        dd_source      ${SOURCE}
        dd_message_key log
        dd_tags        ${DD_TAG_LIST}

woolcapgithub commented 1 year ago

FireLens, I believe, defaults to the lower buffer limit. Add the [INPUT] section to the file:

    [INPUT]
        Name          forward
        unix_path     /var/run/fluent.sock
        Mem_Buf_Limit 510MB

    [SERVICE]
        Daemon       Off
        Flush        1
        Log_Level    warn
        Parsers_File custom-parsers.conf
        Streams_File stream_processing.conf

    [FILTER]
        Name   modify
        Match  *
        Add    system_name application
        Rename log log_processed

    [FILTER]
        Name         parser
        Match        *
        Key_Name     log_processed
        Parser       json_parser
        Preserve_Key On
        Reserve_Data True

    [OUTPUT]
        Name                s3
        Match               logs.cloudwatch
        region              ${AWS_REGION}
        bucket              ${BUCKET_NAME}
        s3_key_format       /${AWS_ACCOUNT_ID}/${AWS_REGION}/${CLUSTER}/%Y/%m/%d/%H/$UUID.json
        canned_acl          bucket-owner-full-control
        auto_retry_requests true
        upload_timeout      ${UPLOAD_TIMEOUT}
        total_file_size     ${TOTAL_FILE_SIZE}
        use_put_object      On

    [OUTPUT]
        Name           datadog
        Match          logs.datadog
        Host           http-intake.logs.datadoghq.eu
        TLS            on
        compress       gzip
        apikey         ${DD_API_KEY}
        dd_service     ${DD_SERVICE}
        dd_source      ${SOURCE}
        dd_message_key log
        dd_tags        ${DD_TAG_LIST}

Abhinab-AY commented 1 year ago

Thanks for the response. My understanding is that [INPUT] is more about where to get the logs from, isn't it?

  1. Do I need to change the app in any way?
  2. Do I need to share the sock file across containers in the ECS service?
  3. Would it still read STDOUT?

Thanks again for the help here!!

woolcapgithub commented 1 year ago

Sure. I believe the [INPUT] section only applies to the buffer limit and doesn't change anything related to STDOUT.

Give it a try and see if you get more than 13,000 chars through to CloudWatch.

woolcapgithub commented 11 months ago

We seem to be having the same issue again. The config file for fluent-bit hasn't changed, but it appears logs are dropped when they hit the default lower limit for Mem_Buf_Limit.

Contents of custom-fluent-bit.conf:

    [INPUT]
        Name          forward
        unix_path     /var/run/fluent.sock
        Mem_Buf_Limit 510MB

    [FILTER]
        Name         parser
        Match        *
        Key_Name     log
        Parser       phpfpm_with_time
        Reserve_Data True

    [FILTER]
        Name         parser
        Match        *
        Key_Name     log
        Parser       phpfpm_without_time
        Reserve_Data True

    [FILTER]
        Name       record_modifier
        Match      *
        Remove_key prefix

    [FILTER]
        Name         parser
        Match        *
        Key_Name     raw_message
        Parser       json
        Reserve_Data True

    [FILTER]
        Name         parser
        Match        *
        Key_Name     log
        Parser       json
        Reserve_Data True

woolcapgithub commented 11 months ago

Update: figured out how to increase the memory limits for the app in the Task Definition: `"memoryReservation": 1024`.

Also figured out how to get Fluent Bit to concatenate the long, split logs back together so they show up in the application's log stream.

    [FILTER]
        name                  multiline
        match                 *
        multiline.key_content log
        # partial_message mode is incompatible with option multiline.parser
        mode                  partial_message

Reference: https://github.com/aws-samples/amazon-ecs-firelens-examples/tree/mainline/examples/fluent-bit/filter-multiline-partial-message-mode