awslabs / landing-zone-accelerator-on-aws

Deploy a multi-account cloud foundation to support highly-regulated workloads and complex compliance requirements.
https://aws.amazon.com/solutions/implementations/landing-zone-accelerator-on-aws/
Apache License 2.0

Log centralization lambda function fails #145

Closed RyanGhavidel closed 11 months ago

RyanGhavidel commented 1 year ago

Describe the bug There's a Lambda function in the Logging accounts that processes log groups and stores them in the central log S3 bucket. It has recently started to fail. We have an open AWS Support case, and support believes the function exceeds the 6 MB payload quota.

"""The error message that is occurring in the Lambda function suggests the function is attempting to return a response whose size exceeds the maximum payload size limit of 6 MB. Kinesis Data Firehose invokes Lambda synchronously and therefore, as is the case for all synchronous invocations, the maximum payload size limit is 6 MB. Increasing the payload size limit is unfortunately not allowed."""

It's not yet clear whether support's finding is correct. However, after a long list of log groups, this message appears in the Lambda function logs: "[ERROR] [1683579065572] LAMBDA_RUNTIME Failed to post handler success response. Http response code: 413."
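For anyone trying to confirm the payload theory, here is a minimal sketch (not the LZA function's actual code) of how a Firehose transformation handler could estimate the size of its response before returning it. HTTP 413 means the payload was too large, which is consistent with what support describes. The helper names `response_size` and `split_by_budget` are hypothetical, the 6 MB figure is the one quoted by support above, and the byte count is only an estimate because the runtime may serialize the response slightly differently.

```python
import json

# Assumption: the 6 MB synchronous-invocation limit quoted by AWS Support above.
MAX_RESPONSE_BYTES = 6 * 1024 * 1024


def response_size(records):
    """Estimate the size in bytes of the JSON body returned for these records."""
    return len(json.dumps({"records": records}).encode("utf-8"))


def split_by_budget(records, limit=MAX_RESPONSE_BYTES):
    """Greedily split output records into (kept, overflow).

    `records` are Firehose transformation output records, i.e. dicts with
    "recordId", "result" and base64-encoded "data". `kept` is the subset whose
    estimated response size stays within `limit`; `overflow` is everything
    that did not fit. Re-serializing on every step is wasteful but keeps the
    sketch short.
    """
    kept, overflow = [], []
    for record in records:
        if response_size(kept + [record]) <= limit:
            kept.append(record)
        else:
            overflow.append(record)
    return kept, overflow
```

Logging `response_size(...)` just before the handler returns would show whether the transformed batch is actually approaching the limit. What to do with the overflow records (for example, sending them back to the stream) is outside this sketch.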


Expected behavior As a result of this failure, logs are missing from the central log bucket.

To Reproduce We don't know how to reproduce it. If the support investigation is correct, it means the function is handling a very large payload.

Please complete the following information about the solution:

To get the version of the solution, you can look at the description of the created CloudFormation stack. For example, "(SO0021) - Video On Demand workflow with AWS Step Functions, MediaConvert, MediaPackage, S3, CloudFront and DynamoDB. Version v5.0.0". If the description does not contain the version information, you can look at the mappings section of the template:

Mappings:
  SourceCode:
    General:
      S3Bucket: "solutions"
      KeyPrefix: "video-on-demand-on-aws/v5.0.0"


Additional context If the support investigation is correct and we are exceeding the Lambda function payload quota, it means we have too many logs for ingestion. Alternatively, it could be caused by changes to the Lambda function in recent LZA releases; we did not experience this in versions prior to 1.3. We also noticed that the log format in S3 changed to Parquet, and since then the amount of logs in the bucket has decreased dramatically and the above error has appeared.

erwaxler commented 1 year ago

Hi @RyanGhavidel, thank you for identifying this issue! Our team is currently researching this behavior and we expect to have a hotfix release soon. I will update this issue when I have more information.

Thank you for your interest in the Landing Zone Accelerator!

nagmesh commented 11 months ago

This is fixed as of latest release. I'll go ahead and close it. Thank you!