aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

CloudWatch Logs Rotation to S3 #7715

Closed horsmand closed 4 years ago

horsmand commented 4 years ago

I'm proposing a resource that rotates logs out of CloudWatch Logs (CWL) into an S3 bucket after a configured duration. The rotation would be performed by a Lambda SingletonFunction.

I'm also proposing to add a method to LogGroup called addRotationPolicy for setting this rotation up, although the LogGroupRotator could also just work on its own.

This would add dependencies for aws-s3 and aws-lambda inside of aws-logs.

Use Case

My team has a lot of infrastructure that writes logs to CWL and we would like to reduce IMR while still holding onto logs for a few months by moving our logs into cheaper storage.

Proposed Solution

The method would be used as follows. The LogGroup to be rotated is created along with the Bucket that will receive the logs.

const logGroup = new LogGroup(this, 'LogGroup', { retention: RetentionDays.THREE_DAYS });
const logBucket = new Bucket(this, 'LogBucket', { lifecycleRules: [...] });

logGroup.addRotationPolicy({
  bucket: logBucket,
  folderName: 'LogsFolderInS3',
  rotateAfter: RetentionDays.THREE_DAYS,
});

Adding the addRotationPolicy() inside LogGroup:

interface RotationPolicyProps {
  /**
   * The bucket to move the logs into
   */
  bucket: Bucket,
  /**
   * The folder to put the logs in, inside the S3 bucket
   * @default - no folder
   */
  folderName?: string,
  /**
   * The duration to wait before moving the logs
   * @default - the Log Group's retention
   */
  rotateAfter?: RetentionDays,
}
// Method on LogGroup, so `this` is the log group being rotated.
public addRotationPolicy(props: RotationPolicyProps): void {
  const logGroupRotator = new LogGroupRotator(this, 'LogGroupRotator', props);
  logGroupRotator.register(this, props.folderName);
}

LogGroupRotator setup:

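// CDK v1 module names; LogGroup and RetentionDays come from aws-logs itself.
import * as path from 'path';
import { Rule, RuleTargetInput, Schedule } from '@aws-cdk/aws-events';
import { LambdaFunction } from '@aws-cdk/aws-events-targets';
import { ManagedPolicy, Role, ServicePrincipal } from '@aws-cdk/aws-iam';
import { Code, Runtime, SingletonFunction } from '@aws-cdk/aws-lambda';
import { IBucket } from '@aws-cdk/aws-s3';
import { Construct, Stack } from '@aws-cdk/core';

export class LogGroupRotator extends Construct {
  // LOG_ROTATOR_UUID, the fixed UUID for the SingletonFunction, is declared elsewhere (not shown).
  private exportLogsFunction: SingletonFunction;
  private scheduledLogRotation: Rule;
  private logStorageBucket: IBucket;
  private rotateAfter?: RetentionDays;

  constructor(scope: Construct, id: string, props: RotationPolicyProps) {
    super(scope, id);

    // Keep the bucket and rotation period; register() passes them to the Lambda.
    this.logStorageBucket = props.bucket;
    this.rotateAfter = props.rotateAfter;

    const lambdaRole = new Role(this, 'LogRotatorRole', {
      assumedBy: new ServicePrincipal('lambda.amazonaws.com'),
      managedPolicies: [
        ManagedPolicy.fromAwsManagedPolicyName('AmazonS3FullAccess'),
        ManagedPolicy.fromAwsManagedPolicyName('CloudWatchLogsFullAccess'),
      ],
    });
    this.exportLogsFunction = new SingletonFunction(this, 'LogRotatorFunction', {
      // Lambda assets must be a directory or a .zip; the handler is '<file>.<export>'.
      code: Code.fromAsset(path.join(__dirname, '../scripts')),
      handler: 'rotate-logs.lambdaHandler',
      role: lambdaRole,
      runtime: Runtime.NODEJS_10_X,
      uuid: this.LOG_ROTATOR_UUID,
    });

    // Once a day at 10:00 UTC; omitting `minute` would fire every minute of that hour.
    this.scheduledLogRotation = new Rule(this, 'LogRotatorRule', {
      schedule: Schedule.cron({ minute: '0', hour: '10' }),
    });
  }

  // Adds the log group as a target of the daily rule, passing what the Lambda needs.
  public register(logGroup: LogGroup, logFolderName?: string): void {
    this.scheduledLogRotation.addTarget(new LambdaFunction(this.exportLogsFunction, {
      event: RuleTargetInput.fromObject({
        rotateAfter: this.rotateAfter,
        region: Stack.of(this).region,
        logFolderName,
        logGroupArn: logGroup.logGroupArn,
        s3BucketArn: this.logStorageBucket.bucketArn,
      }),
    }));
  }
}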

The rotate-logs.js script would call the AWS SDK's CreateExportTask to start the export for each rotation.
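
As a rough illustration of that step, the sketch below builds the request that rotate-logs.js could pass to CreateExportTask. The event shape mirrors the object built in register(). The helper names (`buildExportRequest`, `logGroupNameFromArn`, `bucketNameFromArn`) and the idea of passing the time range in explicitly are assumptions for illustration, not part of the proposal. CreateExportTask takes a log group name and a bucket name rather than ARNs, so the ARNs are converted first.

```typescript
// Event sent by the scheduled rule; mirrors the object built in register().
interface RotationEvent {
  rotateAfter?: number;   // days (RetentionDays values are day counts)
  region: string;
  logFolderName?: string;
  logGroupArn: string;    // arn:aws:logs:<region>:<account>:log-group:<name>:*
  s3BucketArn: string;    // arn:aws:s3:::<bucket>
}

// The CreateExportTask parameters this sketch fills in.
interface ExportRequest {
  logGroupName: string;
  destination: string;          // bucket name, not ARN
  destinationPrefix?: string;   // omitted when no folder was configured
  from: number;                 // start of range, ms since epoch
  to: number;                   // end of range, ms since epoch
}

// Hypothetical helper: extracts the name from a log group ARN, dropping a trailing ":*".
function logGroupNameFromArn(arn: string): string {
  const match = /:log-group:(.+?)(:\*)?$/.exec(arn);
  if (!match) throw new Error('Not a log group ARN: ' + arn);
  return match[1];
}

// Hypothetical helper: extracts the bucket name from a bucket ARN.
function bucketNameFromArn(arn: string): string {
  const match = /^arn:[^:]+:s3:::([^/]+)$/.exec(arn);
  if (!match) throw new Error('Not a bucket ARN: ' + arn);
  return match[1];
}

// Builds the export request for one log group and one time range.
function buildExportRequest(event: RotationEvent, fromMs: number, toMs: number): ExportRequest {
  if (fromMs >= toMs) throw new Error('Export range is empty');
  const request: ExportRequest = {
    logGroupName: logGroupNameFromArn(event.logGroupArn),
    destination: bucketNameFromArn(event.s3BucketArn),
    from: fromMs,
    to: toMs,
  };
  if (event.logFolderName) request.destinationPrefix = event.logFolderName;
  return request;
}
```

The handler would then hand this object to the SDK's CreateExportTask call. How the time range is derived from rotateAfter is left open here.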


This is a :rocket: Feature Request

ddneilson commented 4 years ago
  /**
   * The duration to wait before moving the logs
   * @defaults to the Log Group's retention
   */
  rotateAfter?: RetentionDays,

Is there a risk of a race between the rotation & CloudWatch expiring the logs if the default is the same time period?

Also, from the CreateExportTask docs:

Each account can only have one active (RUNNING or PENDING) export task at a time.

What is the plan for navigating that constraint?

rix0rrr commented 4 years ago

Seems useful, but I don't think this belongs in the CDK core.

I would recommend that you (or someone else) self-publish this as a third-party construct.

rix0rrr commented 4 years ago

I'll leave the issue open to collect conversations.

horsmand commented 4 years ago

There is a race condition: expiring logs are not all deleted at the same time of day, because expiry is based on when each log event was created (i.e. 24 hours later if retention is set to 1 day). This can be avoided by scheduling the rotation for the day before the expiration. The Lambda that schedules the export task could also confirm that the task completes and then delete the logs it exported, so they are not stored in two places for a day.
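
One way to picture the "day before expiration" approach: on each daily run, export the one-day slice of logs that is still a safety margin away from the retention age. This is a minimal sketch that assumes retention and the margin are whole days and that the rule fires once a day. `exportWindow` and `marginDays` are hypothetical names.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

interface ExportWindow {
  from: number; // start of the range to export, ms since epoch
  to: number;   // end of the range to export, ms since epoch
}

// Returns the range of log events to export on a daily run at `nowMs`.
// Events in the range are between (retentionDays - marginDays - 1) and
// (retentionDays - marginDays) days old, so they will not expire for at
// least `marginDays` more days.
function exportWindow(nowMs: number, retentionDays: number, marginDays: number = 1): ExportWindow {
  if (marginDays < 0) throw new Error('marginDays must not be negative');
  if (retentionDays - marginDays < 1) {
    throw new Error('retentionDays must exceed marginDays, or logs may expire before export');
  }
  const oldestAgeDays = retentionDays - marginDays;
  return {
    from: nowMs - oldestAgeDays * DAY_MS,
    to: nowMs - (oldestAgeDays - 1) * DAY_MS,
  };
}
```

With a retention of 3 days and the default margin of 1 day, each run would export events that are between 1 and 2 days old, leaving at least a day before CloudWatch expires them.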

As for the constraint of only one active export task at a time, my own testing suggests it is not enforced. I created 3 log groups containing approximately 1 GB of logs each (10 streams of 100 MB each), ran 3 CreateExportTask calls asynchronously, and then called DescribeExportTasks, which showed all three with the status code RUNNING. I think it's worth getting clarification from CloudWatch about that, so I'll follow up.
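
If the documented limit does turn out to apply, one option is for each scheduled invocation to start at most one export, and only when no task is active. The sketch below is an assumption about how that check could look. It takes task summaries of the kind DescribeExportTasks reports and a queue of log group names, and it does not call the AWS SDK itself. `nextLogGroupToExport` and `ExportTaskSummary` are hypothetical names.

```typescript
// Status codes reported for export tasks.
type ExportStatus =
  | 'CANCELLED'
  | 'COMPLETED'
  | 'FAILED'
  | 'PENDING'
  | 'PENDING_CANCEL'
  | 'RUNNING';

interface ExportTaskSummary {
  taskId: string;
  status: ExportStatus;
}

// "Active" follows the wording quoted from the docs: RUNNING or PENDING.
function hasActiveTask(tasks: ExportTaskSummary[]): boolean {
  return tasks.some((task) => task.status === 'PENDING' || task.status === 'RUNNING');
}

// Returns the log group to export next, or undefined if an export is
// still active or nothing is waiting in the queue.
function nextLogGroupToExport(tasks: ExportTaskSummary[], queue: string[]): string | undefined {
  if (hasActiveTask(tasks)) return undefined;
  return queue.length > 0 ? queue[0] : undefined;
}
```

The remaining log groups would simply wait for a later invocation, which trades rotation latency for staying within the one-task limit.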

rix0rrr commented 4 years ago

Unfortunately our backlog is not intended to keep discussion issues that are not actionable for us. So I'm going to close this after all.