aws-samples / amazon-cloudfront-access-logs-queries

Analyze your Amazon CloudFront Access Logs at Scale with Amazon Athena.
MIT No Attribution

Analyzing your Amazon CloudFront access logs at scale

This is a sample implementation for the concepts described in the AWS blog post Analyze your Amazon CloudFront access logs at scale using AWS CloudFormation, Amazon Athena, AWS Glue, AWS Lambda, and Amazon Simple Storage Service (S3).

This application is available in the AWS Serverless Application Repository. You can deploy it to your account from there:

[Launch Stack button]

Overview

The application has two main parts:

FAQs

Q: How can I get started?

Use the Launch Stack button above to start the deployment of the application to your account. The AWS Management Console will guide you through the process. You can override the following parameters during deployment:

The stack contains a single S3 bucket called <ResourcePrefix>-<AccountId>-cf-access-logs. After the deployment you can modify your existing Amazon CloudFront distribution configuration to deliver access logs to this bucket with the new/ log prefix.
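As a quick illustration of the naming pattern, the sketch below assembles the bucket name in a shell. The prefix `myapp` and the account ID `123456789012` are placeholder assumptions, not values taken from the stack; in practice the account ID could be read with `aws sts get-caller-identity --query Account --output text`.

```shell
# Build the access-log bucket name from the pattern described above.
# Both arguments are placeholders for illustration only.
log_bucket_name() {
  printf '%s-%s-cf-access-logs\n' "$1" "$2"
}

bucket="$(log_bucket_name myapp 123456789012)"
echo "Bucket:     ${bucket}"
echo "Log prefix: new/"
```

The two printed values are what you would enter as the log destination when editing the distribution's logging settings.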

As soon as Amazon CloudFront delivers new access logs, the files are moved to GzKeyPrefix. After 1-2 hours, they are transformed into Parquet files in ParquetKeyPrefix.

You can query your access logs at any time in the Amazon Athena Query editor using the AWS Glue view called combined in the database called <ResourcePrefix>_cf_access_logs_db:

SELECT * FROM <ResourcePrefix>_cf_access_logs_db.combined LIMIT 10;
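The same preview query could also be prepared from the command line. The sketch below only assembles the database name and the query text. The prefix `myapp` is a placeholder, and the commented `aws athena start-query-execution` call assumes a results location that you would have to supply yourself.

```shell
# Derive the Glue database name from the resource prefix (placeholder value).
db_name() {
  printf '%s_cf_access_logs_db\n' "$1"
}

# Assemble a preview query against the "combined" view.
# $1 = resource prefix, $2 = row limit
preview_query() {
  printf 'SELECT * FROM %s.combined LIMIT %s;\n' "$(db_name "$1")" "$2"
}

preview_query myapp 10

# To submit it with the AWS CLI (the output location is an assumption):
# aws athena start-query-execution \
#   --query-string "$(preview_query myapp 10)" \
#   --result-configuration OutputLocation=s3://<RESULTS_BUCKET>/
```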

Q: How can I customize and deploy the template?

  1. Fork this GitHub repository.

  2. Clone the forked GitHub repository to your local machine.

  3. Modify the templates.

  4. Install the AWS CLI and the AWS Serverless Application Model (SAM) CLI.

  5. Validate your template:

    $ sam validate -t template.yaml

  6. Package the files for deployment with SAM (see SAM docs for details) to a bucket of your choice. The bucket must be in the region where you want to deploy the sample application:

    $ sam package \
        --template-file template.yaml \
        --output-template-file packaged.yaml \
        --s3-bucket <BUCKET>

  7. Deploy the packaged application to your account:

    $ aws cloudformation deploy \
        --template-file packaged.yaml \
        --stack-name my-stack \
        --capabilities CAPABILITY_IAM
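Steps 5 to 7 can be chained in a small script so that a failed validation stops the deployment. This is a minimal sketch, not part of the repository: the `run` helper, the `DRY_RUN` switch, and the default bucket name `my-deployment-bucket` are hypothetical. With `DRY_RUN=1` (the default here) the commands are only printed.

```shell
#!/bin/sh
set -eu

# Hypothetical defaults; replace with your own bucket and stack name.
BUCKET="${BUCKET:-my-deployment-bucket}"
STACK_NAME="${STACK_NAME:-my-stack}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Validate, package, and deploy in order; set -e aborts on the first failure.
deploy_sample() {
  run sam validate -t template.yaml
  run sam package \
    --template-file template.yaml \
    --output-template-file packaged.yaml \
    --s3-bucket "$BUCKET"
  run aws cloudformation deploy \
    --template-file packaged.yaml \
    --stack-name "$STACK_NAME" \
    --capabilities CAPABILITY_IAM
}

deploy_sample
```

Setting `DRY_RUN=0` together with real values for `BUCKET` and `STACK_NAME` would execute the three commands instead of printing them.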

Q: How can I use the sample application for multiple Amazon CloudFront distributions?

If your data does not need to be partitioned by Amazon CloudFront distribution, you can use the same bucket and path (new/) for more than one distribution and then filter or group your queries by the host column. If you need to shorten the Parquet transformation (each run must finish in under 15 minutes) or the query duration, deploy a separate AWS CloudFormation stack from the same template for each distribution. The stack name is added to all resource names (e.g. AWS Lambda functions, S3 bucket) so you can tell the stacks apart in the AWS Management Console.
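For the shared-bucket case, a query grouped by the host column shows how requests are spread across distributions. The sketch below prints such a query from a shell function. The database name `myapp_cf_access_logs_db` is a placeholder, and the query is an illustrative assumption rather than one shipped with the sample.

```shell
# Print an Athena query that counts requests per host in the "combined" view.
# $1 = Glue database name (placeholder in the example call below)
requests_per_host_query() {
  cat <<EOF
SELECT host, count(*) AS requests
FROM $1.combined
GROUP BY host
ORDER BY requests DESC;
EOF
}

requests_per_host_query myapp_cf_access_logs_db
```

The printed statement can be pasted into the Amazon Athena Query editor.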

Q: In which region can I deploy the sample application?

The Launch Stack button above opens the AWS Serverless Application Repository in the US East (N. Virginia) region (us-east-1). You can switch to another region from there before deployment.

Q: How can I add a new question to this list?

If you find yourself wishing this set of frequently asked questions had an answer for a particular problem, please submit a pull request. Chances are good that others will also benefit from having the answer listed here.

Q: How can I contribute?

See the Contributing Guidelines for details.

License Summary

This sample code is made available under a modified MIT license. See the LICENSE file.