concurrencylabs / aws-cost-analysis

Tools that make it easier to analyze AWS Cost and Usage reports. Initial version provides support for Athena and QuickSight.
GNU General Public License v3.0

SAM and DynamoDB #9

Closed: Nr18 closed this issue 6 years ago

Nr18 commented 6 years ago

In the xacct-step-function-starter.py file you scan the DynamoDB table with a filter that includes the dataCollectionStatus attribute.

You only write the lastProcessedTimestamp value in update-metadata.py and you read it in athena.py.

As far as I can see, xacct-step-function-starter.py, and thus the Lambda function xAcctStepFunctionStarter which is scheduled to run every 5 minutes, never has any effect, because the dataCollectionStatus attribute is never written.

What is the best approach for detecting new reports: S3 events or scheduled events?

concurrencylabs commented 6 years ago

The best approach to detect new reports is via S3 if you're working only on your own AWS account. This way, when AWS places a new Cost and Usage report in S3, the S3EventStepFunctionStarter function starts the processing of the new CUR and creates a new Athena table.

We use the xAcctStepFunctionStarter function to access Cost and Usage reports in different AWS accounts (i.e. dev & test accounts, or clients), which is why it checks every 5 minutes for new reports. lastProcessedTimestamp is used, among other things, for an API-like feature in the AthenaQueryMgr class in athena.py. The idea is that when this API is called, we don't want to query Athena every single time; instead, the code stores query results in S3, which get updated when a new CUR is processed. Adding examples of how to call this API is still WIP, but we have xAcctStepFunctionStarter running for a lot of AWS accounts and it works well.
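For reference, the scheduled trigger is a standard SAM Schedule event; a minimal sketch looks like this (the handler, runtime and event name below are placeholders, not the actual values in the template):

  xAcctStepFunctionStarter:
    Type: AWS::Serverless::Function
    Properties:
      # Handler and runtime here are illustrative placeholders
      Handler: xacct-step-function-starter.handler
      Runtime: python2.7
      Events:
        PollForNewReports:
          Type: Schedule
          Properties:
            # Polls every 5 minutes for new Cost and Usage reports
            Schedule: rate(5 minutes)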

Nr18 commented 6 years ago

Ah cool, I changed my PR to disable the scheduled event via a parameter and added the S3 trigger event.

But it seems that SAM is ignoring the condition.
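For context, the parameter/condition wiring I mean is along these lines (a sketch with illustrative names, not the exact PR contents):

Parameters:
  ScheduledEventEnabled:
    Type: String
    AllowedValues: [ 'true', 'false' ]
    Default: 'true'

Conditions:
  # Intended to turn the 5-minute schedule off when the S3 trigger is used instead
  ScheduledEventEnabledCondition: !Equals [ !Ref ScheduledEventEnabled, 'true' ]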

concurrencylabs commented 6 years ago

One thing with S3 events and SAM is that you can only enable S3 events for buckets that are created in the same SAM template. That might be the reason.
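For comparison, the pattern SAM does support is roughly this, with the bucket and the function declared in the same template (resource, handler and event names below are illustrative):

Resources:
  CostUsageReport:
    Type: AWS::S3::Bucket

  S3EventStepFunctionStarter:
    Type: AWS::Serverless::Function
    Properties:
      # Handler and runtime here are illustrative placeholders
      Handler: s3-event-step-function-starter.handler
      Runtime: python2.7
      Events:
        NewManifestUploaded:
          Type: S3
          Properties:
            # Must reference a bucket defined in this same template
            Bucket: !Ref CostUsageReport
            Events: s3:ObjectCreated:*
            Filter:
              S3Key:
                Rules:
                  - Name: suffix
                    Value: Manifest.json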

Nr18 commented 6 years ago

It's a known limitation of SAM, see: https://github.com/awslabs/serverless-application-model/issues/142

concurrencylabs commented 6 years ago

I tried the following, removing the S3 event from the Lambda function and adding it to the S3 bucket:

  CostUsageReport:
    Condition: CreateCurS3BucketEnabled
    DependsOn: [ S3EventStepFunctionStarter ]
    Type: AWS::S3::Bucket
    Properties:
      AccessControl: Private
      BucketName: !Ref BucketName
      # Manual notification wiring (instead of a SAM S3 event on the function)
      NotificationConfiguration:
        LambdaConfigurations:
          - Event: s3:ObjectCreated:*
            Filter:
              S3Key:
                Rules:
                  - Name: prefix
                    Value: !Sub '${ReportPathPrefix}/'
                  - Name: suffix
                    Value: Manifest.json
            Function: !GetAtt S3EventStepFunctionStarter.Arn

I tried a new condition CreateCurS3BucketEnabled because the S3 bucket creation was interfering with existing stacks that already have an S3 bucket and a history of Cost and Usage reports in them.
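One caveat with wiring NotificationConfiguration on the bucket yourself: S3 also needs permission to invoke the Lambda function, which SAM would normally create for its own S3 event source. A minimal sketch of that resource (the logical name is illustrative):

  S3InvokeStepFunctionStarterPermission:
    Condition: CreateCurS3BucketEnabled
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !GetAtt S3EventStepFunctionStarter.Arn
      Principal: s3.amazonaws.com
      SourceAccount: !Ref AWS::AccountId
      # Scope the permission to the CUR bucket
      SourceArn: !Sub 'arn:aws:s3:::${BucketName}'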

concurrencylabs commented 6 years ago

I think we can close this issue. S3 event-related issues are covered by https://github.com/concurrencylabs/aws-cost-analysis/pull/7