Athena results are much different than Cost Explorer?

concurrencylabs / aws-cost-analysis

Tools that make it easier to analyze AWS Cost and Usage reports. Initial version provides support for Athena and QuickSight.

GNU General Public License v3.0


Athena results are much different than Cost Explorer? #15

Open robpark opened 6 years ago

robpark commented 6 years ago

We have a very large CSV that's produced for the billing report from AWS, so it doesn't load in Excel for parsing there.

I did get the report_utils script to execute successfully with no errors and data does appear in Athena tables, but when I run:


```
[...]
FROM hourly_20180901_20181001
```

it only returns a number that is ~20% of the expected amount shown by Cost Explorer.

Have you seen anything like this?
Would you have any suggestions as to where to look?

concurrencylabs commented 6 years ago

Thanks for using the code in this repo. There is a delay between AWS Cost and Usage Reports and the data you see in Cost Explorer, so there's always a bit of a mismatch, but your Athena total shouldn't be only ~20% of the amount in Cost Explorer.

I would start by looking at the Manifest JSON file that AWS places in the S3 bucket you have configured for your reports and see the S3 keys for the files AWS has generated. Then validate that the same files have been placed in the S3 destination you defined when running the script. If you see the same number of files and they have the same size, then it's most likely an issue with the report generated by AWS.
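A minimal sketch of that check, assuming the manifest lists the generated files under a `reportKeys` array and that the script keeps the original file names in the destination. The key names and sizes below are made up for illustration. In practice the two `{key: size}` listings would come from listing each S3 location.

```python
import json
import posixpath


def report_keys(manifest_text):
    """Return the S3 keys listed in a CUR manifest (the "reportKeys" array)."""
    return json.loads(manifest_text).get("reportKeys", [])


def compare_by_name(source_sizes, dest_sizes):
    """Compare two {s3_key: size_in_bytes} listings by file name.

    Returns (missing, size_mismatch): names absent from the destination,
    and names present in both places with different sizes.
    """
    src = {posixpath.basename(k): s for k, s in source_sizes.items()}
    dst = {posixpath.basename(k): s for k, s in dest_sizes.items()}
    missing = sorted(set(src) - set(dst))
    size_mismatch = sorted(n for n in set(src) & set(dst) if src[n] != dst[n])
    return missing, size_mismatch


# Made-up example data, not taken from a real report.
manifest_text = json.dumps({"reportKeys": [
    "cur/hourly/20180901-20181001/abc/hourly-1.csv.gz",
    "cur/hourly/20180901-20181001/abc/hourly-2.csv.gz",
]})
source = dict(zip(report_keys(manifest_text), [1000, 2000]))
dest = {"athena/hourly/hourly-1.csv.gz": 1000}

missing, mismatched = compare_by_name(source, dest)
print("missing:", missing, "size mismatch:", mismatched)
```

If both lists come back empty, the copy matches what AWS generated and the discrepancy is more likely in the report itself.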

robpark commented 5 years ago

Sorry for the delay, but I'm actually not getting a manifest.json in my billing bucket?

Not sure if this is what you mean by S3 keys, but I get 4 files daily:

<a/c>-aws-billing-csv-YYYY-MM.csv
<a/c>-aws-billing-detailed-line-items-YYYY-MM.csv.zip
<a/c>-aws-billing-detailed-line-items-with-resources-and-tags-YYYY-MM.csv.zip
<a/c>-aws-cost-allocation-YYYY-MM.csv

The 2 line item files are zipped.

In the destination bucket, I only get 1 file per month, so not the same number of files. My guess was that you only care about the resources and tags file, but I haven't looked deeply enough to see if you somehow iterate over all of them and append them?

My end of month file is > 2.5GB, so very hard to inspect manually.
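A file of that size can still be totalled without a spreadsheet by streaming it row by row. The sketch below is only an illustration: the column name `Cost` is a placeholder, since the actual cost column depends on the report type, and any summary or total rows in the file would need to be filtered out first to avoid double counting.

```python
import csv
import io
from decimal import Decimal, InvalidOperation


def sum_column(lines, column):
    """Sum one numeric column of a CSV, reading one row at a time."""
    total = Decimal("0")
    for row in csv.DictReader(lines):
        value = (row.get(column) or "").strip()
        if not value:
            continue  # empty cell
        try:
            total += Decimal(value)
        except InvalidOperation:
            continue  # non-numeric cell
    return total


# Tiny in-memory stand-in for a large file; "Cost" is a hypothetical column name.
sample = io.StringIO("ProductName,Cost\nAmazon S3,1.25\nAmazon EC2,2.50\nNote,n/a\n")
total = sum_column(sample, "Cost")
print(total)
```

For a real file, pass an open file object instead, for example `open(path, newline="")`, after unzipping it.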

concurrencylabs commented 5 years ago

It seems like you have configured Detailed Billing Reports instead of Cost and Usage Reports. The code in this repo works with Cost and Usage Reports, which can be configured in the AWS console: https://console.aws.amazon.com/billing/home?region=us-east-1#/reports/create

When configuring CURs, check the "Include resource IDs" and "Automatically refresh your Cost & Usage Report when charges are detected for previous months with closed bills." boxes, and set the time granularity to "Hourly".

Then AWS will place reports under the S3 bucket and prefix you configure. It will create a folder structure by period (e.g. 20181101-20181201, etc.) and right under each period folder you should see the Manifest.json file.
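As a rough sketch of what to look for, the snippet below builds the period folder name for a given month and picks out manifest files sitting directly under it. The bucket prefix and report name in the example keys are hypothetical, and the `Manifest.json` suffix match is an assumption about how the file is named.

```python
import posixpath
from datetime import date


def period_folder(year, month):
    """Folder name for a billing period, e.g. 20181101-20181201."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return f"{start:%Y%m%d}-{end:%Y%m%d}"


def manifest_keys(keys, folder):
    """Keys whose parent folder is the period folder and that end in Manifest.json."""
    return [
        k for k in keys
        if posixpath.basename(posixpath.dirname(k)) == folder
        and k.endswith("Manifest.json")
    ]


# Hypothetical keys, as they might appear when listing the report bucket.
keys = [
    "cur/hourly/20181101-20181201/hourly-Manifest.json",
    "cur/hourly/20181101-20181201/abc/hourly-1.csv.gz",
]
folder = period_folder(2018, 11)
found = manifest_keys(keys, folder)
print(folder, found)
```

An empty result for a period that should have data suggests Cost and Usage Reports are not being delivered to that bucket and prefix.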

I hope this helps.