concurrencylabs / aws-cost-analysis

Tools that make it easier to analyze AWS Cost and Usage reports. Initial version provides support for Athena and QuickSight.
GNU General Public License v3.0

Serverless Step Function fails #6

Closed Nr18 closed 6 years ago

Nr18 commented 6 years ago

Hi,

I'm trying to set up this project using the CloudFormation template (a PR is coming once I have it working), and the Step Function raises the following error:

'VarCharValue': KeyError
Traceback (most recent call last):
  File "/var/task/functions/init-athena-queries.py", line 37, in handler
    result_dict['getTotalCost']['resultset'] = apiprocessor.getTotalCost()
  File "/var/task/awscostusageprocessor/api.py", line 50, in getTotalCost
    return self.getResultSet(consts.ACTION_GET_TOTAL_COST)
  File "/var/task/awscostusageprocessor/api.py", line 42, in getResultSet
    response['results'] = self.athena.get_query_execution_results(queryexecutionid)
  File "/var/task/awscostusageprocessor/sql/athena.py", line 148, in get_query_execution_results
    row_dict[rowheaders[columnindex]['VarCharValue']] = columnvalue['VarCharValue']
KeyError: 'VarCharValue'

So I added a debug statement to the get_query_execution_results function to print queryresults: log.info("Error: {}".format(json.dumps(queryresults)))

That results in:

{
    "ResultSet": {
        "Rows": [
            {
                "Data": [
                    {
                        "VarCharValue": "sum_unblendedcost"
                    }
                ]
            },
            {
                "Data": [
                    {}
                ]
            }
        ],
        "ResultSetMetadata": {
            "ColumnInfo": [
                {
                    "Scale": 0,
                    "Name": "sum_unblendedcost",
                    "Nullable": "UNKNOWN",
                    "TableName": "",
                    "Precision": 17,
                    "Label": "sum_unblendedcost",
                    "CaseSensitive": false,
                    "SchemaName": "",
                    "Type": "double",
                    "CatalogName": "hive"
                }
            ]
        }
    },
    "ResponseMetadata": {
        "RetryAttempts": 0,
        "HTTPStatusCode": 200,
        "RequestId": "9bdc5156-****-****-****-************",
        "HTTPHeaders": {
            "date": "Fri, 19 Jan 2018 09:24:54 GMT",
            "x-amzn-requestid": "9bdc5156-****-****-****-************",
            "content-length": "619",
            "content-type": "application/x-amz-json-1.1",
            "connection": "keep-alive"
        }
    }
}

The script fails because of the empty Data entry ({}). Can I ignore this, or is it caused by a misconfiguration? Thanks!
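For illustration, here is a minimal sketch of parsing a response shaped like the one above without raising KeyError on an empty cell. It is not the project's code: the function name rows_to_dicts is hypothetical, and it assumes the first row holds the column headers, as in the dump above. A cell without a VarCharValue key is mapped to None.

```python
def rows_to_dicts(queryresults):
    """Convert a GetQueryResults-style response into a list of dicts.

    Assumption: the first row contains the column headers.
    A cell with no 'VarCharValue' key (an empty {}) becomes None.
    """
    rows = queryresults["ResultSet"]["Rows"]
    if not rows:
        return []
    headers = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    result = []
    for row in rows[1:]:
        result.append({
            header: cell.get("VarCharValue")
            for header, cell in zip(headers, row["Data"])
        })
    return result


# Trimmed copy of the response shown above.
sample = {
    "ResultSet": {
        "Rows": [
            {"Data": [{"VarCharValue": "sum_unblendedcost"}]},
            {"Data": [{}]},
        ]
    }
}

print(rows_to_dicts(sample))  # [{'sum_unblendedcost': None}]
```

This only hides the symptom. A None total still suggests that the query matched no data, so the underlying cause is worth checking.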

concurrencylabs commented 6 years ago

Hi, thanks for reporting this issue and sorry to hear it caused some trouble. Just writing to acknowledge it and to let you know that we'll take a look at it later today.

Nr18 commented 6 years ago

Thanks!

I played around with it a bit and got it working, but I did need to change a few things:

In processor.py I changed:

monthDestPrefix = self.destPrefix + period_prefix

to:

monthDestPrefix = '{}{}/{}'.format(self.destPrefix, self.accountId, period_prefix)

The Athena table expects the accountId to be in the path, so I added it here and removed the placeholder string from the destination path. I need to re-test it, and then I will commit the changes to my fork and submit a pull request. The issue above occurred because Athena was querying an empty path.
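To make the difference concrete, here is a small sketch comparing the two prefix expressions. The values for dest_prefix, account_id and period_prefix are made-up placeholders, not taken from the project, and the two helper functions are hypothetical stand-ins for the lines in processor.py.

```python
def old_month_dest_prefix(dest_prefix, period_prefix):
    # Original expression: no account ID in the path.
    return dest_prefix + period_prefix


def new_month_dest_prefix(dest_prefix, account_id, period_prefix):
    # Changed expression: the account ID sits between the prefix and the period.
    return '{}{}/{}'.format(dest_prefix, account_id, period_prefix)


# Placeholder values, for illustration only.
dest_prefix = "curs/"
account_id = "123456789012"
period_prefix = "20180101-20180201/"

print(old_month_dest_prefix(dest_prefix, period_prefix))
# curs/20180101-20180201/
print(new_month_dest_prefix(dest_prefix, account_id, period_prefix))
# curs/123456789012/20180101-20180201/
```

If the table location includes the account ID, files written under the first path would not be found by the query, which matches the empty result described above.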

concurrencylabs commented 6 years ago

It sounds like accountId was somehow not set at the beginning of the Step Function execution.

The Step Function takes as an input a dictionary that includes, among other things, accountId. This dictionary gets passed from one step to the next. For some reason when it was time to execute function init-athena-queries, accountId was missing and the execution failed.

One recommended way to start the Step Function is the starter function s3event-step-function-starter.py, which is triggered by an S3 event whenever a new Cost and Usage report is placed in the source S3 bucket. This function sets accountId, which should eventually make it to the init-athena-queries function.

If you're starting the Step Function differently, just make sure that its input is a dictionary that includes year, month, sourceBucket, sourcePrefix, destBucket, destPrefix and accountId, plus, optionally, xAccountSource and roleArn if you're accessing Cost and Usage reports cross-account.
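As a rough sketch, the input dictionary could be checked before starting an execution. The key names come from the list above. The helper missing_input_keys, the sample values, and the value types (strings for year and month) are assumptions made for illustration.

```python
import json

# Keys listed above as required in the Step Function input.
REQUIRED_KEYS = (
    "year", "month", "sourceBucket", "sourcePrefix",
    "destBucket", "destPrefix", "accountId",
)
# Only needed for cross-account access.
OPTIONAL_KEYS = ("xAccountSource", "roleArn")


def missing_input_keys(event):
    """Return the required keys that are absent or empty in the input dict."""
    return [key for key in REQUIRED_KEYS if not event.get(key)]


# Placeholder values, for illustration only.
event = {
    "year": "2018",
    "month": "01",
    "sourceBucket": "example-cur-source",
    "sourcePrefix": "reports/",
    "destBucket": "example-cur-dest",
    "destPrefix": "curs/",
    "accountId": "123456789012",
}

missing = missing_input_keys(event)
if missing:
    raise ValueError("Step Function input is missing: {}".format(", ".join(missing)))

# JSON string that would be passed as the execution input.
execution_input = json.dumps(event)
print(execution_input)
```

Failing early with the list of missing keys makes a missing accountId visible at the start of the execution instead of several steps later.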

I will test a code update that double checks for accountId at the end of the process-cur function and sets it, in case it wasn't set by the step function starter.

Nr18 commented 6 years ago

@concurrencylabs I am using the s3event-step-function-starter.py function via an event trigger on the bucket that receives the reports from AWS.

I pushed my code and created PR #7 so that you can see what I did. I'm running the whole thing as I described in the README.md.

The accountId is in the event, but as far as I could see it was not used in the path for uploading the processed CSV files. Again, you can see that in the PR.

If I need to change something in the PR, I'm happy to do that. I only added what was missing for me to get started with this project, so that I could start playing around with queries in Athena.

concurrencylabs commented 6 years ago

Thanks for the PR, will take a look and leave it running in a test environment.

concurrencylabs commented 6 years ago

Fixed in PR https://github.com/concurrencylabs/aws-cost-analysis/pull/7