aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.68k stars 3.93k forks

(custom-resources): exceptions are not surfaced in cloudformation, re-opened #31536

Open ben-lee-zocdoc opened 1 month ago

ben-lee-zocdoc commented 1 month ago

Describe the bug

Referencing https://github.com/aws/aws-cdk/issues/31472: I made a mistake there. I actually am using the provider framework lambda, so the previous issue was improperly closed.

The doc says: "Specifically, to report success or failure, have your Lambda Function exit in the right way: return data for success, or throw an exception for failure."

When our lambda throws an exception, the details are not surfaced in the CloudFormation dashboard. Instead, it shows a generic message: "Received response status [FAILED] from custom resource. Message returned: Error: Uncaught lambda exception,....".

Regression Issue

Last Known Working CDK Version

No response

Expected Behavior

I expect the Reason to be populated with the Exception message, something like "Received response status FAILED from custom resource. Message returned: ." where the reason is lambda exception.

Current Behavior

We are seeing a generic error

Received response status [FAILED] from custom resource. Message returned: Error: Uncaught lambda exception, execution stopped

Logs: /aws/lambda/my-lambda-function

    at invokeUserFunction (/var/task/framework.js:2:6)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async onEvent (/var/task/framework.js:1:369)
    at async Runtime.handler (/var/task/cfn-response.js:1:1676)
(RequestId: abcdef-ghij-1234-5678-333f1c96d6d3)

Reproduction Steps

CDK code:

Creating the user lambda stack

const lambdaStackFunction = ...CreateMyCustomDotnetLambda(...);

const provider = new custom_resources.Provider(this, 'MyProvider', {
  logRetention: aws_logs.RetentionDays.ONE_WEEK,
  onEventHandler: lambdaStackFunction,
});

const frameworkFunc = provider.node.tryFindChild('framework-onEvent') as aws_lambda.Function;

this.exportValue(frameworkFunc.functionArn, {
  name: 'FrameworkFunctionArn'
});

// Elsewhere, a custom resource imports the exported ARN as its service token:
export class MyResource extends Construct {
    constructor(scope: Construct, id: string) {
        super(scope, id);

        const crProps: CustomResourceProps = {
            resourceType: 'Custom::resource',
            serviceToken: Fn.importValue(
                'FrameworkFunctionArn'
            )
        };
        const resource = new CustomResource(this, 'custom', crProps);
    }
}

Our dotnet Lambda code:

public async Task<CustomResourceResponse<ResponseContract>> OnEvent(
    CustomResourceRequest<CustomResourceProperties> request,
    ILambdaContext context
)
{
    throw new Exception("I want this reason to show up");
}

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.145.0

Framework Version

No response

Node.js Version

20

OS

Mac

Language

TypeScript

Language Version

4.8.3

Other information

The lambda is in net8.0

I can confirm that there are two lambdas created. Here are the logs from the framework lambda:

2024-09-23T20:58:54.582Z    a8012ec6-aaaa-bbbb-cccc-c0dda733096a    INFO    [provider-framework] user function threw an error: Unhandled

2024-09-23T20:58:54.641Z    a8012ec6-aaaa-bbbb-cccc-c0dda733096a    INFO    [provider-framework] submit response to cloudformation https://cloudformation-custom-resource-response-useast1.s3.amazonaws.com//arn%3Aaws%3Acloudformation...... {
    "Status": "FAILED",
    "Reason": "Error: Uncaught lambda exception, execution stopped\n\nLogs: /aws/lambda/my-user-lambda\n\n    at invokeUserFunction (/var/task/framework.js:2:6)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async onEvent (/var/task/framework.js:1:369)\n    at async Runtime.handler (/var/task/cfn-response.js:1:1676)",
    "StackId": "arn:aws:cloudformation:....",
    "RequestId": ".....",
    "PhysicalResourceId": "......",
    "LogicalResourceId": "....."
}

This same Node.js error is what gets surfaced in my CloudFormation console, even though my user lambda is in dotnet. In the previous issue https://github.com/aws/aws-cdk/issues/31472 the comment said:

onEvent should handle exception when possible, however, if some unexpected exception is thrown, the provider framework should be able to capture that and gracefully callback cloudformation as resource fails to be created.

I would expect the framework function to capture the thrown exception from the user dotnet lambda gracefully, and return the exception to the cloudformation console.
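
To make the expectation above concrete, here is a minimal sketch of how a failed invocation result could be turned into a readable reason. It is not the provider framework's actual code. It assumes the Invoke result carries a `FunctionError` marker (the framework log above shows the value `Unhandled`) and that the error payload is JSON with `errorType`, `errorMessage`, and `stackTrace` fields, which may not hold for every runtime. The names `LambdaErrorPayload` and `reasonFromInvoke` are hypothetical.

```typescript
// Assumed shape of the JSON payload returned by Invoke when the function fails.
interface LambdaErrorPayload {
  errorType?: string;
  errorMessage?: string;
  stackTrace?: string[];
}

// Hypothetical helper: derive a human-readable reason from a failed Invoke result.
// Returns undefined when the invocation did not report a function error.
function reasonFromInvoke(functionError: string | undefined, payload: string): string | undefined {
  if (!functionError) {
    return undefined;
  }

  let parsed: LambdaErrorPayload = {};
  try {
    const raw: unknown = JSON.parse(payload);
    if (raw !== null && typeof raw === 'object') {
      parsed = raw as LambdaErrorPayload;
    }
  } catch {
    // The payload was not valid JSON; fall back to the generic message below.
  }

  const message = parsed.errorMessage ?? 'unknown error';
  // Deliberately omit the stack trace so only the message is surfaced.
  return parsed.errorType ? `${parsed.errorType}: ${message}` : message;
}
```

Leaving the stack trace out keeps the surfaced text short and limits how much internal detail is exposed, while still passing the message the user function chose to throw.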

Let me know if I should provide more details.

pahud commented 1 month ago

I expect the Reason to be populated with the Exception message, something like "Received response status FAILED from custom resource. Message returned: ." where the reason is lambda exception.

I don't think the CloudFormation console would expose the exception trace log, and doing so could be a security concern.

From CFN's perspective, the lambda function is just a "custom resource provider" that is responsible for handling resource create/update/delete events. CFN only cares whether the provider returns the expected result. If it does not, CFN just tells you that an exception happened in the provider, because it did not receive what it expected. It is the user's responsibility to check the Lambda log for details. The Provider never passes that log back to CFN, so CFN never sees that message.

ben-lee-zocdoc commented 1 month ago

The AWS CFN docs at https://repost.aws/knowledge-center/cfn-troubleshoot-custom-resource-failures seem to suggest that the Reason field of the response will show up in the AWS console. This is also what we see in the exception I posted above: the Reason field is populated with "Error: Uncaught lambda exception, execution stopped...".

If the provider framework lambda is able to surface the user lambda exception in the proper format, cloudformation should be able to receive it.
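
For reference, the response body in the framework log above has the shape sketched below: `Status`, `Reason`, and the request identifiers echoed back. The helper `failedResponse`, the `CfnRequestIds` type, and the `'failed-to-create'` placeholder ID are hypothetical illustrations, not part of the CDK or of CloudFormation.

```typescript
// Shape of the JSON body sent back to CloudFormation, matching the fields in the log above.
interface CfnResponse {
  Status: 'SUCCESS' | 'FAILED';
  Reason?: string;
  PhysicalResourceId: string;
  StackId: string;
  RequestId: string;
  LogicalResourceId: string;
  Data?: Record<string, unknown>;
}

// Subset of the incoming custom resource event that must be echoed back.
interface CfnRequestIds {
  StackId: string;
  RequestId: string;
  LogicalResourceId: string;
  PhysicalResourceId?: string;
}

// Hypothetical helper: build a FAILED response whose Reason carries the error message.
function failedResponse(event: CfnRequestIds, error: Error): CfnResponse {
  return {
    Status: 'FAILED',
    Reason: error.message,
    // Assumption: when no physical ID exists yet (failed create), use a placeholder.
    PhysicalResourceId: event.PhysicalResourceId ?? 'failed-to-create',
    StackId: event.StackId,
    RequestId: event.RequestId,
    LogicalResourceId: event.LogicalResourceId,
  };
}
```

With a body like this, whatever string ends up in `Reason` is what follows "Message returned:" in the console, so the open question in this issue is which string the framework chooses to put there.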

ben-lee-zocdoc commented 1 month ago

In our user lambda, neither throwing nor catching and returning something like

{
    Status = "FAILED",
    Reason = "some error that should show up in CFN console"
}

will work. The AWS CDK docs suggest that we should throw an exception to indicate failure, but they don't seem to provide a way to surface the failure reason properly?