aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.41k stars 3.8k forks source link

sns: Lambda Event Source for Opt In Regions Fail #30280

Open ncino-esselman opened 2 months ago

ncino-esselman commented 2 months ago

Describe the bug

[Updated as the behavior has changed since opening the issue]

Leveraging addEventSource(new SnsEventSource(testTopic) on a Lambda in a non opt in account when the topic is in an opt-in region fails to create due to a 403 expired token access creates and deploys as expected but is unable to be triggered by messages published to the topic.

Expected Behavior

I expect the addEvent source to add the cross region topic from the opt-in account just as it would for an opt in account.

I expect the consuming lambda to be triggered by messages published to the opt in region's topic.

Current Behavior

The cdk deploy fails to create the eventSource with a 403 expired token error

The Function is deployed without error and the console reflects the subscription and trigger but the function fails to trigger when a message is posted to the topic.

Reproduction Steps [UPDATED]

[The lambda and the topic are in different cdk apps. Or the topic can be created in any mechanism.]

  1. Create a Lambda in us-east-1 (any not opt in region)
  2. Create a topic in me-central-1 (any opt in region)
  3. Through the cdk add lambdaFunction.addEventSource(new SnsEventSource(testTopic));
  4. cdk deploy
  5. Send a message to the topic
  6. Observe the lambda did not run

Possible Solution

Update https://github.com/aws/aws-cdk/blob/6fdc4582f659549021a64a4d676fce12fc241715/packages/aws-cdk-lib/aws-sns-subscriptions/lib/lambda.ts#L33 to check if the account is opt-in and if so add the region to the principal

principal: new iam.ServicePrincipal('sns.${optInTopicRegion}.amazonaws.com'),

Additional Information/Context

This stemmed from a support case from aws and they are looking into it but not sure when they would get to it.

CDK CLI Version

2.141.0

Framework Version

No response

Node.js Version

18.20.2

OS

MacOS

Language

TypeScript

Language Version

No response

Other information

No response

pahud commented 2 months ago

This deploys for me.


// deploy this stack in an opt-in region
export class TopicStack extends Stack  {
  readonly topic: sns.ITopic;
  constructor(scope: Construct, id: string, props: StackProps) {
    super(scope, id, props);

    this.topic = new sns.Topic(this, 'Topic');
  }
}

export interface LambdaStackProps extends StackProps {
  readonly topic: sns.ITopic;
}

// deploy this stack in a not opt-in region
export class LambdaStack extends Stack {
  constructor(scope: Construct, id: string, props: LambdaStackProps) {
    super(scope, id, props);

    const func = getLambdaFunction(this);
    func.addEventSource(new lambdaEventSource.SnsEventSource(props.topic))
  }
}

app.ts

// deploy sns topic in an opt-in region
const topicStack = new TopicStack(app, 'TopicStack', { 
    env: OptInenv,
});
new LambdaStack(app, 'LambdaStack', { 
    env, 
    topic: topicStack.topic,
    crossRegionReferences: true,
});

Let me know if it works for you.

ncino-esselman commented 2 months ago

Your first code snippet doesn't compile as you are missing the definition of getLambdaFunction

I would love to be able to test exactly what you did as any code path through the event source does not include the correct principal permissions on the lambda resource policy.

Also due to the nature of our use case we are not able to directly pass the values between stacks as they are referenced from separate cdk applications.

sidharth15 commented 2 months ago

@ncino-esselman

Also due to the nature of our use case we are not able to directly pass the values between stacks as they are referenced from separate cdk applications.

So how are you passing in the topic to your function.addEventSource()?

@pahud From what I understand, it will deploy correctly but your topic won't be able to invoke the function because the correct service principal hasn't been allow-listed on the function's policy.

sidharth15 commented 2 months ago

@pahud Reproduced what I mentioned above in a sample CDK app with an SNS topic in eu-south-2 (opt-in) and the Lambda function in eu-west-1 (default enabled) - I'm able to deploy the cross-region resources, but the topic fails to invoke the lambda due to incorrect service principal (sns.amazonaws.com) in the function's policy.

Later, I manually modified the function policy to have the expected service principal (sns.eu-south-2.amazonaws.com) and the topic is able to invoke the function.

pahud commented 1 month ago

@ncino-esselman

const func = getLambdaFunction(this);

This is my private function that returns a dummy lambda.Function.

@sidharth15

If this issue still exists, can you provide a minimal CDK code sample that reproduces this issue so I can copy/paste in my IDE and verify?

ncino-esselman commented 1 month ago

Still an issue. We resolved it within our app by adding a check for the opt in region and specifically adding the principal before adding the event source

// Conditionally add a principal for opt in regions
lambdaFunction.addPermission(`SNSPrincipal-${optInRegion}`, {
          action: 'lambda:InvokeFunction',
          principal: new ServicePrincipal(`sns.${optInRegion}.amazonaws.com`)
        });
// A topic that exists outside of the cdk application
 const topic = Topic.fromTopicArn(
        this,
        `${topicName}`,
        `${topicARN}`
      );
lambdaFunction.addEventSource(new SnsEventSource(topic));

This is within the context of a stack. The main piece seems to be that topic has to exist outside the context of the cdk app. In ur case it is contained inside of another cdk app and we know how to construct the arn to reference it here. The code above should work. If you leave out the addPermission piece that is when you should see a 403 while deploying. I can try to put together a vanilla cdk snippet but it is difficult as our accounts restrict a number of things so we use an extended cdk.

We aren;t blocked and there is a work around it just seems that this logic should exist within the cdk instead of the consumer having to have special logic for the optin regions

sidharth15 commented 1 month ago

@ncino-esselman It feels strange that it's failing during deployment, I would've expected it to deploy successfully but the topic fail to invoke the Lambda. Can you post the full error message you're getting with the 403?

@pahud Here's the sample code to reproduce the issue:

app.ts

#!/usr/bin/env node
import 'source-map-support/register';
import * as cdk from 'aws-cdk-lib';
import { LambdaStack } from '../lib/lambda-stack';
import { SnsTopicStack } from '../lib/sns-stack';

const app = new cdk.App();
const helloTopicStack = new SnsTopicStack(app, 'HelloSnsStack', {
  env: {
    account: 'xxxxxxxxxxx',
    region: 'eu-south-2'
  }
});

new LambdaStack(app, 'HelloLambdaStack', {
  topic: helloTopicStack.topic
});

lambda-stack.ts

import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import { ITopic } from 'aws-cdk-lib/aws-sns';
import { SnsEventSource } from 'aws-cdk-lib/aws-lambda-event-sources';

export interface LambdaStackProps extends cdk.StackProps {
  topic: ITopic;
}

export class LambdaStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: LambdaStackProps) {
    super(scope, id, props);

    const helloFunction = new lambda.Function(this, 'HelloHandler', {
      runtime: lambda.Runtime.NODEJS_16_X,
      code: lambda.Code.fromAsset('lambda'),
      handler: 'hello.handler'
    });

    helloFunction.addEventSource(new SnsEventSource(props!.topic));
  }
}

sns-stack.ts

import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as sns from 'aws-cdk-lib/aws-sns';

export class SnsTopicStack extends cdk.Stack {
    topic: sns.ITopic;

    constructor(scope: Construct, id: string, props?: cdk.StackProps) {
        super(scope, id, props);
        this.topic = new sns.Topic(this, 'HelloTopic', {
            topicName: 'MyFirstTopic'
        });
    }
}
ncino-esselman commented 1 month ago

Hmm it is already behaving different than it was. Looks like the deployment is no working fine. I am going to continue to do some testing if it has anything to do with existing infrastructure that is allowing it to function. Messages don't seem to be triggering the lambda though. Not sure where to see if there is a permission error on the lambda trigger.

Unrelated to the cdk there are 403 errors when messing with the optin region triggers in the console.

sidharth15 commented 1 month ago

@ncino-esselman Ya i never got a deployment error, but the SNS topic failed to trigger the Lambda. I don’t think you can see any permission errors as such (would check the SNS and Lambda metrics to be sure but unlikely), which is what makes it dangerous - it fails silently.

ncino-esselman commented 1 month ago

The deployment error does seem to have gone away. I have historical screen shots in an aws case that confirm the initial behavior. Can confirm the lambda will not be triggered correctly though even with a successful deployment.

I was able to confirm the proposed fix that initially resolved our deployment issue does also resolve the message delivery issue. By specifically calling out the optin region sns as a principal on the lambda permissions the flow works as expected.

Should I update the original issue now that it is behaving different? Seems like this issue is almost worse do to the silent nature of the "failures"

sidharth15 commented 1 month ago

@ncino-esselman Ya, I suggest to update the issue that deployment succeeds but Lambda fails invocation. I have the change for CDK lib in my local repo, wanted to confirm the behaviour with you before I raised a PR.

@pahud have you been able to reproduce the issue based off my above code snippets?

ncino-esselman commented 1 month ago

I updated the write up! Aww was hoping to be able to contribute! Thank you!

ncino-esselman commented 1 month ago

@sidharth15 any update on this?

sidharth15 commented 1 month ago

@ncino-esselman hey was swarmed with work a bit; I'll get the PR up soon.

@pahud did you get a chance to verify this to be a legitimate concern?