
aws-lambda: version resource name changing from v1 to v2 #20650


pcolmer commented 2 years ago

Describe the bug

Code that deploys a Lambda function generates a resource that references the function's current version. The name of that resource changes when we migrate from CDK v1 to v2, so CloudFormation wants to destroy the existing resource, but it cannot because other stacks in the app depend on it.

The only way I seem to be able to deploy the app using CDK v2 is to destroy it completely first, which is undesirable.

Expected Behavior

I should be able to deploy/upgrade my app to v2 without problems and certainly without needing to destroy the entire deployment.

Current Behavior

Running cdk diff results in the following:

Resources
[-] AWS::Lambda::Version cloudfrontaddsecurityheadersCurrentVersion16169602d5819224674252586b3d48dd18654174 destroy
[-] AWS::Lambda::Version addindextodirectoryurlsCurrentVersionD6A21E4798a50b14deec652bcdbf1dbe6d635381 destroy
[+] AWS::Lambda::Version cloudfront-add-security-headers/CurrentVersion cloudfrontaddsecurityheadersCurrentVersion1616960298ca6e043ddae0d5ff51b3ecf3edb258
[+] AWS::Lambda::Version add-index-to-directory-urls/CurrentVersion addindextodirectoryurlsCurrentVersionD6A21E472e930828858ae2ecd8cfb4658a1927a5

Reproduction Steps

from aws_cdk import core as cdk  # CDK v1-style import
from aws_cdk import aws_lambda as lambda_

CLOUDFRONT_ADD_SECURITY_HEADERS_NAME = "pjc-test-headers"

class HelloCdkStack(cdk.Stack):

    def __init__(self, scope: cdk.Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        lambda_security_header = lambda_.Function(
            self,
            CLOUDFRONT_ADD_SECURITY_HEADERS_NAME,
            runtime=lambda_.Runtime.NODEJS_12_X,
            code=lambda_.Code.from_asset("lambda/add-security-headers"),
            handler="add-security-headers.handler",
            function_name=CLOUDFRONT_ADD_SECURITY_HEADERS_NAME,
        )
        # Accessing current_version synthesizes an AWS::Lambda::Version
        # resource whose logical ID embeds a hash of the function config.
        self.add_security_headers_arn = lambda_security_header.current_version.edge_arn
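
For completeness, the stack would be instantiated and synthesized in app.py along these lines (a sketch; the app entry point is not part of the original report, and the stack name is illustrative):

app = cdk.App()
HelloCdkStack(app, "HelloCdkStack")
app.synth()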

This will produce output like this:

[-] AWS::Lambda::Version pjctestheadersCurrentVersion6A41A374a584fa336027d0271a4c8a7f1566b8c0 destroy
[+] AWS::Lambda::Version pjc-test-headers/CurrentVersion pjctestheadersCurrentVersion6A41A374f92b3f06e16da8e8e102c48b89c67cef

Note that setting @aws-cdk/core:stackRelativeExports to false does not fix the issue.
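
For reference, a context flag like that can be set either in cdk.json or directly on the app; a minimal sketch of the latter:

app = cdk.App(context={"@aws-cdk/core:stackRelativeExports": False})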

Possible Solution

None known.

Additional Information/Context

No response

CDK CLI Version

1.159.0 and 2.27.0

Framework Version

No response

Node.js Version

16.3.0

OS

Ubuntu 20.04.0

Language

Python

Language Version

No response

Other information

No response

peterwoodworth commented 2 years ago

I don't think we changed anything about our hashing or name generation between versions, so it's strange to me that you're running into this

We're in maintenance mode for v1, so since this isn't a major issue we won't be addressing it. You can work around this by overriding the logical IDs of the resources whose names are changing.
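
A minimal sketch of that override, applied to the Python example above and assuming the previously deployed logical ID has been copied out of the existing CloudFormation template (the ID shown is illustrative):

# Inside HelloCdkStack.__init__, after creating the function.
# node.default_child is the underlying CfnResource for the
# AWS::Lambda::Version; override_logical_id pins it to the ID
# that was already deployed under CDK v1.
cfn_version = lambda_security_header.current_version.node.default_child
cfn_version.override_logical_id(
    "pjctestheadersCurrentVersion6A41A374a584fa336027d0271a4c8a7f1566b8c0"
)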

pcolmer commented 2 years ago

@peterwoodworth

I don't think we changed anything about our hashing or name generation between versions, so it's strange to me that you're running into this

Which is why I've reported it!

The documentation at https://docs.aws.amazon.com/cdk/v2/guide/migrating-v2.html says:

If your upgraded app ends up undeployable after the two-stage upgrade, please report the issue.

I am unable to re-deploy my app after upgrading from CDK v1 to CDK v2, hence my reporting this issue. At the moment, without making any code modifications, my only recourse is to destroy the stack and then do a new deploy of the stack, which results in a complete loss of the infrastructure during that process.

I have, as required by the guidelines, provided a small (2 lines of code!) 100% reproducible example of the problem I'm having.

You can work around this by overriding the logical ids of resources which are having their names changed.

So are you proposing that I hack my code - for each deployment of the stack - to override the "wrong" logical ID with the logical ID that has already been deployed? Without knowing how many instances I've deployed ... ?

Or are you proposing that there is a way for me to (a) see if the stack has already been deployed, (b) retrieve the logical ID of the resource if it has been deployed, (c) continue with the rest of the code and then (d) override the "wrong" ID with the old one? If so, can you please provide more guidance/example code?

If I truly have to destroy existing deployments in order to upgrade to CDK v2 then perhaps AWS CDK is not the tool for me after all.
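
Steps (a), (b) and (d) above can in principle be scripted with boto3 before synthesis; a sketch under those assumptions (the stack name and logical-ID prefix are illustrative, taken from the example earlier in the thread):

import boto3
from botocore.exceptions import ClientError

def deployed_version_logical_id(stack_name, prefix):
    # Steps (a) and (b): return the already-deployed AWS::Lambda::Version
    # logical ID, or None if the stack has not been deployed yet.
    cfn = boto3.client("cloudformation")
    try:
        resources = cfn.describe_stack_resources(StackName=stack_name)["StackResources"]
    except ClientError:
        return None
    for res in resources:
        if (res["ResourceType"] == "AWS::Lambda::Version"
                and res["LogicalResourceId"].startswith(prefix)):
            return res["LogicalResourceId"]
    return None

# Step (d): if an ID came back, pin the new Version resource to it.
old_id = deployed_version_logical_id("HelloCdkStack", "pjctestheadersCurrentVersion")
if old_id:
    lambda_security_header.current_version.node.default_child.override_logical_id(old_id)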

peterwoodworth commented 2 years ago

I have, as required by the guidelines, provided a small (2 lines of code!) 100% reproducible example of the problem I'm having.

The issue here is that I'm not able to reproduce the behavior you're seeing with the code you've provided.

So are you proposing that I hack my code - for each deployment of the stack - to override the "wrong" logical ID with the logical ID that has already been deployed? Without knowing how many instances I've deployed ... ?

I merely proposed a method which can work 🙂 I know it's far from ideal, but I figured it was worth mentioning since I knew it off the top of my head

Are you able to reproduce this issue following your steps in a completely fresh project?

pcolmer commented 2 years ago

Are you able to reproduce this issue following your steps in a completely fresh project?

Yes - I wrote the issue as I built the fresh project. I literally did what I wrote. I deployed the original code using CDK v1, followed the instructions to upgrade the code/stack to CDK v2 then attempted to deploy using v2.

I should point out that if you deploy JUST using CDK v2, you won't see the problem. This only seems to be apparent when upgrading an existing stack that was deployed with v1.

The issue here is that I'm not able to reproduce the behavior you're seeing with the code you've provided.

That is worrying. I will go through it all again to make sure I haven't missed anything out or mistyped anything at my end.

pcolmer commented 2 years ago

@peterwoodworth I've put together a git repository with two branches that definitely repros the issue: https://github.com/pcolmer/aws-cdk-20650

The main branch is the CDK v1 version of the app. After deploying that, switch to the CDK v2 branch and follow the steps in the README to upgrade the environment to v2 and then cdk diff.

If you still cannot reproduce it after that, then I don't know what to say. I'm open to running any extra commands or diagnostics if it helps unpack why the resource ID is getting changed.

peterwoodworth commented 2 years ago

I see why you're running into this @pcolmer -

This is where we calculate the logical ID - we actually override it ourselves - https://github.com/aws/aws-cdk/blob/ba91ca45ad759ab5db6da17a62333e2bc11e1075/packages/%40aws-cdk/aws-lambda/lib/function.ts#L421-L433

We override the last 32 characters based on the hash of the Lambda function. The difference between your versions is only in the last 32 characters. I've also managed to reproduce this myself with my own TS library and with your repo.
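
To make that concrete, a quick illustration using the two logical IDs from the cdk diff output in the reproduction steps (plain Python, not CDK source):

# The Version logical ID is a stable construct-derived prefix followed by
# a 32-character hash of the function's synthesized properties, so only
# the trailing 32 characters differ between the v1 and v2 output.
v1_id = "pjctestheadersCurrentVersion6A41A374a584fa336027d0271a4c8a7f1566b8c0"
v2_id = "pjctestheadersCurrentVersion6A41A374f92b3f06e16da8e8e102c48b89c67cef"
assert v1_id[:-32] == v2_id[:-32]  # prefix unchanged
assert v1_id[-32:] != v2_id[-32:]  # hash changed, so CloudFormation replaces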

The difference between v1 and v2 I think is due to the fact that the hash is based on the lambda function and its properties. There is a difference in how the S3key prop is built between the versions - but that's handled deep within our asset code

https://github.com/aws/aws-cdk/blob/ba91ca45ad759ab5db6da17a62333e2bc11e1075/packages/%40aws-cdk/core/lib/stack-synthesizers/_asset-manifest-builder.ts#L18

Without looking into it more I'm not super sure why there's a difference between versions. But this is what I've found so far

pcolmer commented 2 years ago

@peterwoodworth Thank you for continuing to investigate and for sharing your findings so far. I am so relieved that you have been able to reproduce the problem now!

The difference between your versions is only on the last 32 characters.

Ah! Upon re-reading the output from cdk diff, I can see what you mean! I was seeing this:

[-] AWS::Lambda::Version pjctestheadersCurrentVersion6A41A3741e769839a4a4eca15363055ff023227a destroy
[+] AWS::Lambda::Version pjc-test-headers/CurrentVersion pjctestheadersCurrentVersion6A41A374f92b3f06e16da8e8e102c48b89c67cef

and thinking that CDK was destroying pjctestheadersCurrentVersion6A41A3741e769839a4a4eca15363055ff023227a and wanting to create a new resource called pjc-test-headers/CurrentVersion.

I had completely misunderstood that output and now see that it wants to create pjctestheadersCurrentVersion6A41A374f92b3f06e16da8e8e102c48b89c67cef.

Will you or someone on the CDK team be able to look into this further or is it something I've got to "live with" and just hack my code to get past the issue?

indrora commented 2 years ago

@peterwoodworth @kaizencc would a feature flag for those coming from old versions help?

or do we consider this a "This is a price you pay for updating"?

Rabadash8820 commented 2 years ago

@indrora This is definitely not a price that anyone is expecting to pay when they update, considering that the migration docs say:

Changes to logical IDs (causing replacement of resources) are not expected.

I've been running into this issue myself. I'm still confused why the logical IDs are changing at all if neither the function configuration nor the function code have changed. @peterwoodworth do you have any other insight into that?

Mickael-van-der-Beek commented 1 year ago

This is a serious issue for us as well and prevents us from upgrading to v2.

SecurityGroups get randomly removed and added back again with different names, and various cross-stack references get renamed as well, creating deadlocks that prevent stacks from deploying.

I would consider not being able to upgrade to v2 in scope.

jonife commented 5 months ago

This is a known issue when migrating from CDK v1 to v2, and the workaround proposed here (overriding the logical IDs) seems to be the best option to ensure that this issue is no longer a problem.