aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

cdk version 2.33 onwards is getting stuck #22923

Closed galsasi1989 closed 1 year ago

galsasi1989 commented 1 year ago

Describe the bug

I am trying to deploy an S3 bucket using 2.32.1, and it works just fine. My cdk app runs from Jenkins, is written in TypeScript (Node.js v16), and runs inside a Docker container.

Jenkins is running cdk CLI version 2.44.0. When I upgrade the packages in package.json to 2.33.0 or later, the same deployment command gets stuck and the pipeline hangs.

Am I missing something? Are there any breaking changes in 2.33.0? I couldn't find any useful information in the release notes.

Thanks, Gal

Expected Behavior

Using the latest versions of the cdk packages (aws-cdk in devDependencies and aws-cdk-lib in dependencies) should work, so that I can deploy the S3 bucket.

Current Behavior

With the cdk packages at version 2.32.1, everything works and I can deploy the S3 bucket. After upgrading to 2.33.0 or any later version, cdk synth/diff/deploy hangs.

Reproduction Steps

The Jenkins pipeline runs inside Docker containers; the Docker server is installed on the Jenkins agent. The first container in the pipeline is based on Python 3.8. Inside it, another Docker container based on Node.js v16 (Alpine) runs with cdk CLI version 2.44.0 installed.

This is the package.json:

```json
{
    "name": "general",
    "version": "0.1.0",
    "bin": {
        "general": "bin/general.js"
    },
    "scripts": {
        "build": "tsc",
        "watch": "tsc -w",
        "test": "jest",
        "cdk": "cdk"
    },
    "devDependencies": {
        "@types/jest": "^27.5.2",
        "@types/node": "^10.17.27",
        "@types/prettier": "2.6.0",
        "aws-cdk": "2.32.1",
        "jest": "^27.5.1",
        "ts-jest": "^27.1.4",
        "ts-node": "^10.9.1",
        "typescript": "~3.9.7"
    },
    "dependencies": {
        "aws-cdk-lib": "2.32.1",
        "constructs": "^10.0.0",
        "@aws-cdk/aws-glue-alpha": "^2.32.1-alpha.0",
        "source-map-support": "^0.5.21"
    }
}
```

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.44.0

Framework Version

No response

Node.js Version

16

OS

Ubuntu 18/20

Language

Typescript

Language Version

No response

Other information

No response

peterwoodworth commented 1 year ago

Can you please put together a repo which we can clone to reproduce this issue? There are a lot of moving parts here, and none of the changes between 2.32 and 2.33 look suspicious.

galsasi1989 commented 1 year ago

Hi @peterwoodworth

I have just created a project with samples from my code: https://github.com/galsasi1989/cdk-sample-issue In this repository you can find a directory named general with the cdk code, and a Jenkinsfile.

Let me know if you need additional information

Thanks!

peterwoodworth commented 1 year ago

Hey @galsasi1989,

It seems to me like you might've left some stuff out of your reproduction. This repo only creates an S3 bucket, a related custom resource that deletes the objects inside the bucket, and some roles. I'm not familiar with Jenkins or how you might've set up your pipeline, so I'll need a bit more help here 🙂

peterwoodworth commented 1 year ago

I'm also curious to know the exact error you're running into. Can you paste the error message you're receiving?

galsasi1989 commented 1 year ago

Hi @peterwoodworth, that's indeed the case: I am only talking about running cdk to create/update an S3 bucket. There's actually no error; the pipeline just stays stuck until I stop it manually, and I don't see any change in CloudFormation.

My Jenkins is running on Ubuntu 18/20, if you prefer to run it on the same OS. Here is a step-by-step guide to setting up Jenkins: https://www.jenkins.io/doc/tutorials/tutorial-for-installing-jenkins-on-AWS/

At the end, you should have 2 machines: 1 server and 1 agent. SSH into the agent, make sure the OS is updated, and install Docker and git (if either is missing).

After Jenkins is up and running, you need to click on Manage Jenkins -> Manage Plugins. Click on 'Available' tab and search for the following plugins:

  1. docker plugin
  2. docker-build-step
  3. timestamper
  4. AWS (for simplicity, you can install all of them)
  5. git
  6. Cleanup workspace plugin

Install these plugins by clicking 'Install without restart'.

Then you'll need to build 2 Docker images (I will add the Dockerfiles to the repository above):

  1. python based docker image - use the tag: 3.8.6-pipeline
  2. nodejs based docker image - use the tag: 16.17.1-alpine-pipeline

At the end, you'll have to create a pipeline job in Jenkins which will run the Jenkinsfile in the above repository. Inside the Jenkinsfile you can find the cdk commands that I have tried to run without success.

FarrOut commented 1 year ago

Hey @peterwoodworth, if it helps, feel free to take a look at this project. It's still pretty modest, but hopefully it helps with getting the Jenkins master up.

You can find the initial setup admin password in the CloudWatch log group.

anuprajg commented 1 year ago

Interesting issue, which helped me solve my problem by reverting to v2.32.1 as well. I see the same issue from v2.33 up to and including the latest, 2.51.1.

However, in my case the issue happens only if I use Triggers to invoke a Lambda function during the CDK deployment. Maybe that helps in identifying the root cause.

Here's my reduced stack: https://gist.github.com/anuprajg/3925fa431891108c204de72aebc3a39d It's basically a Hello World Lambda function plus a Trigger that executes it during deployment.

When I comment out line 27, which creates the trigger, cdk bootstrap/deploy works fine. With the trigger, the pipeline hangs at line 26.

Also note that deploying the same stack with cdk 2.51.1 from my local machine (OSX) does go through. So the issue has something to do with the Jenkins environment plus cdk changes between v2.32.1 and v2.33, possibly related to Triggers.

anuprajg commented 1 year ago

v2.33 has some fixes related to the Custom Resource Provider: https://github.com/aws/aws-cdk/issues/17460 Could that cause issues in a Jenkins environment?

@galsasi1989 In your project, if you remove the autoDeleteObjects, does it work with newer cdk versions?

```typescript
autoDeleteObjects: removalPolicy === RemovalPolicy.DESTROY ? true : false,
```

galsasi1989 commented 1 year ago

Hi @anuprajg

Thanks for your reply! When I set autoDeleteObjects to false (hard-coded), it works even with cdk version 2.51.1. And you're right: behind the scenes, CloudFormation invokes a Lambda function, so what you're saying makes a lot of sense.

Regarding running cdk locally, again, you're right. It works locally (OSX and WSL Ubuntu 20.04); it only gets stuck in Jenkins. My Jenkins server and agents run on-premises. Do you have any clue why it might get stuck in Jenkins? The Jenkins server version is 2.361.2.

Thanks!

rix0rrr commented 1 year ago

@galsasi1989,

You didn't post representative output, so it's hard to say what's wrong. You are saying that cdk synth, cdk diff, and cdk deploy all fail, correct? That means it must not have started a CloudFormation deployment yet, correct?

rix0rrr commented 1 year ago

Aha it might be this: https://github.com/aws/aws-cdk/issues/21379

rix0rrr commented 1 year ago
github-actions[bot] commented 1 year ago

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.

galsasi1989 commented 1 year ago

Hi @rix0rrr

Thanks for your help with this issue! It was very helpful, after we had spent long days, even weeks, on it.

Can you please give us a high-level description of how the cdk communicates with the Linux kernel? What was changed in the cdk, and how is it related to the kernel version?

In addition, I think it's very important to add validation that makes sure all system requirements are met when I install my cdk project's dependencies (via pip, npm or other tools), and to throw as clear an exception as possible, so that at least we have a clue next time.
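One possible shape for the validation suggested above is a preflight check that runs before cdk synth and fails with a clear message instead of letting a later step hang. This is a hypothetical sketch: assertMinimumVersion, compareVersions and parseVersion are illustrative helpers, not part of aws-cdk or aws-cdk-lib, and the thread does not say which Node.js or kernel versions are actually affected, so any threshold you pass in is a placeholder.

```typescript
// Hypothetical preflight helpers; NOT part of aws-cdk or aws-cdk-lib.

// Parses "v16.17.1", "16.17.1" or "16.17.1-alpine" into [major, minor, patch].
function parseVersion(version: string): [number, number, number] {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version.trim());
  if (!match) {
    throw new Error(`Unrecognized version string: "${version}"`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Comparator: negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: string, b: string): number {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) {
      return pa[i] - pb[i];
    }
  }
  return 0;
}

// Throws a descriptive error instead of letting a later step fail silently.
function assertMinimumVersion(component: string, actual: string, minimum: string): void {
  if (compareVersions(actual, minimum) < 0) {
    throw new Error(
      `${component} ${actual} is older than the required minimum ${minimum}; ` +
        `upgrade it before running cdk synth/diff/deploy.`,
    );
  }
}
```

In a Node.js entry point this could be called as assertMinimumVersion("Node.js", process.version, "16.0.0"), with the minimum chosen from whatever requirements you actually know to apply.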

rix0rrr commented 1 year ago

The CDK behavior is as follows:

The change was:

The problem was:

Full props to @nburtsev for figuring this out. I'm not sure I myself would have been able to put all of this together.


In summary:

The CDK does not communicate with the kernel directly; we just perform filesystem copies. Bugs in the interaction between other pieces of software cause the file copy to loop endlessly when the right combination of circumstances is hit.
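To make "the file copy loops endlessly" concrete: a copy routine typically keeps calling a lower-level copy primitive until the reported byte count reaches the file size. If a lower layer keeps reporting no progress, such a loop never terminates, and from the outside the pipeline simply hangs with no error. The sketch below is a generic illustration under that assumption, not the actual CDK, Node.js or libuv code; copyAll and the CopyChunk callback are hypothetical names, and the stall limit is one way to turn an endless loop into an explicit error.

```typescript
// Hypothetical copy primitive: tries to copy up to `length` bytes starting at
// `offset` and returns how many bytes it actually copied.
type CopyChunk = (offset: number, length: number) => number;

// Copies `totalBytes` by repeatedly calling `copyChunk`. A naive loop would
// spin forever if `copyChunk` kept returning 0; this version throws after
// `maxStalls` consecutive calls that make no progress.
function copyAll(
  totalBytes: number,
  copyChunk: CopyChunk,
  chunkSize = 64 * 1024,
  maxStalls = 3,
): number {
  let copied = 0;
  let stalls = 0;
  while (copied < totalBytes) {
    const n = copyChunk(copied, Math.min(chunkSize, totalBytes - copied));
    if (n > 0) {
      copied += n; // progress made: reset the stall counter
      stalls = 0;
    } else if (++stalls >= maxStalls) {
      throw new Error(`copy stalled at byte ${copied} of ${totalBytes}`);
    }
  }
  return copied;
}
```

With a primitive that always copies the requested length, copyAll returns the total byte count; with one that always returns 0, it throws after three attempts instead of hanging.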