Closed galsasi1989 closed 1 year ago
Can you please put together a repo which we can clone to reproduce this issue? There's a lot of parts and pieces to this, and no changes were made between 2.32 and 2.33 which look suspicious.
Hi @peterwoodworth
I have just created a project with samples from my code: https://github.com/galsasi1989/cdk-sample-issue Inside this repository you can find a general directory with the cdk code and a Jenkinsfile
Let me know if you need additional information
Thanks!
Hey @galsasi1989,
It seems to me like you might've left some stuff out of your reproduction. This repo only creates an S3 bucket, a related custom resource for deleting the objects inside the bucket, and some roles. I'm not familiar with Jenkins or how you might've set up your pipeline, so I'll need a little bit more help here 🙂
I'm also curious to know the exact error you're running into. Can you paste the error message you're receiving?
Hi @peterwoodworth It's indeed the case. I am talking only about running cdk, which will create/update the S3 bucket. There's no error actually; the pipeline is just stuck until I stop it manually, and I don't see any change in CloudFormation.
Here is a step-by-step guide on how to set up Jenkins: https://www.jenkins.io/doc/tutorials/tutorial-for-installing-jenkins-on-AWS/ My Jenkins is running on Ubuntu 18/20, if you prefer to run it on the same OS.
At the end, you should have 2 machines: 1 server and 1 agent. You need to ssh into the agent, make sure the OS is updated, and install docker and git (if either of them is missing).
After Jenkins is up and running, you need to click on Manage Jenkins -> Manage Plugins. Click on 'Available' tab and search for the following plugins:
You will need to install the following plugins by clicking on 'Install without restart'
Then you'll need to build 2 docker images (I will add the Dockerfiles to the repository above):
At the end, you'll have to create a pipeline job in Jenkins which will run the Jenkinsfile in the above repository. Inside the Jenkinsfile you can find the cdk commands that I have tried to run without success.
hey @peterwoodworth , if it helps, feel free to take a look at this project. it's still pretty modest, but hopefully it helps with getting the Jenkins master up.
For the initial setup admin password, you can find it in the CloudWatch LogGroup.
Interesting issue, which helped solve my problem by reverting to v2.32.1 as well. I see the same issue happening from v2.33 onwards, including the latest 2.51.1.
However, in my case the issue happens if I use Triggers to invoke a Lambda function during CDK deployment. Maybe that helps in identifying the root cause.
Here's my reduced Stack: https://gist.github.com/anuprajg/3925fa431891108c204de72aebc3a39d which is basically a Hello World Lambda function + Trigger to execute it during deployment.
When I comment out line 27, which creates the trigger, cdk bootstrap/deploy works fine. With the trigger, the pipeline hangs at line 26.
Also note that deploying the same stack with cdk 2.51.1 from a local machine (OSX) does go through. So the issue has something to do with the Jenkins environment plus the cdk changes between v2.32.1 and v2.33 related to Trigger (maybe).
v2.33 has some fixes related to Custom Resource Provider https://github.com/aws/aws-cdk/issues/17460. Could that cause issues in a Jenkins environment?
@galsasi1989 In your project, if you remove the autoDeleteObjects, does it work with newer cdk versions?
autoDeleteObjects: removalPolicy === RemovalPolicy.DESTROY ? true : false,
Hi @anuprajg
Thanks for your reply! When I set autoDeleteObjects to false (hard-coded), it works even with cdk version 2.51.1. And you're right: behind the scenes, CloudFormation invokes a Lambda function, so what you're saying makes a lot of sense.
Regarding running cdk locally, again, you're right. It works locally (OSX and WSL Ubuntu 20.04). It only gets stuck in Jenkins. My Jenkins server and agents are running on-premises. Do you have any clue why it might be getting stuck in Jenkins? The version of the Jenkins server is 2.361.2.
Thanks!
@galsasi1989,
You didn't post representative output, so it's hard to say what's wrong. You are saying that cdk synth, cdk diff, and cdk deploy all fail, correct? That means it must not have started a CloudFormation deployment yet, correct?
Can you run cdk deploy -v and paste the output?
Aha it might be this: https://github.com/aws/aws-cdk/issues/21379
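When a command hangs rather than failing, one way to still collect its verbose output in a pipeline is to bound it with a timeout. The following is a minimal sketch, assuming a Node.js environment; `runWithTimeout` and `RunResult` are hypothetical names introduced for illustration and are not part of the CDK or of Jenkins.

```typescript
import { spawnSync } from "child_process";

// Hypothetical result shape for this sketch.
interface RunResult {
  timedOut: boolean;      // true if the process was killed for exceeding the timeout
  status: number | null;  // exit code, or null if the process did not exit normally
  output: string;         // whatever was captured on stdout and stderr
}

// Run a command synchronously and kill it if it runs longer than timeoutMs.
function runWithTimeout(command: string, args: string[], timeoutMs: number): RunResult {
  const result = spawnSync(command, args, { timeout: timeoutMs, encoding: "utf8" });
  // On timeout, Node sets result.error with code "ETIMEDOUT".
  const err = result.error as NodeJS.ErrnoException | undefined;
  return {
    timedOut: err?.code === "ETIMEDOUT",
    status: result.status,
    output: `${result.stdout ?? ""}${result.stderr ?? ""}`,
  };
}
```

For the case in this thread, a call such as `runWithTimeout("cdk", ["deploy", "-v"], 10 * 60 * 1000)` would turn an indefinite hang into a failed step after ten minutes; the ten-minute value is an arbitrary assumption.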
Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.
Hi @rix0rrr
Thanks for your help with this issue! It was very helpful after we spent long days or even weeks on this issue.
Can you please give us a high-level description of the communication between cdk and the Linux kernel? What was changed in cdk, and how is it related to the kernel version?
In addition, I think it's very important to add validation that checks that all the system requirements are met when I install my cdk project's dependencies (via pip, npm or other tools), and to throw as clear an exception as possible, so that at least we will have a clue next time.
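A check of the kind requested here could look roughly like the sketch below. It is only an illustration: nothing in this thread says the CDK ships such a check, all function names are hypothetical, and the thread does not state which kernel versions are affected, so the minimum version is left as a caller-supplied assumption.

```typescript
// A kernel version as [major, minor, patch].
type KernelVersion = [number, number, number];

// Parse the leading "major.minor.patch" of a release string such as "5.4.0-42-generic".
// Returns undefined if the string does not start with a version number.
function parseKernelVersion(release: string): KernelVersion | undefined {
  const m = /^(\d+)\.(\d+)(?:\.(\d+))?/.exec(release);
  if (!m) return undefined;
  return [Number(m[1]), Number(m[2]), Number(m[3] ?? 0)];
}

// Compare two versions component by component.
function isAtLeast(actual: KernelVersion, minimum: KernelVersion): boolean {
  const [a1, a2, a3] = actual;
  const [b1, b2, b3] = minimum;
  if (a1 !== b1) return a1 > b1;
  if (a2 !== b2) return a2 > b2;
  return a3 >= b3;
}

// Return a warning message if the kernel looks too old, or undefined if it looks fine.
// `minimum` is an assumption supplied by the caller, not a value taken from the CDK.
function checkKernel(release: string, minimum: KernelVersion): string | undefined {
  const actual = parseKernelVersion(release);
  if (!actual) return `Could not parse kernel release "${release}"`;
  if (isAtLeast(actual, minimum)) return undefined;
  return `Kernel ${actual.join(".")} is older than ${minimum.join(".")}; file copies may misbehave`;
}
```

In Node.js the release string would typically come from `os.release()`, and the check only makes sense on Linux hosts.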
The CDK behavior is as follows:
autoDeleteObjects creates a Custom Resource that will clear the bucket on stack deletion.
[...] cdk.out directory as part of asset staging. This is the same for all assets. The directory these files are copied into depends on the hash of all source files going into it, so the source bundle needs to be complete before this step can start.
The change was:
[...] node_modules directory. This was actually incorrect, as the node_modules directory should be considered a read-only repository of library code. So we changed the code generation to be moved to the system's temporary directory.
[...] overlayfs file system.
[...] $TMP dir back to a location inside a Docker volume mount)
The problem was:
[...] 0 bytes.
[...] copyFile function keeps on retrying the call to copy more and more bytes over, getting 0 every time, and waiting until the copy is complete. This never finishes, and so the build appears to hang.
[...] 0, allowing the copy to succeed.
Full props to @nburtsev for figuring this out. I'm not sure I myself would have been able to put all of this together.
In summary:
The CDK does not directly communicate with the kernel--we just perform filesystem copies. Bugs in the interaction of other pieces of software cause the file copy to loop endlessly if the right combination of circumstances is hit.
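The endless loop described above can be illustrated with a small simulation. This is a sketch only, not Node's or the CDK's actual copy code: `ChunkCopier`, `naiveCopy`, and `guardedCopy` are hypothetical names, and the underlying copy call is replaced by stub functions.

```typescript
// Hypothetical stand-in for a low-level copy call: returns how many bytes it copied.
type ChunkCopier = (offset: number, remaining: number) => number;

// Stub that mimics the faulty behavior described above: it always reports 0 bytes.
const buggyCopier: ChunkCopier = () => 0;

// Stub that behaves normally: copies up to 4 bytes per call.
const healthyCopier: ChunkCopier = (_offset, remaining) => Math.min(4, remaining);

// Naive loop: keeps retrying until every byte is copied. With buggyCopier it would
// never terminate, so maxCalls exists purely to let this demonstration finish.
function naiveCopy(total: number, copy: ChunkCopier, maxCalls: number): { copied: number; calls: number } {
  let copied = 0;
  let calls = 0;
  while (copied < total && calls < maxCalls) {
    copied += copy(copied, total - copied);
    calls++;
  }
  return { copied, calls };
}

// Guarded loop: treats a 0-byte result while bytes remain as an error
// instead of retrying forever.
function guardedCopy(total: number, copy: ChunkCopier): number {
  let copied = 0;
  while (copied < total) {
    const n = copy(copied, total - copied);
    if (n <= 0) {
      throw new Error(`copy made no progress at offset ${copied} of ${total}`);
    }
    copied += n;
  }
  return copied;
}
```

With `buggyCopier`, `naiveCopy` uses up every allowed call without copying anything, which is the shape of the hang seen in the pipeline, while `guardedCopy` fails fast with an error message.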
Describe the bug
I am trying to deploy an S3 bucket using 2.32.1 and it's working just fine. My cdk code is run from Jenkins and is written in TypeScript (Node v16), running inside a docker container.
Jenkins is running cdk cli version 2.44.0. When I upgrade the package in package.json to 2.33.0 or later, the same deployment command gets stuck and the pipeline hangs.
Am I missing something? Are there any breaking changes in 2.33.0? I couldn't find any useful information in the release notes.
Thanks, Gal
Expected Behavior
Using the latest cdk packages (aws-cdk in devDependencies and aws-cdk-lib in dependencies) should work, so that I am able to deploy the S3 bucket with the latest versions.
Current Behavior
When I use the cdk packages at version 2.32.1, it works just fine and I am able to deploy the S3 bucket. After upgrading to version 2.33.0 or any later version, cdk synth/diff/deploy hangs.
Reproduction Steps
The Jenkins pipeline runs inside docker containers. The docker server is installed on the Jenkins agent. The first container in the pipeline is based on python 3.8. Inside it, another docker container of nodejs v16 (alpine dist) is running with cdk-cli version 2.44.0 installed.
This is the package.json:
Possible Solution
No response
Additional Information/Context
No response
CDK CLI Version
2.44.0
Framework Version
No response
Node.js Version
16
OS
Ubuntu 18/20
Language
Typescript
Language Version
No response
Other information
No response