**nathanpeck** opened this issue 5 years ago · Status: Open
Duplicated by #3749
Seems to be related to #1332
Hi @nathanpeck, thanks for submitting a feature request! This seems like a reasonable and helpful ask. We will look into this and someone will update this issue when there is movement.
I think that if users do `cdk deploy` we should actually emit the `cdk.out` directory under `/tmp` instead of the project directory. When users deploy, `cdk.out` is just an intermediate artifact instead of a build artifact.

P.S. it should be something like `/tmp/cdk.out.xxxx` where `xxxx` is the hash of the project path (in order to allow multiple projects to co-exist on the same machine).
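A rough sketch of what such a hashed temp directory could look like. This is purely illustrative; `tempOutDir` is not a real CDK API, just an assumption about how a per-project path hash might be derived:

```typescript
import { createHash } from "crypto";
import * as os from "os";
import * as path from "path";

// Hypothetical helper: derive a stable, per-project output directory
// under the system temp dir so multiple projects can co-exist.
function tempOutDir(projectPath: string): string {
  const hash = createHash("sha256")
    .update(path.resolve(projectPath))
    .digest("hex")
    .slice(0, 8); // short prefix is enough to disambiguate projects
  return path.join(os.tmpdir(), `cdk.out.${hash}`);
}

// Usage (sketch): same project path always maps to the same directory.
console.log(tempOutDir(process.cwd()));
```

Because the hash is derived from the resolved project path, repeated synths of the same project reuse one directory instead of piling up new ones.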
@eladb I do worry that would reduce the visibility of the folder. Particularly in cases where I have multiple projects and for some reason my stacks aren't generating as expected, I would hate to have to figure out which of the outputs inside my `/tmp` folder is the right one.

While it is tempting to piggyback on the existing `/tmp` cleanup behavior, I don't think that would be good for users of CDK, because it would end up being a hidden cache that is harder to clear when needed.
If you do `cdk synth`, output will still go to `./cdk.out`, which will give you visibility into exactly what's going to be used during deployment.
I am not sure I understand why you think putting intermediate (temporary) build artifacts in `/tmp` is not a good use case. Isn't that what `/tmp` is all about?
@eladb I don't think of the build artifacts as temporary.

For example, if I compile with GCC I would expect my C++ files to turn into object files in a local path, not in the `/tmp` folder. Or if I compile TypeScript, I expect the resulting JavaScript to end up in the local directory, not in `/tmp`.

From that perspective I see CDK to CloudFormation / assets as just another type of transformation, where I expect the resulting product to be local, not remotely cached.

I'm not strictly opinionated on this, but it just feels somewhat strange to me if `cdk.out` is located in a folder outside of my project.
I found this issue from a different direction: I have some tests for my CDK code, and each time I run them a new asset directory is built and put in `/tmp`, a new one for each test case. The assets for me happened to be hundreds of MB, and soon my `/tmp` device was full.

I would expect that, by default, assets for test runs are deleted after the test run has completed, regardless of where they are stored.
In the interim, is it OK to just manually clear out anything in this folder (or even the whole folder)? I've left it building up for now as I wasn't sure if the contents were required somewhere down the line, for `cdk diff` support, etc.

No, there should be no danger in removing `cdk.out` locally; it will be re-created the next time CDK is executed.
Is it possible to change where these `cdk.outxxxxx` folders are created when running unit tests?

Our current plan is to have a process to clean up the `/tmp` folder after the tests are run, but the problem is that this is on our build agent, which doesn't have a huge `/tmp` directory and potentially has multiple builds running at once.
> Is it possible to change where these `cdk.outxxxxx` folders are created when running unit tests?

You should be able to specify the output directory when you create an `App`:

```typescript
const app = new App({ outdir: '/tmp/foo' });
const stack = new MyTestStack(app, 'test');
// ...
```
I tried that setting for the app, but it only seems to work for a `synth` command. When I run the CDK unit tests, there are multiple `cdk.out` directories created in the `/tmp` folder. I would like to change this directory if possible.
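One way to hedge against this, assuming your tests construct the `App` themselves, is to route every test's output under a single project-local root so cleanup is one recursive delete. The `testOutDir` helper and the `.test-cdk.out` name below are hypothetical, not part of CDK:

```typescript
import * as fs from "fs";
import * as path from "path";

// Hypothetical convention: one project-local root for all per-test
// output directories, so cleanup is a single recursive delete.
const TEST_OUT_ROOT = path.join(process.cwd(), ".test-cdk.out");

function testOutDir(testName: string): string {
  // Sanitize the test name so it is a safe directory name.
  const dir = path.join(TEST_OUT_ROOT, testName.replace(/[^\w-]/g, "_"));
  fs.mkdirSync(dir, { recursive: true });
  return dir;
}

// Usage (sketch): const app = new App({ outdir: testOutDir("my-stack-test") });
// Afterwards: fs.rmSync(TEST_OUT_ROOT, { recursive: true, force: true });
```

Keeping the root inside the project also keeps test output visible, addressing the earlier concern about hidden directories in `/tmp`.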
Shorter-term solution with bash:

```shell
find . -name 'asset.*.zip' -print0 | xargs -0 rm
```

I run this at the end of deployments.
> I think that if users do `cdk deploy` we should actually emit the `cdk.out` directory under `/tmp` instead of the project directory. When users deploy, `cdk.out` is just an intermediate artifact instead of a build artifact.
>
> P.S. it should be something like `/tmp/cdk.out.xxxx` where `xxxx` is the hash of the project path (in order to allow multiple projects to co-exist on the same machine).
I agree that `cdk.out` should be moved to `/tmp`, though I'd vote for a more verbose folder path, e.g. `/tmp/aws-cdk/{projectHash}/cdk.out/`. We use the `/tmp` directory for a variety of things, and selfishly I don't want dozens of items in the root of `/tmp` for CDK alone.

To provide easy access to the `cdk.out` directory, you could add something like `cdk context get aws-cdk:outDir` that would print the current project outdir, so people could `cd $(cdk context get aws-cdk:outDir)`. Maybe not the most convenient, but I'd use it... meaning, I'd create an alias for it.

I have another use case for more control over the asset directories.
I'm using CDK with the SAM CLI, and I'm trying to use `tsc-watch` to re-run `cdk synth` after detecting changes to TypeScript. Because a new asset directory is created each time, SAM needs to be restarted.

The workaround I'm about to implement is to get the existing asset directory name, delete it, then rename the new asset directory to the old one after `cdk synth`. There's the possibility that SAM will keep a pointer to the original directory, which is moved to trash, but we shall see!
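That rename workaround could be scripted roughly like this. The `stabilizeAssetDir` helper is hypothetical, and it assumes the most recently modified `asset.*` directory is the one produced by the latest synth:

```typescript
import * as fs from "fs";
import * as path from "path";

// Hypothetical sketch: keep a stable asset directory name across synths
// so a tool holding a path into cdk.out (e.g. SAM CLI) stays valid.
function stabilizeAssetDir(outDir: string, stableName: string): void {
  const stable = path.join(outDir, stableName);
  // Pick the most recently modified asset.* directory from the latest synth.
  const newest = fs
    .readdirSync(outDir)
    .filter((n) => n.startsWith("asset.") && n !== stableName)
    .map((n) => path.join(outDir, n))
    .sort((a, b) => fs.statSync(b).mtimeMs - fs.statSync(a).mtimeMs)[0];
  if (!newest) return;
  // Replace the old stable directory with the fresh output.
  fs.rmSync(stable, { recursive: true, force: true });
  fs.renameSync(newest, stable);
}

// Usage (sketch): stabilizeAssetDir("cdk.out", "asset.current");
```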
Please don't move `cdk.out` to `/tmp`, as people who never reboot will have that directory blowing up as well. It is also not safe when deploying multiple projects, since the erase solution above would remove anything inside `/tmp`.

I had literally over 100 `asset.XXXXX` directories, each weighing 85 MB, and since those contain tons of small files it took a few minutes to delete the 9 GB of data. Why isn't all of that deleted right after deploy (or before deploy, so we keep the last one)? If I wanted to keep the data, I could explicitly ask for it.
I think `cdk synth` should clean the folder and create it again.
How about automatic cleanup based on the creation date? For example, configure `cdk.json` like:

```json
{
  "app": "bin/synth",
  "autoCleanOutdatedAssetsBefore": "3days"
}
```

(Assets created more than 3 days ago would be automatically deleted when running `cdk synth`, etc.)
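A minimal sketch of what that age-based cleanup could do, written as a standalone script rather than a real CDK option (`cleanOldAssets` is a hypothetical helper, not CDK API):

```typescript
import * as fs from "fs";
import * as path from "path";

// Sketch of the proposed behavior: delete asset.* entries in cdk.out
// whose modification time is older than maxAgeDays.
function cleanOldAssets(outDir: string, maxAgeDays: number): string[] {
  const cutoff = Date.now() - maxAgeDays * 24 * 60 * 60 * 1000;
  const removed: string[] = [];
  if (!fs.existsSync(outDir)) return removed;
  for (const entry of fs.readdirSync(outDir)) {
    if (!entry.startsWith("asset.")) continue; // leave templates/manifests alone
    const full = path.join(outDir, entry);
    if (fs.statSync(full).mtimeMs < cutoff) {
      fs.rmSync(full, { recursive: true, force: true });
      removed.push(entry);
    }
  }
  return removed;
}

// Usage (sketch): cleanOldAssets("cdk.out", 3);
```

Keeping recent assets preserves the reuse benefit mentioned elsewhere in this thread, while still bounding growth.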
This is problematic with CDK tests, as every test run creates a new directory in `/tmp`, and when writing tests it fills up the hard-disk space quite quickly.
I've run out of space (aka memory, on Linux) in `/tmp` many times because of the `/tmp/cdk.out*` dirs.

I never had a problem with `cdk.out` in the project root, but I haven't been doing much `cdk synth` locally (we use pipelines).
+1 to finding a solution for this. I just had to clean up ~70GB of files from my cdk.out directory in my project.
Why not just delete the `cdk.out` folder before each synth or deploy?
> Why not just delete the `cdk.out` folder before each synth or deploy?

Because all assets would have to be re-staged on every `synth` that way (the ZIP files re-zipped, etc.), making it even slower than it is now.
> I've run out of space (aka memory, on Linux) in `/tmp` many times because of the `/tmp/cdk.out*` dirs.

I'm surprised by this. Is there no OS-level garbage collection for `/tmp` in your distribution?
> I'm surprised by this. Is there no OS-level garbage collection for `/tmp` in your distribution?

`/tmp` is a ramdisk (at least on my Linux systems), so it is gone after a restart/logout. But if you restart only once in a blue moon, running out of space will happen...
> > Is there no OS-level garbage collection for `/tmp` in your distribution?
>
> `/tmp` is a ramdisk (at least on my Linux systems), so it is gone after a restart/logout.

Thanks for clarifying this!
I'm using C# and the `DockerImageFunction` construct, and I just stumbled across 45 GB of assets in `cdk.out`. My `Program.cs` now has the following:

```csharp
public static void Main(string[] args)
{
    if (Directory.Exists(@"cdk.out"))
    {
        Console.Error.WriteLine(@"Erasing cdk.out/");
        Directory.Delete(@"cdk.out", true);
        Console.Error.WriteLine(@"Erased cdk.out/");
        Console.Error.WriteLine(@"Creating cdk.out/");
        Directory.CreateDirectory(@"cdk.out");
        Console.Error.WriteLine(@"Created cdk.out/");
    }

    var app = new App();
    // ...
}
```
Be warned that if you delete your `cdk.out` folder every time, it will make CDK much slower, because CDK will not be able to reuse previously prepared assets and will have to prepare them from scratch each time. Ideally you have some process that only cleans up asset files older than a specific cutoff date, or that runs once the size gets over a threshold. That way your day-to-day usage of CDK stays faster and you stop accumulating GBs of data.
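A sketch of the size-threshold variant of that suggestion (hypothetical helper names, not part of CDK): remove the oldest `asset.*` directories until `cdk.out` drops under a byte budget.

```typescript
import * as fs from "fs";
import * as path from "path";

// Recursively total the size of all files under a directory.
function dirSize(dir: string): number {
  let total = 0;
  for (const e of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, e.name);
    total += e.isDirectory() ? dirSize(full) : fs.statSync(full).size;
  }
  return total;
}

// Hypothetical policy: evict oldest asset.* directories first until the
// total size of outDir is at or below maxBytes.
function enforceSizeLimit(outDir: string, maxBytes: number): void {
  const assets = fs
    .readdirSync(outDir)
    .filter((n) => n.startsWith("asset."))
    .map((n) => path.join(outDir, n))
    .sort((a, b) => fs.statSync(a).mtimeMs - fs.statSync(b).mtimeMs); // oldest first
  for (const dir of assets) {
    if (dirSize(outDir) <= maxBytes) break;
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

// Usage (sketch): enforceSizeLimit("cdk.out", 2 * 1024 ** 3); // keep under ~2 GB
```

Evicting oldest-first keeps the assets from recent synths cached, so day-to-day deploys stay fast.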
I'm not sure of the issues hierarchy here, but everyone should probably be aware of a parallel discussion going on in https://github.com/aws/aws-cdk-rfcs/issues/64 (opened in 2018).
I feel like clearing out `cdk.out` had better be an okay thing to do, because I build from multiple development locations, so they aren't going to be in sync depending on whether I'm working from home or at my office.
Deleting things out of the staging bucket is a little scarier to me. Issues related to scaling and rollback have been raised, but I am not enough of an expert to know whether or not those are legitimate concerns.
I think it should be okay to clear out the staging bucket after you successfully deploy, but I'm not confident enough to try it on a production project. The biggest item in the staging bucket looks like it might be part of the CDK itself (maybe put there by `cdk bootstrap`?).
I think all this means two things:
I work on my project in a Dropbox folder, and regularly use `xattr -w com.dropbox.ignored 1 node_modules` to prevent that directory being synced to Dropbox. I do the same with `cdk.out`, so any process that deletes the folder also removes the extended attribute, which can lead to the files syncing to Dropbox without me realising (until I run out of Dropbox space).

The ability to move the artefacts to a directory outside the current working directory/tree (and outside of Dropbox) would be ideal, and I can always create a soft link for convenience from the cwd, which isn't synced.

Perpetual growth of the `cdk.out` directory is, IMHO, just lazy design. I appreciate that there are intermediate assets that might add extra cost to repeated synth/deploy cycles, and these should be documented.
I'll add one more suggestion to the pile...

I'd like the CDK Toolkit to provide a `clean` command that would serve as a standardized way to clean up the local resources created by running other toolkit commands such as `synth`.

With a `clean` command in place, developers could add a cleanup step to an appropriate phase of their build life cycle, based on their specific project needs. For example, in a JVM project using Apache Maven, the exec-maven-plugin could be used to execute the command (I do something similar today with a shell script).

Of course, the templates provided for use with the `init` command could also supply a sensible default.
My CDK project is an npm package, and I utilize npm `pre` scripts to remove the `cdk.out` directory before executing the `cdk` command.

`package.json`:

```json
{
  "scripts": {
    "cdk": "cdk",
    "precdk": "shx rm -rf cdk.out"
  }
}
```

> _I use [shx](https://github.com/shelljs/shx) to make it work cross-platform._

Then run the npm scripts as follows:

```shell
$ npm run cdk -- diff
$ npm run cdk -- deploy
```
If the environment in which the `cdk` command is executed is limited, the easiest solution may be to define a shell alias for `cdk`.

I hope this is of some help.
> I have another use case for more control over the asset directories.
>
> I'm using CDK with the SAM CLI, and I'm trying to use `tsc-watch` to re-run `cdk synth` after detecting changes to TypeScript. Because a new asset directory is created each time, SAM needs to be restarted. The workaround I'm about to implement is to get the existing asset directory name, delete it, then rename the new asset directory to the old one after `cdk synth`. There's the possibility that SAM will keep a pointer to the original directory, which is moved to trash, but we shall see!
@lprhodes I figured out a solution for this `cdk.out/asset.*` hash-folder problem. Since aws-cdk exposes `NodejsFunctionProps.bundling.commandHooks`, you can emit a utility sh script and run it without re-running the CDK every time, as that is time consuming.

Sample code:

```typescript
afterBundling(inputDir: string, outputDir: string): string[] {
  const outFile = join(outputDir, "index.js");
  const scriptPath = join(inputDir, "..", ".scripts");
  const shFile = fileName.replace(".ts", ".sh");
  return [
    `mkdir -p ${scriptPath}`,
    `echo esbuild ${inputDir}/${fileName} --outfile=${outFile} --watch --bundle --target=node18 --platform=node > ${scriptPath}/${shFile}`,
  ];
},
```

And then in my `package.json` scripts I have `"watch:lambda": "sh .scripts/<file_name>.sh"`.

When you run that script, esbuild runs in watch mode, recompiles your changes, and puts the output in the `cdk.out/asset.*` folder path (thanks to the `commandHooks` `outputDir`).

Hope that helps! I was able to write my lambdas in TypeScript and re-run a lambda without paying the time cost of full CDK re-runs.
Resources:
Each time I run a CDK deploy I get a new asset in the assets directory, and they seem to accumulate forever. Each asset folder is around 100 MB for me, so this quickly adds up to many GB of data. Here is a screenshot of it accumulating assets again after the last time I cleaned it out manually.
Ideally I would like a CDK configuration that would cause CDK to automatically garbage collect older asset files it no longer needs so I don't have to do it manually.