aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

(core): proliferation of cdk.out$hash directories in $TMPDIR #27356

Open diranged opened 9 months ago

diranged commented 9 months ago

Describe the bug

On my laptop, I've noticed recently that my $TMPDIR is filling up... this feels like a new behavior, as we've been developing with CDK for almost a year now, and only recently did this start happening to me. It's tough to pinpoint when, but I think it has to do with the cdk.out directory being renamed to cdk.out$hash at some point. In the last 2 days, I've accumulated over 170GB of temp data:

$ sudo du -sch $TMPDIR   
170G    /var/folders/dm/b5by_qw91nd0ctdjbvggzgr40000gq/T/
170G    total

When I dig into it, it's all CDK data:

$ sudo du -sch $TMPDIR/cdk*
 16K    /var/folders/dm/b5by_qw91nd0ctdjbvggzgr40000gq/T//cdk-custom-resource00f1xy
 16K    /var/folders/dm/b5by_qw91nd0ctdjbvggzgr40000gq/T//cdk-custom-resource00gzE3
 12K    /var/folders/dm/b5by_qw91nd0ctdjbvggzgr40000gq/T//cdk-custom-resource03avRC
...
 12K    /var/folders/dm/b5by_qw91nd0ctdjbvggzgr40000gq/T//cdk-custom-resourcezzLzLS
 20K    /var/folders/dm/b5by_qw91nd0ctdjbvggzgr40000gq/T//cdk-test-app-05EjOV
 96K    /var/folders/dm/b5by_qw91nd0ctdjbvggzgr40000gq/T//cdk-test-app-0CZJXf
...
 28K    /var/folders/dm/b5by_qw91nd0ctdjbvggzgr40000gq/T//cdk-test-app-DOfnAs
 41M    /var/folders/dm/b5by_qw91nd0ctdjbvggzgr40000gq/T//cdk-test-app-DTQJUe
 41M    /var/folders/dm/b5by_qw91nd0ctdjbvggzgr40000gq/T//cdk-test-app-DTr6AZ
...
212K    /var/folders/dm/b5by_qw91nd0ctdjbvggzgr40000gq/T//cdk.outWD3zzZ
 41M    /var/folders/dm/b5by_qw91nd0ctdjbvggzgr40000gq/T//cdk.outWEJF5m
 41M    /var/folders/dm/b5by_qw91nd0ctdjbvggzgr40000gq/T//cdk.outWEZIMn
 41M    /var/folders/dm/b5by_qw91nd0ctdjbvggzgr40000gq/T//cdk.outWEpfOE
...
  0B    /var/folders/dm/b5by_qw91nd0ctdjbvggzgr40000gq/T//cdk8s.outdir.zxI76J
169G    total

I have more than 9,000 individual test directories:

$ ls -la $TMPDIR | grep cdk | wc
    9617   86553  715166
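
For anyone who wants the same breakdown without piping `du` and `ls` by hand, here is a minimal sketch that groups top-level temp directories by name prefix. The prefix list is copied from the listing above and is only an assumption about what CDK leaves behind; `summarize_cdk_temp` and `tree_size` are illustrative names, not part of any CDK tooling.

```python
import os
import tempfile
from collections import defaultdict

# Prefixes taken from the `du` listing above; extend as needed.
CDK_PREFIXES = ("cdk-custom-resource", "cdk-test-app-", "cdk.out", "cdk8s.outdir.")


def tree_size(path):
    """Return the total apparent size in bytes of the files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def summarize_cdk_temp(tmp_root=None):
    """Map each matching prefix to (directory count, total bytes)."""
    tmp_root = tmp_root or tempfile.gettempdir()
    summary = defaultdict(lambda: [0, 0])
    with os.scandir(tmp_root) as entries:
        for entry in entries:
            prefix = next((p for p in CDK_PREFIXES if entry.name.startswith(p)), None)
            if prefix is None or not entry.is_dir(follow_symlinks=False):
                continue
            summary[prefix][0] += 1
            summary[prefix][1] += tree_size(entry.path)
    return {prefix: tuple(values) for prefix, values in summary.items()}
```

Calling `summarize_cdk_temp()` with no argument inspects the current temp directory. The sizes are apparent file sizes, so they will not match `du` exactly, since `du` reports disk blocks.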

This feels similar to https://github.com/aws/aws-cdk/issues/2869, but not exactly the same.

Expected Behavior

I expect the data in $TMPDIR to be cleaned up after each run. I think cleanup was never needed back when the output dir was the fixed path $TMPDIR/cdk.out, but now that it's $TMPDIR/cdk.out$rand, every run leaves a new directory behind, and that is causing this buildup of junk.

Current Behavior

Leftover junk temp directories build up in $TMPDIR and are never removed.

Reproduction Steps

Just run your tests over and over again

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.93.0

Framework Version

No response

Node.js Version

18

OS

OSX

Language

Typescript

Language Version

No response

Other information

No response

indrora commented 9 months ago

I suspect that it's expected that $TMPDIR doesn't survive reboots (that is, it's a ramdisk rather than real disk space).

diranged commented 9 months ago

@indrora On a CI/CD system, that makes sense... but on a local development environment I think it's more of a problem. It also likely slows down the build process quite a bit by rebuilding assets that don't need to be rebuilt. Thoughts?

misirlou-tg commented 8 months ago

I have seen files & directories accumulate in my temp folder as well when running cdk synth or cdk deploy. The first time I noticed it, there were 100s of folders totaling over 30GB.

I narrowed it down to two file name patterns and one directory name pattern. I look for all of them and remove the directories where they live.

The following shows the temp dirs/files left behind in a "clean" temp dir after running cdk synth once on a small project:

C:\Users\build\AppData\Local\Temp>dir Amazon.CDK.Asset.AwsCliV1.aws-cdk-asset-awscli* /s/b
C:\Users\build\AppData\Local\Temp\d2f5yn3d.hy0\Amazon.CDK.Asset.AwsCliV1.aws-cdk-asset-awscli-v1-2.2.177.tgz

C:\Users\build\AppData\Local\Temp>dir jsii-runtime.js /s/b
C:\Users\build\AppData\Local\Temp\iqkq15ry.yu4\bin\jsii-runtime.js

C:\Users\build\AppData\Local\Temp>dir cdk-custom-resource* /s/b
C:\Users\build\AppData\Local\Temp\cdk-custom-resource5ypanG

C:\Users\build\AppData\Local\Temp>rd /s/q d2f5yn3d.hy0

C:\Users\build\AppData\Local\Temp>rd /s/q iqkq15ry.yu4

C:\Users\build\AppData\Local\Temp>rd /s/q cdk-custom-resource5ypanG
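
The manual steps above could be scripted along these lines. This is only a sketch built on the three patterns from this comment, and it assumes the marker files sit at the same depth as in the listing (the `.tgz` directly inside the random folder, `jsii-runtime.js` under `bin`). The function names are made up for illustration, and `dry_run` defaults to `True` so nothing is deleted until you opt in.

```python
import fnmatch
import os
import shutil
import tempfile

# File name pattern from the listing above (assumed to be .NET-binding specific).
ASSET_PATTERN = "Amazon.CDK.Asset.AwsCliV1.aws-cdk-asset-awscli*"


def is_cdk_leftover(path):
    """Return True if a top-level temp directory matches one of the three patterns."""
    if os.path.basename(path).startswith("cdk-custom-resource"):
        return True
    if os.path.isfile(os.path.join(path, "bin", "jsii-runtime.js")):
        return True
    try:
        children = os.listdir(path)
    except OSError:
        return False  # unreadable directory; leave it alone
    return any(fnmatch.fnmatch(child, ASSET_PATTERN) for child in children)


def find_cdk_leftovers(tmp_root=None):
    """List top-level directories under tmp_root that look like CDK leftovers."""
    tmp_root = tmp_root or tempfile.gettempdir()
    with os.scandir(tmp_root) as entries:
        return sorted(
            entry.path
            for entry in entries
            if entry.is_dir(follow_symlinks=False) and is_cdk_leftover(entry.path)
        )


def remove_cdk_leftovers(tmp_root=None, dry_run=True):
    """Return the matching directories, deleting them unless dry_run is set."""
    found = find_cdk_leftovers(tmp_root)
    if not dry_run:
        for path in found:
            shutil.rmtree(path, ignore_errors=True)
    return found
```

Review the dry-run output first, then call `remove_cdk_leftovers(dry_run=False)`. It is probably safest to do this while no cdk process is running, since a live run may still be using one of these directories.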

whereisaaron commented 1 month ago

If you work with AWS CDK locally you soon have 1000's of abandoned folders in /tmp consuming 100GB+ of disk space!! Surely it is the application's responsibility to clean up its temporary files? If cleanup is not readily available, in the meantime could the temp folder have a predictable prefix or extension, so that manual cleanup is easier? Right now it is generating random 8.3 folder names like this is an MS-DOS application 😅

e.g. yt5tu2fe.qm3 --> awscdk-yt5tu2fe.qm3 or yt5tu2fe.qm3 --> yt5tu2feqm3.awscdk

...
2.4M    yt5tu2fe.qm3
2.4M    yts00joi.3if
58M     ytwfqwhv.rrg
67M     yvrstndb.o3p
2.4M    yybbwobp.f51
2.4M    yyhruseu.r45
62M     yznbuveb.bil
2.4M    yzqrjcxr.zrf
58M     z0amgf5y.ind
2.4M    z0hclc20.gsb
2.4M    z0lm0avk.dfc
2.4M    z1i2ayve.uhb
2.4M    z2yd5sca.i2a
58M     z3fpl4e1.4jb
2.4M    z4pdtzwy.cvg
58M     z4svshro.lxn
62M     z52k5uta.t02
2.4M    z5vxbrxb.crv
2.4M    zbasg4kv.bi1
2.4M    zbckroax.vt2
2.4M    zbdlchch.foh
58M     zdmkgg3o.jns
67M     zehsulxf.u5a
2.4M    zenzce51.jh5
2.4M    zgmmavsg.fil
62M     zgszwowm.wtt
2.4M    zgvdgafl.2is
2.4M    zgweaxic.pcf
58M     zhwt1tx3.oks
2.4M    zjkgwme5.3q2
67M     zjmsrsuv.ibq
58M     zjvjduaa.3fa
2.4M    zk055vcr.fdw
2.4M    zk4wx0g2.q1i
58M     zkbviiad.52w
2.4M    zkim13u0.rxs
2.4M    zl111zyt.hxq
2.4M    zl1r3svb.syb
2.4M    zlqm3vyz.xwy
58M     zltd2fn0.zrb
2.4M    zmnxnaje.3ge
2.4M    znmfotlv.kky
58M     znpogu23.201
2.4M    zobow0ac.esw
62M     zodnn2ux.mpv
2.4M    zr2f2mq3.5n5
58M     zraj3qur.3xi
...
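
To illustrate why a predictable prefix helps, here is a small sketch of the idea in generic terms. It is not how CDK or jsii create their temp folders today; the `awscdk-` prefix is the hypothetical one suggested above, and both function names are invented for the example.

```python
import glob
import os
import shutil
import tempfile

PREFIX = "awscdk-"  # hypothetical prefix, borrowed from the suggestion above


def make_scratch_dir(tmp_root=None):
    """Create a uniquely named temp directory that carries the known prefix."""
    return tempfile.mkdtemp(prefix=PREFIX, dir=tmp_root)


def purge_scratch_dirs(tmp_root=None):
    """Delete every directory under tmp_root whose name starts with PREFIX."""
    tmp_root = tmp_root or tempfile.gettempdir()
    removed = []
    pattern = os.path.join(glob.escape(tmp_root), PREFIX + "*")
    for path in glob.glob(pattern):
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path, ignore_errors=True)
            removed.append(path)
    return sorted(removed)
```

With a shared prefix, cleanup becomes a single glob instead of a hunt for marker files inside randomly named folders.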

indrora commented 1 month ago

n.b. I no longer work for Amazon, the following is purely my own opinion as a member of the community.

The CDK is generally designed to be used on Linux; the vast majority of Linux distributions mount /tmp as a tmpfs file system in memory; currently, Debian is the odd one out (I believe with Ubuntu moving to tmpfs as well about a decade ago) as it generally targets “low resource” systems by default.

macOS is an inscrutable black box in this regard in that it relies on launchd and cron to clean up unloved files that have not been touched or opened in >3 days depending on the version of macOS that you are running.

That said: it is generally considered (In My Experience) good practice, as you have noted, to include some prefix or to consume some sub-path of /tmp to make manual cleansing and hygiene easier on an individual.

I believe a possible workaround (though I have not tested this) is to use the TMPDIR environment variable. I will defer to current maintainers on that specific issue however.
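
A rough sketch of what that workaround might look like: run the command with its temp location redirected to a private directory, then delete that directory when the command exits. This is as untested against the CDK as the suggestion itself, and it assumes the tool and everything it spawns honour these environment variables. `run_with_private_tmp` is an illustrative name.

```python
import os
import subprocess
import tempfile


def run_with_private_tmp(command, **run_kwargs):
    """Run command with its temp dir pointed at a directory removed afterwards."""
    with tempfile.TemporaryDirectory(prefix="cdk-scratch-") as scratch:
        env = dict(os.environ)
        # Set all three common variables; which one a tool reads
        # depends on the platform and the runtime.
        for var in ("TMPDIR", "TEMP", "TMP"):
            env[var] = scratch
        return subprocess.run(command, env=env, **run_kwargs)
```

For example, `run_with_private_tmp(["npx", "cdk", "synth"], check=True)` would wrap one synth run, assuming `npx` is how the CLI is launched in your project. The default cdk.out in the project directory is not affected, only whatever the run writes to its temp location.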