aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.5k stars 3.84k forks

aws-cdk-lib/aws-lambda-nodejs: cache esbuild results #26020

Open bestickley opened 1 year ago

bestickley commented 1 year ago

Describe the feature

The NodejsFunction construct should intelligently cache build results of esbuild and reuse them on subsequent deploys.

new NodejsFunction(this, "CachedNodejsLambda", { bundling: { cache: true } });

Use Case

When working on a CDK app with many lambdas, deployments can take longer than I'd like. I want this to be faster so that the CDK provides a better DX and faster deployments. Work smarter, not harder, right? ;)

Proposed Solution

  1. Use es-module-lexer to find all files imported by the entry point file.
  2. Compute a hash of those files.
  3. Use that hash as the construct's bundling.assetHash.
  4. Enjoy faster synths/deployments!
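The proposed steps can be sketched roughly as follows. This is a minimal sketch, not part of aws-cdk-lib: a simple regex stands in for es-module-lexer (so only plain static `import`, `export ... from` and `require("...")` forms with relative paths are followed), and the helper names `collectLocalFiles` and `computeSourceHash` are hypothetical. Bare package imports such as `lodash` are skipped, so third-party changes would need to be folded in separately (for example by also hashing the lockfile).

```javascript
// Sketch: hash an entry file plus every local file it (transitively) imports.
// ASSUMPTION: a regex replaces es-module-lexer, so only simple static syntax is handled.
const fs = require("fs");
const path = require("path");
const crypto = require("crypto");

// Matches `import ... from "x"`, `export ... from "x"`, `import "x"` and `require("x")`.
const SPECIFIER_RE =
  /(?:import|export)\s[^'"]*?from\s*['"]([^'"]+)['"]|import\s*['"]([^'"]+)['"]|require\(\s*['"]([^'"]+)['"]\s*\)/g;
const EXTENSIONS = ["", ".ts", ".tsx", ".js", ".mjs", ".cjs", ".json"];

// Resolve a relative specifier to an existing file (with or without extension, or index file).
function resolveLocal(fromFile, specifier) {
  const base = path.resolve(path.dirname(fromFile), specifier);
  const candidates = [
    ...EXTENSIONS.map((ext) => base + ext),
    ...EXTENSIONS.slice(1).map((ext) => path.join(base, "index" + ext)),
  ];
  return candidates.find((p) => fs.existsSync(p) && fs.statSync(p).isFile());
}

// Walk the local import graph starting at `entry`; returns sorted absolute paths.
function collectLocalFiles(entry) {
  const seen = new Set();
  const stack = [path.resolve(entry)];
  while (stack.length > 0) {
    const file = stack.pop();
    if (seen.has(file)) continue;
    seen.add(file);
    if (file.endsWith(".json")) continue; // JSON files have no imports to follow
    const source = fs.readFileSync(file, "utf-8");
    for (const m of source.matchAll(SPECIFIER_RE)) {
      const spec = m[1] ?? m[2] ?? m[3];
      if (!spec.startsWith(".")) continue; // bare package import: not followed here
      const resolved = resolveLocal(file, spec);
      if (resolved) stack.push(resolved);
    }
  }
  return [...seen].sort();
}

// SHA-256 over each file's relative path and contents, in a stable order.
function computeSourceHash(entry) {
  const root = path.dirname(path.resolve(entry));
  const hash = crypto.createHash("sha256");
  for (const file of collectLocalFiles(entry)) {
    hash.update(path.relative(root, file)); // renaming a file changes the hash too
    hash.update("\0");
    hash.update(fs.readFileSync(file));
    hash.update("\0");
  }
  return hash.digest("hex");
}
```

The resulting string could then be passed as `bundling: { assetHash: computeSourceHash(entry) }`. Files that exist next to the handler but are never imported do not affect the hash, which is the point of walking the import graph instead of hashing the whole directory.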

Technical considerations:

Other Information

If this is out of scope for the AWS CDK (which I hope it is not), @NimmLorr has documented a solution using turbopack. See this comment.

Acknowledgements

CDK version used

N/A

Environment details (OS name and version, etc.)

N/A

pahud commented 1 year ago

We had a similar discussion in 2020 (https://github.com/aws/aws-cdk/issues/10286), and the conclusion was not to include additional npm modules and to use Docker instead. But discussion is welcome if it's still relevant.

bestickley commented 1 year ago

@pahud, while esbuild isn't bundled into the AWS CDK module, it is still used by the AWS CDK NodejsFunction construct. Can something similar be done with es-module-lexer? To set cache: true, you'd need to install a peer dependency.

es-module-lexer seems pretty bare-bones based on my minimal research. I'm curious, does anyone know of a module that can give you a list of all dependencies imported by a module? I couldn't find one.

tmokmss commented 1 year ago

What if we just allowed setting assetHashType to SOURCE? Bundling would then be skipped when the source files haven't changed.

https://github.com/aws/aws-cdk/blob/0a61edf3499aa1a72709131d67ac849107870065/packages/aws-cdk-lib/aws-lambda-nodejs/lib/bundling.ts#L63

In PythonFunction, hash type is SOURCE by default, but idk why it isn't in NodejsFunction.

bestickley commented 1 year ago

@tmokmss, does assetHashType affect whether or not the function is bundled? I thought it was only used for uploading the asset to S3. I'm asking for something more like a "bundleHash".

tmokmss commented 1 year ago

@bestickley Yes. If the asset hash is the same (i.e. there is already a bundled output directory for that hash), bundling is skipped.

https://github.com/aws/aws-cdk/blob/0a61edf3499aa1a72709131d67ac849107870065/packages/aws-cdk-lib/core/lib/asset-staging.ts#L429
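A toy model of the behaviour described above, assuming staged output lives in `cdk.out/asset.<hash>` directories. `stageAsset` is a hypothetical name and this is a simplification, not the code in asset-staging.ts: the point is only that the bundler runs when no output exists yet for a given hash, so a hash that tracks the sources lets unchanged functions skip esbuild entirely.

```javascript
// Simplified model (NOT aws-cdk-lib's implementation) of hash-keyed staging.
const fs = require("fs");
const path = require("path");

// Returns the staged directory and whether the bundler actually ran.
function stageAsset(outdir, assetHash, bundle) {
  const stagedDir = path.join(outdir, `asset.${assetHash}`);
  if (fs.existsSync(stagedDir)) {
    return { stagedDir, bundled: false }; // cache hit: reuse the previous output
  }
  // Cache miss: bundle into a temporary directory, then rename it into place.
  const tmpDir = path.join(outdir, `bundling-temp-${assetHash}`);
  fs.rmSync(tmpDir, { recursive: true, force: true });
  fs.mkdirSync(tmpDir, { recursive: true });
  bundle(tmpDir); // the expensive step (esbuild in NodejsFunction's case)
  fs.renameSync(tmpDir, stagedDir);
  return { stagedDir, bundled: true };
}
```

Bundling into a temporary directory and renaming it afterwards means a failed bundle never leaves a half-written `asset.<hash>` directory that a later run would mistake for a cache hit.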

bestickley commented 1 year ago

I'm still looking for time to dedicate to this, but wanted to document it. I found this library, which could do all the heavy lifting of finding the dependency tree: https://github.com/dependents/node-dependency-tree

EDIT: or this one too: https://www.npmjs.com/package/@vercel/nft

fab-mindflow commented 11 months ago

I believe this should be prioritized with #24456 to address very slow builds in large CDK projects with lambdas.

piotrmoszkowicz commented 11 months ago

I would love to have that feature in CDK; it takes ages to build our Lambda-dependent stacks!

mhyland-phoenicia commented 11 months ago

This doesn't necessarily deal with caching, but in the meantime, a workaround I ended up implementing was a prebuild step that bundles all the lambdas in parallel, followed by lambda.Code.fromAsset in the stack. We were able to shorten lambda bundling from 50+ seconds to <1 sec. https://gist.github.com/mhyland-phoenicia/c16ed0907c264fc767215e6cb214e5ef
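The gist's code isn't reproduced here; what follows is an independent, minimal sketch of the same idea, assuming `bundleOne` is a function you supply (for example one wrapping `esbuild.build`) that returns a promise. `prebuildAll` and its `limit` parameter are hypothetical; the limit just caps how many bundles run at once.

```javascript
// Hypothetical prebuild runner: bundle every entry point concurrently,
// at most `limit` at a time, preserving the input order in the results.
async function prebuildAll(entries, bundleOne, limit = 4) {
  const results = new Array(entries.length);
  let next = 0;

  // Each worker repeatedly claims the next unbundled entry until none remain.
  async function worker() {
    while (next < entries.length) {
      const i = next++; // claimed synchronously, so no two workers share an index
      results[i] = await bundleOne(entries[i]);
    }
  }

  const workers = Array.from({ length: Math.min(limit, entries.length) }, worker);
  await Promise.all(workers); // rejects if any bundle fails
  return results;
}
```

The resulting output directories would then be handed to `lambda.Code.fromAsset(...)` in the stack, so `cdk synth` no longer bundles anything itself.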

whitakersp-fineos commented 9 months ago

Would love to have this feature. Our builds are super slow because of lambda building after adding powertools and prisma to our lambdas.

LeoLapworthKT commented 9 months ago

Wondering if https://github.com/CloudSnorkel/cdk-turbo-layers helps?

whitakersp-fineos commented 9 months ago

That's not using NodejsFunction, which provides a lot more capability than a plain Function.

LeoLapworthKT commented 9 months ago

We use turbo-layers to bundle third-party dependencies (not the Lambda function itself) into a layer, which is built in CloudFormation and only rebuilt when the dependencies change. We ALSO extract the list of packages in that layer and supply it to NodejsFunction:

layers: [thePackagerLayer],
bundling: {
  externalModules,
},

We are still using NodejsFunction, but we avoid the deploy overhead of building third-party packages again when our dependencies haven't changed.

So it doesn't cache the build of your own Node functions, but it does remove the rebuilding of the dependencies.
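A minimal sketch of deriving that `externalModules` list, assuming the layer is built from its own package.json and installs exactly the packages in its `dependencies`. `externalModulesFromLayer` and the `layer/package.json` path are hypothetical; turbo-layers' actual API isn't shown in the thread.

```javascript
// Hypothetical helper: every package the layer installs is marked external,
// so esbuild leaves it out of each function bundle.
const fs = require("fs");

function externalModulesFromLayer(packageJsonPath) {
  const pkg = JSON.parse(fs.readFileSync(packageJsonPath, "utf-8"));
  return Object.keys(pkg.dependencies ?? {}).sort(); // stable order for stable synth output
}

// Usage inside a stack (thePackagerLayer as in the comment above):
// new NodejsFunction(this, "Fn", {
//   entry: "src/handler.ts",
//   layers: [thePackagerLayer],
//   bundling: { externalModules: externalModulesFromLayer("layer/package.json") },
// });
```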

We also added a BUILD_STACK env variable: if it is set, only our core stacks plus the stacks matching its value are built, which minimises what CDK has to look at. This has helped with deploying in development. We added it before watch and hot deploys existed, but we still use it even with those.
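One possible shape for such a filter, as a sketch only: the stack names in `CORE_STACKS`, the helper name `shouldBuildStack`, and the substring matching rule are all assumptions, since the comment doesn't say how "match" is defined.

```javascript
// Hypothetical BUILD_STACK filter for a CDK app entry file.
// Core stacks are always built; other stacks only if their name contains BUILD_STACK.
const CORE_STACKS = ["NetworkStack", "DatabaseStack"]; // made-up names

function shouldBuildStack(stackName, buildStack = process.env.BUILD_STACK) {
  if (!buildStack) return true; // unset or empty: build every stack
  return CORE_STACKS.includes(stackName) || stackName.includes(buildStack);
}

// In bin/app.ts (hypothetical):
// if (shouldBuildStack("OrdersStack")) new OrdersStack(app, "OrdersStack");
```

Skipping construction entirely (rather than filtering at deploy time) is what keeps CDK from synthesizing, and therefore bundling, the Lambdas in the other stacks.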

If anyone is interested, I can see about putting the code we wrap turbo-layers with somewhere public.

ShivamJoker commented 7 months ago

Bundling is too slow right now. We need to get this merged.

AllanOricil commented 5 months ago

Is this going to be improved this year?

AllanOricil commented 5 months ago

After adding the assetHash: cdk.AssetHashType.SOURCE property to the bundling object in NodejsFunction, my lambda functions are no longer rebuilt unless there is a change in the code.

So, can't this issue be closed? If not, can you explain why?

AllanOricil commented 5 months ago

After changing my lambda code and rebuilding its CDK stack, no new asset was bundled. Is there an open issue for that?

AllanOricil commented 5 months ago

I made a mistake: assetHashType does not exist in NodejsFunction.
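A likely explanation for the behaviour in the last few comments (not confirmed in the thread): `cdk.AssetHashType.SOURCE` is simply the string `"source"`, and NodejsFunction's `bundling.assetHash` accepts any custom hash string, so that setting pins the asset hash to a constant. The simplified model below (a hypothetical `cacheKey`, not aws-cdk-lib code) shows why a constant custom hash never triggers a rebuild.

```javascript
// Simplified model (NOT aws-cdk-lib's code) of the key used to decide
// whether an asset must be re-bundled.
const crypto = require("crypto");

function cacheKey(sourceContents, customAssetHash) {
  if (customAssetHash !== undefined) {
    return customAssetHash; // a custom hash wins; the source contents are ignored
  }
  return crypto.createHash("sha256").update(sourceContents).digest("hex");
}

const SOURCE = "source"; // stand-in for the value of cdk.AssetHashType.SOURCE
```

With `cacheKey(code, SOURCE)` the key is `"source"` for every version of the code, so the earlier output is reused forever, which matches both the "no longer rebuilt" and the "no new asset was bundled" observations.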

github-actions[bot] commented 5 months ago

This issue has received a significant amount of attention so we are automatically upgrading its priority. A member of the community will see the re-prioritization and provide an update on the issue.

AllanOricil commented 5 months ago

I was able to speed up my builds using assetHash and an S3 bucket to store cdk.out. Here are the steps I followed:

  1. Compute a hash with esbuild for each of your Lambda entry points and store the results in a file. I stored them as a JSON object where each key is a Lambda entry point path and each value is the computed hash.

  2. In your CDK project, set each Lambda's assetHash to the hash from the file you created in step 1, looking it up by the entry point path.

  3. After the first bundling, store cdk.out somewhere your CI automation can retrieve it. I stored it in an S3 bucket. On every new build, update the cached cdk.out if it is dirty (has diffs). Clear the cache every now and then, or whenever you want a full build.

Just by caching cdk.out I was able to cut the time of my builds by half.
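For step 3, one way to decide whether the cached cdk.out is dirty is to fingerprint the directory and compare it with the fingerprint saved on the previous upload. This is a sketch under that assumption; `directoryDigest` and `isCacheDirty` are hypothetical helpers, and where the previous digest is kept (for example next to the tarball in S3) is up to you.

```javascript
// Hypothetical helpers: fingerprint a directory tree (relative paths + file
// contents, in sorted order) to decide whether the cache needs re-uploading.
const fs = require("fs");
const path = require("path");
const crypto = require("crypto");

function directoryDigest(root) {
  const hash = crypto.createHash("sha256");
  const walk = (dir) => {
    const entries = fs
      .readdirSync(dir, { withFileTypes: true })
      .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
    for (const entry of entries) {
      const full = path.join(dir, entry.name);
      const rel = path.relative(root, full).split(path.sep).join("/"); // OS-independent
      if (entry.isDirectory()) {
        hash.update(`dir:${rel}\0`);
        walk(full);
      } else if (entry.isFile()) {
        hash.update(`file:${rel}\0`);
        hash.update(fs.readFileSync(full));
        hash.update("\0");
      } // symlinks and other entry types are ignored
    }
  };
  walk(root);
  return hash.digest("hex");
}

// True when the directory differs from the previously recorded digest.
function isCacheDirty(root, previousDigest) {
  return directoryDigest(root) !== previousDigest;
}
```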

ShivamJoker commented 5 months ago

Can you share any code sample for this? @AllanOricil

AllanOricil commented 5 months ago

@ShivamJoker this is the script I use to compute the hashes for my lambdas.

/* eslint-disable */
const fs = require("fs");
const crypto = require("crypto");
const esbuild = require("esbuild");
const path = require("path");
const pkg = JSON.parse(fs.readFileSync("package.json", "utf-8"));

// Bundles one entry point in memory (write: false) and hashes the output,
// so the hash only changes when the bundled code changes.
async function computeHash(entryPoint) {
  const result = await esbuild.build({
    platform: "node",
    entryPoints: [entryPoint],
    write: false,
    bundle: true,
    treeShaking: true,
    minify: true,
    // Packages in "dependencies" are external, so upgrading them does not change the hash.
    external: Object.keys(pkg.dependencies ?? {}),
  });
  const hash = crypto.createHash("sha256");
  hash.update(Buffer.from(result.outputFiles[0].contents));
  return hash.digest("hex");
}

(async () => {
  const entryPoints = []; // Lambda entrypoint paths
  const hashes = {};
  for (const ep of entryPoints) {
    // Key by absolute path so the CDK code can look it up with path.resolve().
    hashes[path.resolve(ep)] = await computeHash(ep);
  }

  fs.writeFileSync(
    "./resources/lambda/computed-hashes.json",
    JSON.stringify(hashes, null, 2),
  );
})().catch((err) => {
  console.error(err);
  process.exit(1);
});

This is how I use the hashes located at ./resources/lambda/computed-hashes.json in the CDK when creating my lambdas:

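import * as fs from "fs";
import * as path from "path";
import * as lambda from "aws-cdk-lib/aws-lambda";
import { NodejsFunction } from "aws-cdk-lib/aws-lambda-nodejs";

// Inside a Stack/Construct constructor:
const computedLambdaHashes = JSON.parse(
  fs.readFileSync(path.resolve("../resources/lambda/computed-hashes.json"), "utf-8"),
);
const entryPath = path.resolve("../resources/lambda/lambda-path/index.ts");
const assetHash = computedLambdaHashes[entryPath]; // look up the hash by entry path

new NodejsFunction(this, "function", {
  entry: entryPath,
  handler: "main",
  runtime: lambda.Runtime.NODEJS_18_X,
  bundling: {
    assetHash, // custom hash: bundling is skipped if cdk.out already has output for it
  },
});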
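One refinement worth considering, sketched with a hypothetical `loadAssetHash` helper: fail the synth when an entry point has no precomputed hash. With a plain object lookup, a missing key yields `undefined`, and NodejsFunction then falls back to its default output-based hashing, so the cache would silently stop applying for that function.

```javascript
// Hypothetical helper: strict lookup of a precomputed hash by absolute entry path.
const fs = require("fs");
const path = require("path");

function loadAssetHash(hashesFile, entryPath) {
  const hashes = JSON.parse(fs.readFileSync(hashesFile, "utf-8"));
  const key = path.resolve(entryPath); // same keying as the hash script
  const hash = hashes[key];
  if (typeof hash !== "string" || hash.length === 0) {
    throw new Error(`No precomputed hash for ${key} in ${hashesFile}; re-run the hash script`);
  }
  return hash;
}
```

In the stack this would replace the plain lookup, e.g. `bundling: { assetHash: loadAssetHash(path.resolve("../resources/lambda/computed-hashes.json"), entryPath) }`.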

In CI, cdk.out is downloaded from an S3 bucket before the build and synth, and uploaded back to S3 after synth:

- |
  if aws s3 ls s3://my-cache-bucket/cdk.out.tar.gz; then
    aws s3 cp s3://my-cache-bucket/cdk.out.tar.gz cdk.out.tar.gz
    tar -zxf cdk.out.tar.gz
  else
    echo "File not found."
  fi
- npm run build
- npm run cdk:synth:all
- tar -zcf cdk.out.tar.gz ./cdk.out
- aws s3 cp cdk.out.tar.gz s3://my-cache-bucket/cdk.out.tar.gz