aws-cdk-lib/aws-lambda-nodejs: Respect the --asset-parallelism flag when bundling via local esbuild install

Cosmic-Ryver commented 1 year ago

Describe the feature

I propose that the bundling process for NodejsFunctions be refactored such that the aws cdk toolkit can orchestrate bundling in parallel with synth, and that the toolkit performs this orchestration when the --asset-parallelism option is truthy.

Use Case

Currently, NodejsFunctions inefficiently bundle synchronously during synth regardless of the --asset-parallelism option for deployments. For heavily serverless projects with dozens of Lambdas or more, this massively slows synth and deployment time. The proposed feature will reduce deployment times and, alongside options like hotswap and watch, increase developer's iteration velocity in heavily serverless NodeJs projects.

Proposed Solution

The following is intended for illustration. It is neither elegant nor, frankly, a "good" solution to this problem.

The Bundling class of the module can expose the interface to allow the toolkit to command parallel bundling:

// @aws-cdk/aws-lambda-nodejs/lib/bundling.ts@Bundling
private static parallelBundling: boolean = false;

public static set parallelBundling(parallelBundling: boolean) {
  Bundling.parallelBundling = parallelBundling;
}

private static staged: { [outputDir: string]: { tempFile: string, inputDir: string, tryBundling: () => boolean } } = {};

public static bundleInParallel = async (): Promise<void> => {
  let bundlings: Array<Promise<void>> = []
  for (const [outputDir, { tryBundling, inputDir, tempFile }] of Object.entries(Bundling.staged)) {
    bundlings.push((async () => {
      // Need to swap the directories back for staging
      fs.renameSync(outputDir, inputDir);
      // Need to remove the temporary file that prevents AssetStaging from throwing
      fs.rmSync(tempFile);
      const success = tryBundling();
      if (!success) {
        const assetDir = path.basename(outputDir);
        throw new Error(`Failed to bundle NodejsFunction for ${assetDir}.`);
      }
      // Okay to return the directories to their final locations
      fs.renameSync(inputDir, outputDir);
    })());
  }
  const results = await Promise.allSettled(bundlings)
  const errors: PromiseRejectedResult[] = []
  for (const result of results) {
    if (result.status === "rejected") {
      errors.push(result);
    }
  }
  if (errors.length) {
    errors.map(({ reason }) => console.error(reason))
    throw new Error("One or more assets failed to bundle")
  }
}

CDK Toolkit must set Bundling.parallelBundling = true to force this mode of operation. It can then call bundleInParallel to create a promise that will resolve once all bundling operations return successfully. For example:

// aws-cdk/lib/cdk-toolkit.ts@CdkToolkit.deploy
public async deploy(options: DeployOptions) {
  if (options.watch) {
    return this.watch(options);
  }

  lambdaNodeJs.parallelBundling = options.assetParallelism;

  if (options.notificationArns) {
    options.notificationArns.map( arn => {
      if (!validateSnsTopicArn(arn)) {
        throw new Error(`Notification arn ${arn} is not a valid arn for an SNS topic`);
      }
    });
  }

  const startSynthTime = new Date().getTime();
  const [stackCollection] = await Promise.all([
    this.selectStacksForDeploy(options.selector, options.exclusively, options.cacheCloudAssembly),
    options.assetParallelism ? lambdaNodeJs.Bundling.bundleInParallel() : Promise.resolve(),
  ]);
  const elapsedSynthTime = new Date().getTime() - startSynthTime;
  print('\n✨  Synthesis time: %ss\n', formatTime(elapsedSynthTime));
  // ...
}

You may have noticed that bundleInParallel relies on a hash map of cached bundling functions stored in the static prop staged. This would be populated by the @aws-cdk/AssetStaging class's invocation of the tryBundling function returned by Bundling.getLocalBundlingProvider. When invoked by AssetStaging, tryBundling will capture the arguments passed into it within a new closure and cache the resulting function to staged. Example:

// @aws-cdk/aws-lambda-nodejs/lib/bundling.ts@Bundling
private getLocalBundlingProvider(): cdk.ILocalBundling {
  const osPlatform = os.platform();
  const createLocalCommand = (outputDir: string, esbuild: PackageInstallation, tsc?: PackageInstallation) => this.createBundlingCommand({
    inputDir: this.projectRoot,
    outputDir,
    esbuildRunner: esbuild.isLocal ? this.packageManager.runBinCommand('esbuild') : 'esbuild',
    tscRunner: tsc && (tsc.isLocal ? this.packageManager.runBinCommand('tsc') : 'tsc'),
    osPlatform,
  });
  const environment = this.props.environment ?? {};
  const cwd = this.projectRoot;

  // NEW: we've pulled `tryBundle` instantiation out of the return and added an options argument
  const tryBundle = (outputDir: string, options?: { staging: boolean }): boolean => {
    // NEW: when invoked by AssetStaging, ie without `options.staging === false`, cache input to a new closure
    const parallelize = Bundling.parallelBundling && options?.staging ?? true
    if (parallelize) {
      const tempFile = path.join(outputDir, '.staged');
      // Required, as AssetStaging asserts `outputDir` is not empty after bundling
      fs.appendFileSync(tempFile, 'staged for parallel build')

      // Caching done here
      Bundling.staged[outputDir] = {
        tryBundling: () => tryBundle(outputDir, { staging: false }),
        inputDir: this.projectRoot
        tempFile,
      };

      // Allow synth to continue uninterrupted
      return true;
    }
    // Bundle as normal if either `Bundling.parallelBundling === false` or `options.staging === false`
    if (!Bundling.esbuildInstallation) {
      process.stderr.write('esbuild cannot run locally. Switching to Docker bundling.\n');
      return false;
    }

    if (!Bundling.esbuildInstallation.version.startsWith(`${ESBUILD_MAJOR_VERSION}.`)) {
      throw new Error(`Expected esbuild version ${ESBUILD_MAJOR_VERSION}.x but got ${Bundling.esbuildInstallation.version}`);
    }

    const localCommand = createLocalCommand(outputDir, Bundling.esbuildInstallation, Bundling.tscInstallation);

    exec(
      osPlatform === 'win32' ? 'cmd' : 'bash',
      [
        osPlatform === 'win32' ? '/c' : '-c',
        localCommand,
      ],
      {
        env: { ...process.env, ...environment },
        stdio: [ // show output
          'ignore', // ignore stdio
          process.stderr, // redirect stdout to stderr
          'inherit', // inherit stderr
        ],
        cwd,
        windowsVerbatimArguments: osPlatform === 'win32',
      },
    );

    return true;
  };

  return {
    tryBundle
  };
}

Other Information

I may be able to implement this feature and open a PR, but would like feedback on alternatives to the above approach. Specifically is there a preferred way to integrate optional behavior like this with the toolkit? I assume such a direct coupling of a top level CLI command with such a specific construct is a non-starter. However, I'd be happy to be wrong about that as well.

Acknowledgements

[X] I may be able to implement this feature request
[ ] This feature might incur a breaking change

CDK version used

2.67.0

Environment details (OS name and version, etc.)

N/A

fab-mindflow commented 1 year ago

We believe this is a super important issue.

We recently did a refactoring and we now have more than 60-70 lambdas dispatched in multiple nested stacks gathered in one of our stack. Using cdk watch --hotswap on this parent stack now takes 1-2 minutes at every lambda update.

That kills the developer experience. We have searched around asset bundling and esbuild but it seems like we have no other way than to split to top-level stacks for now, so we can mitigate those long repeated processes per lambda. Are we right?

Cosmic-Ryver commented 1 year ago

@fab-mindflow Unfortunately the bundling logic is not centralized and is instead a side effect of synthing stacks (really it's a side effect of instantiating NodejsFunction constructs), so this feature would not solve your problem. It would probably be worthwhile to open a separate feature request on that issue. I would definitely be interested in it myself.

To clarify for everyone else who comes across this: the feature is limited to cdk deploy only. This is out of necessity due to how the toolkit is currently implemented. Expanding this behavior to other cli commands or bundling constructs should be requested in separate issues.

Cosmic-Ryver commented 1 year ago

For public visibility: I've gotten feedback from internal AWS folks in out of band channels. As I understand, the CDK team is supportive of this request and have asked that I open a PR to move it forward. I will do that as soon as I am not insanely swamped with work 😅

NimmLor commented 1 year ago

Hi. I've also experimented with parallel bundling, however in a more hacky way.

Tested with a stack of 100x lambda functions that only include a console.log. A standard synth took about 170 seconds, and parallel bundling took about 15 seconds.

I'd definitely like to see this natively supported.

Cosmic-Ryver commented 1 year ago

I have breathing room now and made some progress on this over the weekend. My goal is to have a PR out before the end of the week.

borgoat commented 1 year ago

Just joining the conversation to share our experience, as we faced the same issue stating with around 50 functions, and maybe it's useful to others.

It may be a bit of a special setup since we use Nx with its esbuild plugin to invoke CDK, but anyway, I tested it also with plain esbuild, and it may be replicated even in a plain package.json script or with other tooling (maybe even SWC instead of esbuild, if that's your jam!).

We moved the handlers into some folders with a common suffix (something like ${name}.fn/index.ts - this is particularly useful if NOT using Nx).

Now, we can run esbuild before invoking CDK, and put all Lambdas into a build folder in the repo, with the same tree structure as the CDK code (inside ./dist/lambda).

In the gist is the example configuration for Nx ^1. Using esbuild from the shell, it's something like this:

yarn esbuild libs/**/*.fn/index.ts --bundle --platform=node --minify --sourcemap=external --outdir=dist/lambda

Finally, we have a construct ^1 (heavily inspired by the NodejsFunction construct ^2), that will easily reconstruct the full path of the pre-bundled lambda, based on the name of the folder.

esbuild takes ~2 seconds to bundle those 50 Lambda Functions, before it would take more than 1 minute.

Additionally, through Nx we get caching (not that it makes that much of a difference at this point, but it's still nice to have)

fab-mindflow commented 1 year ago

That looks super appealing! Q: Can we expect CDK team to have that soon in coming release (I see it set as p1) or should we move forward with @borgoat approach?

bestickley commented 1 year ago

@GusBuonv and others, have you considered if the AWS CDK NodejsFunction construct could cache esbuild built assets based on source code changes? Is this possible? I'd think this would make synth much faster and alleviate DX concerns noted here. I don't have a ton of experience with Asset construct (link) but I see it has a sourceHash. Maybe this could be used to cache built assets? I can create another issue but first wanted to ask.

NimmLor commented 1 year ago

@bestickley Today, I have migrated to turborepo in combination with yarn v3. Now, I bundle the code before synthesizing the CDK stack. Turborepo provides build caching, but it's not a major concern because esbuild is incredibly fast. I'm following a similar approach to @borgoat, but instead of Nx, I'm using turborepo.

The folder structure also adheres to the convention <function-name>.fn/index.ts.

Here's the bash script I use for bundling:

esbuild **/*.fn/index.ts --bundle --platform=node --outdir=dist --external':@aws-sdk/*'

To enable build caching, it's important to separate the lambda code into a separate package. Otherwise, the file hash will change with every CDK update.

I've documented my setup in a gist, which you can find here: https://gist.github.com/NimmLor/a9ae54ef9e55e96193508a83fb50c495

bestickley commented 1 year ago

@NimmLor, that's great. But wouldn't it be even nicer if aws-cdk-lib/aws-lambda-nodejs handled that for you? I understand some projects require custom setups, but I'd prefer to not have to pull in additional tooling.

NimmLor commented 1 year ago

@bestickley Definitely, I'd love to see parallel builds implemented in aws-cdk-lib/aws-lambda-nodejs. I've played around with the code, and it doesn't seem too difficult to implement.

I believe that implementing build caching in CDK could pose challenges due to the multitude of factors involved in generating a hash. You may want to explore turborepo's hashing strategy for further insights.

For me, the preferred approach is to organize the lambda functions in a separate package. This ensures that unnecessary dependencies are not inadvertently bundled. I have frequently encountered situations where I unintentionally included some CDK dependencies in my bundle.

moltar commented 1 year ago

We are also experiencing immense pain from slow builds. It is exacerbated by the fact that we deploy the same solution to many environments. And the construct is not smart enough to realize this, and rebuilds the asset for each stage again and again, even though the entry handler file is the same. In the CodePipeline/CodeBuild environment we are at 8 minutes of build time now, and it keeps getting worse with each new stage (app instance) added.

NimmLor commented 1 year ago

@moltar I've used the aws-cdk-lib.pipelines in a few projects in the past and the assets are only built once if it is setup correctly. One app consisted of 40+ lambdas, which was very painful to work with.

Pipeline Screenshots

![image](https://github.com/aws/aws-cdk/assets/32486857/edc9767e-2e4c-464a-8c66-50dd1f80465a) ![image](https://github.com/aws/aws-cdk/assets/32486857/dcd3c01a-a2fe-4652-b99f-fa6c7f569f11)

moltar commented 1 year ago

if it is setup correctly

What does that mean tho? I think our setup is correct.

moltar commented 1 year ago

Also, I don't like an asset-per-pipeline setup, as it has a ton of overhead. Plus our quantity of Lambas exceeds the limits of CodePipeline.

fab-mindflow commented 11 months ago

Any news on this? This looks quite critical for large CDK projects with numerous lambdas (like us). Lambda bundling phase is terribly slow...

piotrmoszkowicz commented 11 months ago

I would love to have that feature within CDK, it takes ages to build our Lambda dependant stacks!

abuziyaada commented 8 months ago

yes... ran into the same problem. We are in need of a built in solution for this.

dannysteenman commented 5 months ago

+1 (bump) to have this feature added

github-actions[bot] commented 2 months ago

This issue has received a significant amount of attention so we are automatically upgrading its priority. A member of the community will see the re-prioritization and provide an update on the issue.

ncino-esselman commented 2 months ago

This would be very helpful to my organization as well!

FelixRelli commented 1 month ago

+1

aws / aws-cdk