
Feature request: option for container-based builds for specific resources #5507

Open davidjb opened 1 year ago

davidjb commented 1 year ago

Describe your idea/feature/enhancement

Building resources - LayerVersions and Functions - can currently occur either natively on the host or within a Docker container via sam build --use-container, but this is an all-or-nothing option: either everything builds in a container, or nothing does.

This causes an issue when a resource consists of or depends on native code, which must be built in a container matching the runtime environment, yet is used together with other resources that need to be built on the host or otherwise do not work with container builds (such as go1.x on provided.al2/provided.al2023). In real terms, this problem occurs with a Go Function that depends on a native-code LayerVersion - the Function must be built on the host, but the LayerVersion can only be built correctly in a container matching the Amazon Linux runtime. Similarly, the same issue occurs when intermixing multiple Functions in a configuration, such as Node.js functions where some rely on native modules (requiring container-based builds) and others do not.
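
For illustration, a minimal template sketch of such a conflicting pair (resource names and paths are hypothetical; the Metadata shown is today's syntax, without any proposed option):

    Resources:
      # Needs a host build: per the above, go1.x-style builds on
      # provided.al2/provided.al2023 do not work with --use-container
      GoFunction:
        Type: AWS::Serverless::Function
        Properties:
          Runtime: provided.al2023
          Handler: bootstrap
          CodeUri: go-function/
          Layers:
            - !Ref NativeLayer
        Metadata:
          BuildMethod: makefile

      # Needs a container build: compiles native code that must match
      # the Amazon Linux runtime environment
      NativeLayer:
        Type: AWS::Serverless::LayerVersion
        Properties:
          ContentUri: native-layer/
        Metadata:
          BuildMethod: makefile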

If SAM supported explicitly setting how a given resource should be built, this would resolve the issue and allow a single call to sam build to build all resources correctly.

Proposal

Propose adding another Metadata resource attribute to the SAM specification like so:

    Metadata:
      BuildMethod: makefile
      BuildArchitecture: arm64
      BuildType: [host|container]

or alternatively, implementing a CLI flag and adding support to samconfig.toml, similar to build_image, so that individual resources can explicitly be set to use or not use container builds.
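
For instance, a hypothetical samconfig.toml sketch by analogy with the existing per-resource build_image entries - today use_container is only a global boolean, so the list form below is the proposed, non-existent part (resource names are illustrative):

    [default.build.parameters]
    # Existing: per-resource build images via "ResourceId=value" entries
    build_image = ["NativeLayer=public.ecr.aws/sam/build-provided.al2023"]
    # Hypothetical: per-resource opt-in/out of container builds
    use_container = ["NativeLayer=true", "GoFunction=false"]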

Additional Details

It is possible to work around this issue by manually building each Layer and Function individually with the correct settings (e.g. running sam build --use-container ResourceName for the parts that need it, and then sam build without --use-container for the other resources), but this relies on the user mentally constructing the dependency chains and effectively managing the caching performed by SAM. If one forgets to build all of a function's required layers with the correct settings, or if SAM considers one cache element out of date, the build will produce incompatible resources that fail to work.
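
As a sketch, with hypothetical logical IDs, that manual sequence looks like:

    # Build the native-code resources inside a container first...
    sam build --use-container NativeLayer
    # ...then build everything else on the host, hoping SAM's cache
    # still treats the container-built artefacts as up to date
    sam build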

Additionally, there are challenges when mixing container-based and host-based builds since the build processes differ between build types. For example, with nodejs20.x runtimes, a container-based build stores cached artefacts in .aws-sam/cache/{uuid}, but a host-based build looks in .aws-sam/deps/{uuid} for dependencies when assessing whether a build's cache is valid. In order to mix build types for this runtime and get a successful SAM build I can deploy, I've had to manually copy .aws-sam/build/{LogicalFunctionID}/node_modules over to .aws-sam/cache/{uuid}/node_modules after running sam build --use-container LogicalFunctionID. After this, sam build considers the cache valid and runs an incremental Node build, copying the deps folder into the build directory (an expensive no-op, since the node_modules dir was already there from the container-based build).
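
In command form, that copy step was roughly as follows (see the edits below - it no longer suffices on recent SAM versions), with BUILD_UUID standing in for the matching cache directory name:

    sam build --use-container LogicalFunctionID
    # Make the container-built deps visible to the host-based cache check;
    # BUILD_UUID is the matching directory name under .aws-sam/cache/
    BUILD_UUID=...
    cp -r .aws-sam/build/LogicalFunctionID/node_modules \
      .aws-sam/cache/$BUILD_UUID/node_modules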

Edit: there appears to have been a change in caching in either SAM 1.91.0 or 1.92.0 - my caching workaround no longer functions. In other words, even if you build a layer with --use-container, a subsequent sam build without --use-container no longer considers it cached.

Edit2: the caching workaround requires creation of entries in .aws-sam/build.toml as per https://github.com/aws/aws-sam-cli/issues/5585#issuecomment-1659384801. Once a build definition is present, builds will update the definition in build.toml correctly.

hawflau commented 1 year ago

Hey @davidjb, thanks for submitting the feature request and for explaining why the current workaround is not satisfactory. I agree with you - both approaches in your proposal sound reasonable to me. I'll raise this with the team and see how we prioritize it.

vikhyat187 commented 1 year ago

Hey @hawflau any update on this Feature request?

davidjb commented 1 year ago

I made an edit to my original post above but thought it worth commenting to highlight the scope and severity of this issue further. Previously, I'd just been working with LayerVersions containing native code, but I've since shifted to running Functions on Node.js runtimes that require native, platform-specific code.

The above workaround that relies on caching is fairly simple with LayerVersions - to get SAM to cache a LayerVersion, you create a mock entry in .aws-sam/build.toml (as per my comment at https://github.com/aws/aws-sam-cli/issues/5585#issuecomment-1659384801) and then run sam build --cached --use-container LogicalID, which caches the build results and updates the source_hash, allowing a subsequent sam build --cached to recognise the cache correctly.
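
In command form (LogicalID is a placeholder):

    # 1. Create a mock entry for the layer in .aws-sam/build.toml
    #    (see the aws-sam-cli#5585 comment linked above)
    # 2. Container build: caches the results and updates source_hash
    sam build --cached --use-container LogicalID
    # 3. Subsequent host builds now recognise the layer as cached
    sam build --cached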

However, things get more complicated when a Function is involved, especially one using a Node.js (nodejs20.x) runtime. In Node's case, both source_hash and manifest_hash must match, and a non-container build also requires .aws-sam/deps/{uuid} to be present for SAM to consider a cache valid. Building in a container will update source_hash but not manifest_hash in build.toml, and also won't create .aws-sam/deps/{uuid} as it doesn't run CopyDependencies. So, end-to-end, to successfully mix container and non-container builds (a consolidated sketch follows this list):

  1. Add fake entries for all resources to build.toml as per https://github.com/aws/aws-sam-cli/issues/5585#issuecomment-1659384801

  2. Run sam build --cached --use-container LogicalID for each resource that needs a container. source_hash in build.toml will be updated and cache created.

    • If any dependencies cannot be built in a container, then build them separately and order your sam build commands correctly.
  3. For resources with a runtime that requires deps, manually copy dependencies like so:

    RESOURCE_ID=LogicalID
    # Or manually get the UUID out of build.toml; note double quotes so
    # that $RESOURCE_ID expands inside the yq expression
    BUILD_UUID=$(yq ".function_build_definitions[] | select(.functions[0] == \"$RESOURCE_ID\") | key" .aws-sam/build.toml)
    mkdir -p .aws-sam/deps/$BUILD_UUID
    # Change the following based on runtime; this is for Node.js
    # (node_modules is a directory, hence cp -r)
    cp -r .aws-sam/build/$RESOURCE_ID/node_modules .aws-sam/deps/$BUILD_UUID/
  4. Manually calculate manifest_hash (MD5 hash) for each resource, such as for Node.js:

    md5sum .aws-sam/build/$RESOURCE_ID/package.json

    and now enter this into .aws-sam/build.toml

    For this step, you can also cheat and have SAM update all the manifest_hashes for you at once by running sam build --cached, taking care that none of the container-built artefacts get affected. In my case, this command fails mid-build as certain Makefile-based resources aren't designed for a host-based build; that said, the manifest_hash values were still updated, even on failure.

  5. Run sam build --cached to build all resources. Observe that your container-based resources are considered cached and won't be rebuilt now that source_hash and manifest_hash are correct and .aws-sam/deps/{uuid} is present.
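
Putting the steps together, a consolidated sketch for a single Node.js function (the logical ID MyFunction is hypothetical; with multiple resources, repeat the middle steps per resource):

    RESOURCE_ID=MyFunction

    # Step 1: fake build.toml entries must already exist (aws-sam-cli#5585)
    # Step 2: container build - updates source_hash and populates the cache
    sam build --cached --use-container $RESOURCE_ID

    # Step 3: expose the container-built dependencies to host cache checks
    BUILD_UUID=$(yq ".function_build_definitions[] | select(.functions[0] == \"$RESOURCE_ID\") | key" .aws-sam/build.toml)
    mkdir -p .aws-sam/deps/$BUILD_UUID
    cp -r .aws-sam/build/$RESOURCE_ID/node_modules .aws-sam/deps/$BUILD_UUID/

    # Step 4: record the manifest hash in build.toml (or let a failing
    # `sam build --cached` update manifest_hash for you)
    md5sum .aws-sam/build/$RESOURCE_ID/package.json

    # Step 5: host build - container-built resources are treated as cached
    sam build --cached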

At the conclusion of the workaround, you're left with a working mix of container-built and host-built resources, but the process is quite complex. Could you, @hawflau, look at prioritising a solution or re-triaging this as a bug, please?