davidjb opened 1 year ago
Hey @davidjb Thanks for submitting the feature request and for explaining why the current workaround is not satisfactory. I agree with you. Both approaches in your proposal sound reasonable to me. I'll raise that with the team and see how we will prioritize it.
Hey @hawflau, any update on this feature request?
I made an edit to my original post above but thought it worth commenting to highlight the scope and severity of this issue further. Previously, I'd just been working with `LayerVersion`s with native code, but I've shifted to running `Function`s with Node.js runtimes that require native and platform-specific code.
The above workaround that relies on caching is fairly simple with `LayerVersion`s - in order to get SAM to cache a `LayerVersion`, you need to create a mock entry in `.aws-sam/build.toml` (as per my comment at https://github.com/aws/aws-sam-cli/issues/5585#issuecomment-1659384801) and then run `sam build --cached --use-container LogicalID`, which caches the build results and updates the `source_hash`, allowing a subsequent `sam build --cached` to recognise the cache correctly.
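As a rough sketch of that sequence (the `MyNativeLayer` logical ID is illustrative; the stub `build.toml` entry comes from the comment linked above):

```sh
# One-off: build only the native-code layer inside a container.
# This updates source_hash for the layer's entry in .aws-sam/build.toml
# and populates the build cache.
sam build --cached --use-container MyNativeLayer

# Later full builds then treat the layer as cached and skip rebuilding it.
sam build --cached
```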
However, things get more complicated when a `Function` is involved, especially one which is using a Node.js (`nodejs20.x`) runtime. In Node's case, there are both `source_hash` and `manifest_hash` to be matched, and a non-container build also requires `.aws-sam/deps/{uuid}` to be present for SAM to consider a cache valid. Building in a container will update `source_hash` but not `manifest_hash` in `build.toml`, and also won't create `.aws-sam/deps/{uuid}` as it doesn't run `CopyDependencies`. So, end-to-end, to successfully mix container and non-container builds:
1. Add fake entries for all resources to `build.toml` as per https://github.com/aws/aws-sam-cli/issues/5585#issuecomment-1659384801.
2. Run `sam build --cached --use-container LogicalID` for each resource that needs a container. `source_hash` in `build.toml` will be updated and the cache created, so that subsequent `sam build` commands recognise the cache correctly.
3. For resources with a runtime that requires `deps`, manually copy dependencies like so:
   ```sh
   RESOURCE_ID=LogicalID
   # Or manually get the UUID out of build.toml (assumes mikefarah yq v4 with TOML support;
   # note the double quotes so $RESOURCE_ID is expanded by the shell)
   BUILD_UUID=$(yq ".function_build_definitions[] | select(.functions[0] == \"$RESOURCE_ID\") | key" .aws-sam/build.toml)
   mkdir -p .aws-sam/deps/$BUILD_UUID
   # Change the following based on runtime; this is for Node.js (-r because node_modules is a directory)
   cp -r .aws-sam/build/$RESOURCE_ID/node_modules .aws-sam/deps/$BUILD_UUID/
   ```
4. Manually calculate `manifest_hash` (an MD5 hash) for each resource and enter it into `.aws-sam/build.toml` - for Node.js this is `md5sum .aws-sam/build/$RESOURCE_ID/package.json` (see the sketch after this list). For this step, you can also cheat and have SAM update the `manifest_hash`es for you all at once by running `sam build --cached`, taking care that none of the container-built artefacts get affected. In my case, this command fails mid-build as certain `Makefile`-based resources aren't designed for a host-based build. That said, the `manifest_hash` values are still updated, even on failure.
5. Run `sam build --cached` to build all resources. Observe that your container-based resources will be considered cached and won't be re-built, now that `source_hash` and `manifest_hash` are correct and `.aws-sam/deps/{uuid}` is present.
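For step 4, a minimal sketch of calculating the hash (the `RESOURCE_ID` value is illustrative; the resulting value still has to be pasted into the matching build definition in `.aws-sam/build.toml` by hand):

```sh
RESOURCE_ID=LogicalID
# md5sum prints "<hash>  <filename>"; keep only the hash itself
MANIFEST_HASH=$(md5sum .aws-sam/build/$RESOURCE_ID/package.json | cut -d' ' -f1)
echo "$MANIFEST_HASH"   # paste this as manifest_hash for the resource's build definition
```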
At the conclusion of the workaround, you're left with a working mix of container-built and host-built resources, but the process is quite complex. Could you, @hawflau, look at prioritising a solution or re-triaging this as a bug, please?
Describe your idea/feature/enhancement
Building resources - `LayerVersion`s and `Function`s - can currently occur either natively on the host or with `sam build --use-container` to build within a Docker container, but this is an all-or-nothing option - either everything builds in a container, or nothing does.

This causes an issue when a resource consists of or depends on native code, which necessitates that it be built in a container (matching the runtime environment), and it is used together with other resources that need to be built on the host or otherwise do not work with container builds (such as `go1.x` on `provided.al2`/`provided.al2023`). In real terms, this problem occurs with a Go `Function` attempting to depend on a native-code `LayerVersion` - the Function must be built on the host but the LayerVersion can only be built correctly in a container matching the Amazon Linux runtime. Similarly, the same issue occurs when intermixing multiple `Function`s in a configuration, such as Node.js, where some functions rely on native modules (requiring container-based builds) and others do not.

If SAM supported the ability to explicitly set how a given resource should be built, this would resolve the issue and allow a single call to `sam build` to build all resources correctly.

Proposal
Propose adding another `Metadata` resource attribute to the SAM specification (a rough sketch of what this could look like is below), or alternatively, implementing a CLI flag and adding support to `samconfig.toml`, similar to `build_image`, where individual resources can be set to explicitly use or not use container builds.
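As a rough illustration only - the `BuildInContainer` attribute name and the resource details below are invented for this sketch and are not part of the SAM spec - the `Metadata` approach might look something like:

```yaml
Resources:
  NativeDepsLayer:
    Type: AWS::Serverless::LayerVersion
    Metadata:
      BuildMethod: nodejs20.x
      BuildInContainer: true    # hypothetical: always build this resource in a container
    Properties:
      ContentUri: layers/native-deps/
  HostBuiltFunction:
    Type: AWS::Serverless::Function
    Metadata:
      BuildInContainer: false   # hypothetical: force a host build for this resource
    Properties:
      CodeUri: functions/host-built/
      Handler: bootstrap
      Runtime: provided.al2023
```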
Additional Details
It is possible to work around this issue by manually building each Layer and Function individually with the correct settings (e.g. running `sam build --use-container ResourceName` for parts that need it and then without `--use-container` for other resources), but this relies on the user mentally constructing the dependency chains and effectively managing the caching performed by SAM. If one forgets to fully build all the layers required by a function with the correct settings, or if SAM considers one cache element out of date, then the build will produce incompatible resources and fail to work.

Additionally, there are challenges when mixing container-based and host-based builds, since build processes may differ between build types. For example, with `nodejs20.x` runtimes, a container-based build will store cached artefacts in `.aws-sam/cache/{uuid}`, but a host-based build looks at `.aws-sam/deps/{uuid}` for dependencies and assesses whether a build's cache is valid. In order to mix build types for this runtime and get a successful SAM build I can deploy, I've had to manually copy `.aws-sam/build/{LogicalFunctionID}/node_modules` over to `.aws-sam/cache/{uuid}/node_modules` after running `sam build --use-container LogicalFunctionID`. After this, `sam build` considers the cache valid and runs an incremental Node build, copying the deps folder into the build directory (equating to an expensive noop since the `node_modules` dir was already there from the container-based build).
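A sketch of that manual copy (the logical ID is illustrative, and the UUID comes from the matching entry in `.aws-sam/build.toml`):

```sh
BUILD_UUID=...   # the cache UUID for this function from .aws-sam/build.toml
sam build --use-container LogicalFunctionID
# Copy the container-built dependencies to where the host-based cached build expects them
cp -r .aws-sam/build/LogicalFunctionID/node_modules .aws-sam/cache/$BUILD_UUID/
```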
Edit: there appears to be a change in caching in either SAM 1.91.0 or 1.92.0 - my caching workaround is no longer functioning. In other words, even if you build a layer with `--use-container`, it subsequently isn't considered as being cached by SAM with `sam build` without `--use-container`.

Edit2: the caching workaround requires creation of entries in `.aws-sam/build.toml` as per https://github.com/aws/aws-sam-cli/issues/5585#issuecomment-1659384801. Once a build definition is present, builds will update the definition in `build.toml` correctly.