aws / aws-sam-cli

CLI tool to build, test, debug, and deploy Serverless applications using AWS SAM
https://aws.amazon.com/serverless/sam/
Apache License 2.0
6.51k stars 1.17k forks source link

Skip Docker Lambda image builds when ImageUri is a valid ECR location #2934

Open alexisfacques opened 3 years ago

alexisfacques commented 3 years ago

Describe your idea/feature/enhancement

When configuring a lambda / serverless function of which the PackageType is Zip, SAM preemptively checks whether the resource CodeUri is a S3 location or not.

  # This function is already packaged and will to be ignored by SAM
  Function:
    Type: AWS::Serverless::Function
    Properties:
      PackageType: Zip
      CodeUri: s3://bucket/function.zip
      ...

If so, SAM will not attempt to build said lambda, and will show a warning to the user:

% sam build
The resource AWS::Serverless::Function 'Function' has specified S3 location for CodeUri. It will not be built and SAM CLI does not support invoking it locally.

I would expect SAM to do the same when attempting to build a PackageType: Image lambda function; checking whether or not the resource property ImageUri is a valid ECR URL before attempting to build said images locally, yet this feature is not supported by SAM today:

  # SAM should not attempt to build this Lambda
  Function:
    Type: AWS::Serverless::Function
    Properties:
      PackageType: Image
      ImageUri: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/function:latest

Instead, attempting to do so will most likely output the following exception (as the DockerContext resource metadata, required for building Docker Images using SAM, will most likely be unset, error traced here):

% sam build
Building image for Function function
Traceback (most recent call last):
  File "/usr/local/bin/sam", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/decorators.py", line 73, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/samcli/lib/telemetry/metric.py", line 153, in wrapped
    raise exception  # pylint: disable=raising-bad-type
  File "/usr/local/lib/python3.8/site-packages/samcli/lib/telemetry/metric.py", line 122, in wrapped
    return_value = func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/samcli/lib/utils/version_checker.py", line 42, in wrapped
    actual_result = func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/samcli/cli/main.py", line 90, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/samcli/commands/build/command.py", line 210, in cli
    do_cli(
  File "/usr/local/lib/python3.8/site-packages/samcli/commands/build/command.py", line 315, in do_cli
    artifacts = builder.build()
  File "/usr/local/lib/python3.8/site-packages/samcli/lib/build/app_builder.py", line 169, in build
    return build_strategy.build()
  File "/usr/local/lib/python3.8/site-packages/samcli/lib/build/build_strategy.py", line 41, in build
    result.update(self._build_functions(self._build_graph))
  File "/usr/local/lib/python3.8/site-packages/samcli/lib/build/build_strategy.py", line 52, in _build_functions
    function_build_results.update(self.build_single_function_definition(build_definition))
  File "/usr/local/lib/python3.8/site-packages/samcli/lib/build/build_strategy.py", line 118, in build_single_function_definition
    result = self._build_function(
  File "/usr/local/lib/python3.8/site-packages/samcli/lib/build/app_builder.py", line 500, in _build_function
    return self._build_lambda_image(function_name=function_name, metadata=metadata)  # type: ignore
  File "/usr/local/lib/python3.8/site-packages/samcli/lib/build/app_builder.py", line 322, in _build_lambda_image
    docker_context_dir = pathlib.Path(self._base_dir, docker_context).resolve()
  File "/usr/local/Cellar/python@3.8/3.8.10/Frameworks/Python.framework/Versions/3.8/lib/python3.8/pathlib.py", line 1042, in __new__
    self = cls._from_parts(args, init=False)
  File "/usr/local/Cellar/python@3.8/3.8.10/Frameworks/Python.framework/Versions/3.8/lib/python3.8/pathlib.py", line 683, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/usr/local/Cellar/python@3.8/3.8.10/Frameworks/Python.framework/Versions/3.8/lib/python3.8/pathlib.py", line 667, in _parse_args
    a = os.fspath(a)

As a result, it is currently not possible to deploy Docker-packed Lambda functions, without building the underlying image using SAM, which is, in my opinion, very limiting (e.g., inability to share ECR registries & images among an Organization, inability to use Docker images that require a specific build workflow -passing secrets using buildkit, environment specific args-...)

I did find a workaround, tricking SAM into skipping the build of a PackageType: Image Lambda, by setting the Code.S3Bucket property to an empty value, but I believe this to be very hack-y:

  # A working hack to skip the build of a Image Lambda function
  Function:
    Type: AWS::Lambda::Function
    Properties:
      PackageType: Image
      Code:
        S3Bucket: !Ref AWS::NoValue
        ImageUri: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/function:latest

Proposal

Things to consider:

  1. Will this require any updates to the SAM Spec

Additional Details

A PR should follow soon with a workaround proposal.

qingchm commented 3 years ago

Going to review the PR ;)

alexisfacques commented 3 years ago

Hello there @qingchm @sriram-mv, regarding this issue and PR#2935, came across some integrations problems; didn't want to open a new issue.

Description:

As discussed above, the proposed solution fixes problems occurring in build command, skipping the build of ECR images when said images are already valid ecr-registry images.

  # SAM will not attempt to build this Lambda
  Function:
    Type: AWS::Serverless::Function
    Properties:
      PackageType: Image
      ImageUri: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/function:latest
% sam build -t template.yml
The resource AWS::Serverless::Function 'Function' has specified ECR registry image for ImageUri. It will not be built and SAM CLI does not support invoking it locally.

Build Succeeded

Built Artifacts  : .aws-sam/build
Built Template   : .aws-sam/build/template.yaml

Commands you can use next
=========================
[*] Invoke Function: sam local invoke
[*] Deploy: sam deploy --guided

However, attempting to deploy such a template will likely output a validation error, ...and the user will still be prompt with an image repository argument using the --guided context, not none of the --image-repository or --image-repositories options are set:

Steps to reproduce:

Resources:

SAM will not attempt to build this Lambda

Function: Type: AWS::Serverless::Function Properties: PackageType: Image ImageUri: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/function:latest


Without specifying any option, or `samconfig.toml`

- `sam build`
- `sam build --guided`

### Observed result:
<!-- Please provide command output with `--debug` flag set. -->

Respectively, 

% sam deploy
Usage: sam deploy [OPTIONS] Try 'sam deploy -h' for help.

Error: Missing option '--image-repository' or '--image-repositories'

% sam deploy --guided

Configuring SAM deploy

Looking for config file [samconfig.toml] :  Not found

Setting default arguments for 'sam deploy'
=========================================
Stack Name [sam-app]: 
AWS Region [eu-west-1]: 
Image Repository for Function:


### Expected result:
<!-- Describe what you expected. -->

Both the validation and the prompt should be ignored for this kind of functions.

Issue as been traced to [`image_repository_validation.py`](https://github.com/aws/aws-sam-cli/blob/develop/samcli/lib/cli_validation/image_repository_validation.py), and [`guided_context.py`](https://github.com/aws/aws-sam-cli/blob/cc806a28968bae5b0e63845a767307383082458b/samcli/commands/deploy/guided_context.py#L292)

### Proposal:

Will open a PR shortly regarding this integration issue. As [`SamFunctionProvider._extract_functions()`](https://github.com/aws/aws-sam-cli/blob/cc806a28968bae5b0e63845a767307383082458b/samcli/lib/providers/sam_function_provider.py#L107) now only returns a list packageable functions, simplest way to do so would be to resolve template function resources using this provider, rather than the [`commands/_utils/template`](https://github.com/aws/aws-sam-cli/blob/cc806a28968bae5b0e63845a767307383082458b/samcli/commands/_utils/template.py) utilities.
ccmoose commented 3 years ago

Hi,

I haven't contributed to SAM before, but is there any way I can help move this along? I'm happy to help with any work that needs to be done.

This has been an issue at my company. We have lambda function container images that host machine learning models that can be used in multiple SAM stacks. When I bumped into this issue a couple of months ago I came here with the intent of opening a feature request but found this one had just been opened. I've been silently watching its progress but my impatience got the better of me. (That and my current workaround is a private fork with a very similar solution... not ideal.)

Anyway, my apologies if these things simply take time but do let me know if I can contribute/test/whatever to help.

Thanks, Mike "Moose" Moustakas

m-chandler commented 3 years ago

Came here to give this issue a +1. I would also like to be able to re-use an ECR image built separately to the stack which is being deployed.

qingchm commented 3 years ago

Hey @alexisfacques thanks for the follow up PR! Planning to review it can you please meanwhile resolve the merge conflicts with the current develop branch if you can!

alexisfacques commented 3 years ago

Heya @qingchm, thanks for the reminder; there was some failing tests on that branch at the time of the PR, I'll take the action of fixing that during this upcoming week-end.

bogdannazarenko commented 3 years ago

Hi, The following might be related to the same issue. According to AWS SAM documentation docs, Metadata takes precedense over ImageUri and sam build rebuilds the image and sam deploy deploys the rebuilt image.

Can we add a flag to sam build --ignore-metadata or something similar to tell CLI to deploy the image from the specified ImageUri?

We follow a promotion process of Docker images across ECR repositories but we would also like to take advantage of local testing and debugging. Currently developers need to know to comment out Metadata when commiting change to github, OR we need to have a special hack (script) in CI/CD process to remove Metadata field from the template before running build and deploy commands.

ccmoose commented 3 years ago

@bogdannazarenko As a SAM user, it was intuitively obvious to me not to specify an ImageUri and Metadata. I would suggest highlighting this in the documentation rather than an "--ignore-metadata" field. (Personally I would find that more confusing.) Or maybe even something as simple as a basic check -- if both Metadata and ImageUri are specified, dump out a warning or error message.

(For context, we've been using a private fork of SAM at my company with a simpler variation of this fix in place to unblock ourselves. I cannot wait to get off that and back to official releases!)

Thanks

maskani-moh commented 2 years ago

Upvoting the issue! 👍

ccmoose commented 2 years ago

Found a reference to this issue in some old notes, and I noticed PR #2935 (https://github.com/aws/aws-sam-cli/pull/2935) that fixes this issue.

Unless I'm missing something, that means this issue can be resolved, right?

matlik commented 2 years ago

I'm not convinced this issue is fully resolved. In fact, the solution implemented in #2935 has introduced a regression for us.

This issue requests that the following isn't built:

  Function:
    Type: AWS::Serverless::Function
    Properties:
      PackageType: Image
      ImageUri: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/function:latest

However, why would this preclude SAM from being able to invoke the lambda locally if the docker image already exists in the local image cache?

We are using SAM to integration test lambdas that are built from multiple projects. There is no build or deploy being performed, just a local test of the lambda functions and the step function state machine that wires it all together. But now we are getting the following error when things used to work fine: The resource AWS::Serverless::Function 'SetupFunction' has specified ECR registry image for ImageUri. It will not be built and SAM CLI does not support invoking it locally.

jp-reejig commented 2 years ago

/+ 1

taylorsmithgg commented 1 year ago

I'm not convinced this issue is fully resolved. In fact, the solution implemented in #2935 has introduced a regression for us.

This issue requests that the following isn't built:

  Function:
    Type: AWS::Serverless::Function
    Properties:
      PackageType: Image
      ImageUri: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/function:latest

However, why would this preclude SAM from being able to invoke the lambda locally if the docker image already exists in the local image cache?

We are using SAM to integration test lambdas that are built from multiple projects. There is no build or deploy being performed, just a local test of the lambda functions and the step function state machine that wires it all together. But now we are getting the following error when things used to work fine: The resource AWS::Serverless::Function 'SetupFunction' has specified ECR registry image for ImageUri. It will not be built and SAM CLI does not support invoking it locally.

100x this. We have no means to test using sam, have to do it manually with docker.

taylorsmithgg commented 1 year ago

We did find a work around by "tricking" sam:

  FunctionName:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: !Sub "function-name-${Stage}"
      Timeout: 900
      MemorySize: 768
      PackageType: Image
      Role: arn:aws:iam::<aws-account-id>:role/lambda
#      Exclude ECR from the name because local invoke breaks
#      ImageUri: !Sub "<aws-account-id>.dkr.ecr.<aws-region>.amazonaws.com/org/image-name:${ImageTag}"
      ImageUri: !Sub "image-name"
      ImageConfig:
        Command:
          - com.MyMainClass::handleRequest
      Environment:
        Variables:
          VAR1: !Ref StateMachineV2
alexisfacques commented 1 year ago

Well it's been a while. So, let me give you a quick update on what's been happening with the issue.

To add some clarity, the changes I proposed in #2935 were to let us skip the build of Lambda functions with valid ECR registry locations as CodeUri / Code parameters. It's like using a valid S3 URI that skips the local artifacts build - and life saver if you’re building big Docker images, as I was at the time.

However, there was a bit of a hiccup with sam deploy --guided (Guided Context) and sam deploy commands, throwing this error: Error: Missing option '--image-repositories', '--image-repository', '--resolve-image-repos' even when all the IMAGE Lambda functions in the templates were valid image URIs (thus not requiring the --image-repository option to be set).

One workaround is set the --image-repository option to an empty string. I proposed a fix back then, but unfortunately, it turned out to be quite complex and couldn't keep up with all the new code and unit tests to make it through validation

Fast forward 2 years, the sam deploy —guided (Guided Context) issue seems to be resolved now; however, we still have a problem with the deploy options validation lingering around. I've resubmitted a much simpler PR (#5729) that should finally put an end to this issue. It makes use of a the _is_all_image_funcs_provided method, since introduced, to skip the option requirement if no "buildable" IMAGE lambdas are specified within a template.

I see no particular side effect to this change, but will happily discuss the matter further if needed. @moelasmar @mildaniel

taylorsmithgg commented 1 year ago

@alexisfacques

The resource AWS::Serverless::Function 'SetupFunction' has specified ECR registry image for ImageUri. It will not be built and SAM CLI does not support invoking it locally.

What about sam local invoke?

github-actions[bot] commented 1 year ago

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.

alexisfacques commented 1 year ago

@alexisfacques

The resource AWS::Serverless::Function 'SetupFunction' has specified ECR registry image for ImageUri. It will not be built and SAM CLI does not support invoking it locally.

What about sam local invoke?

Hi @taylorsmithgg, if I've grasped your earlier messages correctly, you're utilizing the desired ECR location within your Code settings, which does indeed prevent local execution.

To follow up on your issue, are you aiming for SAM to manage the build and deployment of this image? Or is your intention solely to run it locally?

If your expectation involves the build process, please note that the Code parameter doesn't hold significance within SAM when dealing with IMAGE Lambdas. It essentially serves as the local image name on your system, and it will be retagged with the ECR registry name (as specified in your SAM configuration file or command-line argument) along with the DockerImageTag metadata. Modifying the Code parameter to any other value should resolve the issue.

In my view, this behavior aligns with norms, similar to ZIP Lambdas. For instance, specifying a valid S3 location in the Code property skips the build phase entirely, thus preventing local invocations. On the other hand, targeting a local path triggers the build process, with the local path being substituted by SAM's designated S3 location upon deployment.

However, if your intention is for SAM to retrieve the image to facilitate local invocation, this introduces an entirely new feature (albeit super intriguing imo); but we should consider extending this functionality to ZIP Lambdas as well (assuming the AWS credentials provided possess adequate permissions to fetch S3 data).

taylorsmithgg commented 1 year ago

@alexisfacques We use jib to build our Java lambdas in lieu of sam for many reasons.

However, if your intention is for SAM to retrieve the image to facilitate local invocation, this introduces an entirely new feature (albeit super intriguing imo); but we should consider extending this functionality to ZIP Lambdas as well (assuming the AWS credentials provided possess adequate permissions to fetch S3 data).

This is accurate and what is intended. Essentially we migrated a few projects and now have to manually run several steps to stand up a local invocation that should be easily solvable by sam local invoke

It doesn't seem like there should be the same restrictions in place for docker builds as zips since you can arbitrarily retrieve images from any source available. (local, ecr, dockerhub, etc.)

In the above comment, it seems that this was working at some point, at least in a rough format before the fix.

alexisfacques commented 1 year ago

@alexisfacques We use jib to build our Java lambdas in lieu of sam for many reasons.

However, if your intention is for SAM to retrieve the image to facilitate local invocation, this introduces an entirely new feature (albeit super intriguing imo); but we should consider extending this functionality to ZIP Lambdas as well (assuming the AWS credentials provided possess adequate permissions to fetch S3 data).

This is accurate and what is intended. Essentially we migrated a few projects and now have to manually run several steps to stand up a local invocation that should be easily solvable by sam local invoke

It doesn't seem like there should be the same restrictions in place for docker builds as zips since you can arbitrarily retrieve images from any source available. (local, ecr, dockerhub, etc.)

In the above comment, it seems that this was working at some point, at least in a rough format before the fix.

Ahhh I get it now.

Indeed, as long as the image is appropriately tagged within your local Docker environment, I'd personally see no reason why SAM would not be able to run it locally.

FYI, this matter can be traced back to this specific line within the local invoke execution process. Ultimately, this code relies on this instruction found within the component responsible for listing (valid) Lambda functions in a template, shared between both the local invoke and deploy commands.

I would leave it to @mildaniel & @moelasmar to decide on that matter, but I believe we could easily get around that by altering how this method (called by deploy) would handle returning Lambda functions declarations: get_all() would return deployable functions, while self.functions would return all functions but s3 located zip ones. Shouldn't be any other side effects in regards to how the Provider object is used.