aws / serverless-application-model

The AWS Serverless Application Model (AWS SAM) transform is an AWS CloudFormation macro that transforms SAM templates into CloudFormation templates.
https://aws.amazon.com/serverless/sam
Apache License 2.0

The final policy size is bigger than the limit #337

Open christopheblin opened 6 years ago

christopheblin commented 6 years ago

I have created a SAM template that contains a lot of API events for one function.

When I deploy it to CloudFormation, I get the following error:

17:14:03 UTC+0100 | CREATE_FAILED | AWS::Lambda::Permission | MyApiPutAdminLoginsResourcePermissionProd | The final policy size (20851) is bigger than the limit (20480).

Is there a way to avoid that?

The first thing that seems strange to me: why does it create one permission per resource instead of one permission for the whole API?

The second strange thing is that it creates 2 permissions for each resource, such as MyApiPutAdminLoginsResourcePermissionTest and MyApiPutAdminLoginsResourcePermissionProd. Why does it duplicate the permission for Prod and Test? Where do Prod and Test come from?

Here is the template (some parts, such as environment variables, are obfuscated or removed for security):

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: My API

Globals:
  Function:
    VpcConfig:
      SecurityGroupIds:
        - Fn::ImportValue: "ServerlessSecurityGroupId"
      SubnetIds:
        - "subnet-xxx"
        - "subnet-xxx"

Resources:
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service:
            - lambda.amazonaws.com
          Action:
          - sts:AssumeRole
      Policies:
      - PolicyName: 'LambdaExecutionPolicy'
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Effect: Allow
            Action:
            - ec2:CreateNetworkInterface
            - ec2:DescribeNetworkInterfaces
            - ec2:DetachNetworkInterface
            - ec2:DeleteNetworkInterface
            Resource: "*"
      RoleName: LambdaExecutionRole

  MyApi:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: "MyApi"
      Handler: com.acme.api.ApiHandler
      Runtime: java8
      CodeUri: ./build/distributions/my-api.zip
      Timeout: 60
      MemorySize: 1024
      Role: !GetAtt LambdaExecutionRole.Arn
      Events:
        Cors:
          Type: Api
          Properties:
            Path: /{proxy+}
            Method: options

        # Auth
        PostRegistrationsResource:
          Type: Api
          Properties:
            Path: /auth/registrations
            Method: post
        PostConfirmationsResource:
          Type: Api
          Properties:
            Path: /auth/confirmations
            Method: post
        PostRenewalsResource:
          Type: Api
          Properties:
            Path: /auth/renewals
            Method: post
        PutUserResource:
          Type: Api
          Properties:
            Path: /auth/users/{id}
            Method: put
        GetUserResource:
          Type: Api
          Properties:
            Path: /auth/users/{id}
            Method: get
        GetUsersResource:
          Type: Api
          Properties:
            Path: /auth/users
            Method: get
        GetAuthPingResource:
          Type: Api
          Properties:
            Path: /auth/ping
            Method: get

        # Contacts
        PostInvitationsResource:
          Type: Api
          Properties:
            Path: /contacts/invitations
            Method: post
        PutInvitationsResource:
          Type: Api
          Properties:
            Path: /contacts/invitations/{id}
            Method: put
        GetInvitationsMineResource:
          Type: Api
          Properties:
            Path: /contacts/invitations/mine
            Method: get
        GetContactsMineResource:
          Type: Api
          Properties:
            Path: /contacts/mine
            Method: get
        GetPossibleContactsMineResource:
          Type: Api
          Properties:
            Path: /contacts/possibleContacts/mine
            Method: get
        PostPossibleInvitationsMineResource:
          Type: Api
          Properties:
            Path: /contacts/possibleInvitations/mine
            Method: post
        GetContactsPingResource:
          Type: Api
          Properties:
            Path: /contacts/ping
            Method: get

        # Admin
        PostAdminLoginsResource:
          Type: Api
          Properties:
            Path: /admin/logins
            Method: post
        PutAdminLoginsResource:
          Type: Api
          Properties:
            Path: /admin/logins
            Method: put
        GetAdminsResource:
          Type: Api
          Properties:
            Path: /admin/admins
            Method: get
        PostAdminsResource:
          Type: Api
          Properties:
            Path: /admin/admins
            Method: post
        GetAdminResource:
          Type: Api
          Properties:
            Path: /admin/admins/{id}
            Method: get
        PutAdminResource:
          Type: Api
          Properties:
            Path: /admin/admins/{id}
            Method: put
        GetAdminPingResource:
          Type: Api
          Properties:
            Path: /admin/ping
            Method: get

        # Providers
        PostProvidersResource:
          Type: Api
          Properties:
            Path: /providers/providers
            Method: post
        PutProvidersResource:
          Type: Api
          Properties:
            Path: /providers/providers/{id}
            Method: put
        GetProvidersResource:
          Type: Api
          Properties:
            Path: /providers/providers
            Method: get
        GetProviderResource:
          Type: Api
          Properties:
            Path: /providers/providers/{id}
            Method: get

        # Offers
        PostOffersResource:
          Type: Api
          Properties:
            Path: /offers/offers
            Method: post
        PutOffersResource:
          Type: Api
          Properties:
            Path: /offers/offers/{id}
            Method: put
        GetOffersResource:
          Type: Api
          Properties:
            Path: /offers/offers
            Method: get
        GetOfferResource:
          Type: Api
          Properties:
            Path: /offers/offers/{id}
            Method: get
christopheblin commented 6 years ago

The workaround I have found is:

Events:
    Proxy:
      Type: Api
      Properties:
        Path: /{proxy+}
        Method: any

This creates only one resource, and therefore only 2 permissions.

cast commented 6 years ago

An API-level and global-level configuration that creates a single permission from API Gateway to the Lambda function(s) would ease this issue.

keenskelly commented 6 years ago

Suddenly and unexpectedly I am experiencing this same error after the 50+th time redeploying the same API with the same two resources (a root resource, and a /{proxy+} resource, both of which point at a Lambda integration). It has been fine countless times, but, as I have grown the size/complexity of the lambda handler for the resource(s), it now seems to be an issue. I even deleted the entire API and set up the entire chain from scratch and the problem persists (thinking that it was some sort of cumulative error resulting from so many consecutive redeployments?). No luck. Any insights out there?

For what it's worth, after reading another thread for input, I deleted both my AWS Lambda function (which was being repeatedly updated) AND the API Gateway API integrated with it, and the error went away. There seems to be some sort of cumulative error on the Lambda side of things, because just deleting the API Gateway API previously did not do the trick...

keetonian commented 6 years ago

@christopheblin Issue https://github.com/awslabs/serverless-application-model/issues/285#issuecomment-368954057 addresses your question about why 2 permissions are created for those resources. There is a discussion in that issue as well about possible better approaches.

Did you find a workaround for your problem?

robermar2 commented 5 years ago

We just ran into this issue as well. API Gateway in SAM Template with about a dozen api events. Not that large at all.

If we create a new stack from scratch we do not get this error. It started happening with an existing stack we are updating.

Similar to @keenskelly

Jun711 commented 5 years ago

Is this similar to https://github.com/aws/chalice/issues/48?

@christopheblin you can use the following boto3 Lambda API call to see which resource-based policies are attached to your Lambda functions. You can check out this article for information.

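import json
import boto3

def clean_policy(fn_name):
    """Print a Lambda function's resource-based policy; optionally prune it."""
    client = boto3.client('lambda')
    policy = client.get_policy(FunctionName=fn_name)['Policy']
    statements = json.loads(policy)['Statement']
    # Collect every statement ID except the last one, which is kept.
    sid_list = [item['Sid'] for item in statements][:-1]
    # Uncomment the loop below to actually remove those statements.
    # for sid in sid_list:
    #     print("Removing policy SID {}".format(sid))
    #     client.remove_permission(FunctionName=fn_name, StatementId=sid)
    print(client.get_policy(FunctionName=fn_name))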

By printing out the statements, I could see that, somehow, the same policies had been added repeatedly to my Lambda functions.
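For anyone who wants to check for this kind of repetition without reading the raw JSON, here is a minimal Python sketch. It assumes the policy string comes from the 'Policy' key of a get_policy response, treats two statements as duplicates when they match in everything except Sid, and measures size as the UTF-8 length of that string, which may differ slightly from how Lambda itself counts. The function name summarize_policy and the returned keys are hypothetical.

```python
import json
from collections import Counter

# Limit in bytes, taken from the error message quoted at the top of this thread.
POLICY_SIZE_LIMIT = 20480


def summarize_policy(policy_json):
    """Return size, statement count, and duplicate counts for a policy string.

    `policy_json` is assumed to be the string stored under the 'Policy'
    key of a Lambda get_policy response.
    """
    statements = json.loads(policy_json).get("Statement", [])
    # Fingerprint each statement with its Sid removed, so that statements
    # differing only by Sid count as the same permission.
    fingerprints = Counter(
        json.dumps({k: v for k, v in stmt.items() if k != "Sid"}, sort_keys=True)
        for stmt in statements
    )
    duplicates = {fp: n for fp, n in fingerprints.items() if n > 1}
    return {
        "size_bytes": len(policy_json.encode("utf-8")),
        "limit_bytes": POLICY_SIZE_LIMIT,
        "statement_count": len(statements),
        "duplicate_groups": len(duplicates),
        "redundant_statements": sum(n - 1 for n in duplicates.values()),
    }
```

Running this on the policy after each deploy would show whether the statement count is creeping toward the limit, and whether the growth comes from repeated statements or from genuinely distinct ones.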

In the long run, you can grant your API permission to invoke Lambda using a resource based policy or an IAM role to prevent getting this error in the future.

keetonian commented 5 years ago

@keenskelly @robermar2 Could you post a sample of a template that would result in the policy size exceeding the limit after multiple deployments? I wonder what the issue is in those cases; it seems like the policies are persisting for some reason.

I added the "breaking change" label because it might be possible to remove this extra permission, but that would be a breaking change. Removing this extra permission wouldn't necessarily fix the problem, but would help alleviate it somewhat. See https://github.com/awslabs/serverless-application-model/issues/285 for more discussion on this topic.

keepersmith commented 5 years ago

I just ran into this issue - it was not a result of multiple deployments creating duplicates. Perhaps my experience and solution will help someone else.

The issue (for me) was:

  1. Each API endpoint creates about 900 bytes worth of policy (2 statements each, as stated before), so the maximum number of endpoints per Lambda will be around 22, depending on your endpoint paths.

  2. SAM/CloudFormation appends new statements to the policy before deleting old ones. So if you already have 22 endpoints and then replace, change, or add one, the CloudFormation deployment will fail on the 20 KB policy size limit.
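The estimate above can be written out as a short calculation. This is only a sketch: the 900-byte figure is the rough per-endpoint size quoted in this comment, the 20480-byte limit comes from the error message at the top of the thread, and the helper names are made up for illustration.

```python
POLICY_SIZE_LIMIT = 20480  # bytes, from the CloudFormation error message
BYTES_PER_ENDPOINT = 900   # rough figure from this comment (2 statements per endpoint)


def max_endpoints(limit=POLICY_SIZE_LIMIT, per_endpoint=BYTES_PER_ENDPOINT):
    """Largest number of endpoints whose permissions fit within the limit."""
    return limit // per_endpoint


def fits_during_update(current_endpoints, changed_endpoints,
                       limit=POLICY_SIZE_LIMIT, per_endpoint=BYTES_PER_ENDPOINT):
    """Check the transient peak during a stack update.

    Assumes, as described in this comment, that statements for new or
    changed endpoints are appended before the old ones are deleted.
    """
    peak_bytes = (current_endpoints + changed_endpoints) * per_endpoint
    return peak_bytes <= limit
```

With these figures, max_endpoints() gives 22, and fits_during_update(22, 1) is False because the transient peak of 23 endpoints needs about 20700 bytes.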

My (hack) solution was to:

  1. Change my Lambda function name (in template.yml) to a temporary placeholder.

  2. Deploy.

  3. Change the function name back to the original name.

  4. Deploy again.

After that, the deployment succeeded.

I know this is not a good long-term solution, so I will be changing to the resource based policy solution.

keetonian commented 5 years ago

There are a few solutions to this problem. Like @christopheblin wrote, if you want one lambda to handle all of your API endpoints, you can use a proxy endpoint and handle the routing inside of the lambda:

  Events:
    Proxy:
      Type: Api
      Properties:
        Path: /{proxy+}
        Method: any

You could also create multiple identical lambdas and split the API methods between them to work around the maximum policy size limit.

Due to the number of solutions available, this is not an issue that we are going to address in SAM.

keepersmith commented 5 years ago

Those proposed solutions all avoid the issue that, as is, AWS SAM has a built-in, undocumented, and unnecessary limit on the number of endpoints per lambda that it can support.

The folks at Serverless had a similar issue (https://github.com/serverless/serverless/issues/5357), and they fixed it by implementing AWS::Lambda::Permission with wildcards.

SAM could do any of the following to address this:

  1. Look for declared AWS::Lambda::Permission blocks and not tack on its own, allowing users to work around this problem themselves without rewriting apps or duplicating Lambdas.

  2. Provide a flag that tells it not to make its own (potentially breaking) AWS::Lambda::Permission blocks.

  3. Smartly look at the size of its own generated policies, and convert to wildcard policies when needed.

  4. Just use a single wildcard AWS::Lambda::Permission block as the default.

johnc44 commented 5 years ago

Just experienced the same issue. Seems to happen when redeploying the same lambda, so we have to tear it down and recreate it.

I will try the solutions given, but extremely disappointing if this isn't going to be fixed. SAM is unbelievably frustrating to use - there seem to be issues at every turn that never get addressed.

keetonian commented 5 years ago

How many endpoints does it take to hit this limit?

#1102 will help address this issue by reducing the number of permissions that SAM creates by half, which should greatly decrease the policy size.

@keepersmith Thank you for the list of suggestions. Reopening this issue to gather more information and to find a better way to address this issue.

bensie commented 5 years ago

@keetonian As others have mentioned, it seems to accumulate over time when repeatedly updating the same stack (and updating the existing Lambdas). The more endpoints you have, the fewer deploys until hitting this error. If I could venture a guess I'd say that with our ~65 endpoints it takes about 25 deploys before hitting this limit, at which point we create a new stack and discard the one that hit the ceiling. Commenting out all the Lambda functions, deploying, then uncommenting and deploying will also clear out whatever is getting accumulated.

johnc44 commented 5 years ago

Quoting @christopheblin:

The workaround I have found is:

Events:
    Proxy:
      Type: Api
      Properties:
        Path: /{proxy+}
        Method: any

This creates only one resource, and therefore only 2 permissions.

I tried this, but as far as I can tell, when requests come through to the Lambda, I have to read a pathParameter 'proxy' to find out the real path (and then my actual path parameters won't come through). This is when running through sam local, anyway.

Unless I'm missing something, this doesn't feel like a good solution?
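To make that trade-off concrete, here is a rough Python sketch of what routing inside a single proxy Lambda could look like, including recovering path parameters such as {id} with regular expressions. It assumes a REST API Lambda proxy event that carries httpMethod and path; the route table, the handler names, and the resolve helper are hypothetical, and none of this is something SAM provides.

```python
import re

# Hypothetical route table: (HTTP method, path template, handler name).
ROUTES = [
    ("GET", "/auth/users/{id}", "get_user"),
    ("PUT", "/auth/users/{id}", "put_user"),
    ("GET", "/auth/users", "get_users"),
]


def compile_template(template):
    """Turn '/auth/users/{id}' into a regex with one named group per parameter."""
    # re.split with a capturing group alternates static text and parameter names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        re.escape(part) if index % 2 == 0 else "(?P<{}>[^/]+)".format(part)
        for index, part in enumerate(parts)
    )
    return re.compile("^" + pattern + "$")


COMPILED = [(method, compile_template(path), name) for method, path, name in ROUTES]


def resolve(event):
    """Return (handler_name, path_params) for a proxy event, or (None, {})."""
    method = event["httpMethod"]
    path = event["path"]
    for route_method, regex, name in COMPILED:
        match = regex.match(path)
        if route_method == method and match:
            return name, match.groupdict()
    return None, {}
```

A handler would call resolve(event) and dispatch on the returned name. This restores path parameters, but the routes are no longer visible to API Gateway, which is the drawback raised here.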

tianmarin commented 4 years ago

The same happened to us, just from updating our template to add a few endpoints.

Events:
    Proxy:
      Type: Api
      Properties:
        Path: /{proxy+}
        Method: GET
        Auth:
            Authorizer: AmplifyCognitoAuthorizer

This did not work for us, because the following error occurs when I create "authorized" resources (while still wanting to expose the OPTIONS method for CORS configuration):

Unable to set Authorizer [XXX] on API method [put] for path [/{proxy+}] because the related API does not define any Authorizers.

My own workaround is to duplicate the function (which feels like anything but a good idea) and add the new Api events to the new function, like this:

  FunctionA:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      CodeUri: ../lambda/function/
      FunctionName: FunctionA
      Events:
        personlist:
            Type: Api
            Properties:
              RestApiId: !Ref API
              Path: /person
              Method: GET
              Auth:
                Authorizer: Authorizer
  FunctionB:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      CodeUri: ../lambda/function/
      FunctionName: FunctionB
      Events:
        personCreate:
            Type: Api
            Properties:
              RestApiId: !Ref API
              Path: /person
              Method: POST
              Auth:
                Authorizer: Authorizer

Maybe #1102 will help with this, but we will still be limited.

Couldn't we bypass this by manually creating an AWS::Lambda::Permission that allows every stage/resource/method?

johnc44 commented 4 years ago

The way I see this problem is that it is an accumulation over time. So if I deploy my lambda today, then keep making changes, eventually it will hit a limit.

So whilst halving the number of permissions SAM creates will mean it takes longer to happen, presumably at some point it will still happen and we'll be back to where we were? In which case, the solution seems to be to tear down the API altogether and redeploy. But, of course, this also means tearing down the front-end and anything else using !ImportValue to reference the API.

So what happens is that my front-end is down for an hour or so whilst I redeploy everything. (As a CloudFront teardown/deploy takes ages). This is not acceptable for a production system, and defeats the object of us using all these automated tools in the first place.

Why do we need a permission for every single method? Can we not just have one for the lambda?

I'm beginning to get to the end of my patience with SAM. It is so full of hidden gotchas, it's become a constant source of frustration.

This is a big problem for us.

Jun711 commented 4 years ago

@tianmarin @johnc44 you can grant your API permission to invoke Lambda using a resource based policy or an IAM role to prevent getting this error in the future.

tianmarin commented 4 years ago

@Jun711 That's exactly what I'm talking about! Thank you very much!!!

I just updated my CF template with this:

  ApiGatewayInvokeLambdaPermission:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !GetAtt LAMBDA.Arn
      Principal: apigateway.amazonaws.com
      SourceArn: !Sub arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${API}/*/*/*

IMPORTANT: I had to remove the existing Lambda permissions in order for CloudFormation to respond correctly. I did it manually, as all Lambda permissions are replaced with this new open Lambda permission.

Weird thing... it creates a lot of Lambda permissions anyway, but it's not failing so far...

johnc44 commented 4 years ago

@tianmarin - did this stay working for you? I've not done anything with it yet, but we've not seen the problem for a little while. (Probably because we've not done many deployments of this particular api for a while and it only happens sometimes).

henock1 commented 4 years ago

@johnc44 I'm experiencing the error just now, and I have my API configured similarly to @tianmarin

It took many deploys to get here. Granted, I'm not using SAM (just CloudFormation) but either way not a long term solution.

DjebbZ commented 3 years ago

Running into the same problem. Is there a definitive answer?

@tianmarin How did you

remove the existing Lambda permissions in order for CloudFormation to respond correctly. I did it manually, as all Lambda permissions are replaced with this new open Lambda permission.

I see no way in AWS Console > CloudFormation > Stacks to click or interact with any of the permissions. I may be missing something here...

johnc44 commented 2 years ago

This problem hasn't occurred for years, but recently we added some new APIs and it cropped up again. We do have a single lambda that services quite a lot of API methods.

Although there is a comment above saying that this won't be resolved, I would like to question that. I don't understand what benefit there is in creating the permissions in the current way.

Can it be changed to either have a single permission that covers all API methods, or have logic so that if a certain number of methods are being added to an API, it splits them into a second policy?

Otherwise, there's just some hidden limit that will randomly surface if an API gets too big.

We have only come up with 2 ways to solve the problem ourselves:

  1. Split both the API and function in two. It does not work if we only do one of these. Having two APIs doesn't really do what we want.

  2. Use a default route. HTTP APIs will let us do this in a similar way to the proxy route idea mentioned above. However, this has two drawbacks: firstly, we lose the documentation that API Gateway provides, and secondly we lose some of the API features, in particular path parameters, which we use a lot. Yes, we can change our route handling and do that ourselves using lots of regex or something.

In both cases, we are having to make compromises due to a very long-standing issue in SAM.

I must admit, I don't quite get the resource based policy suggestion above. Our lambda has a role and custom policies attached, it also has a Permission that allows API gateway access to it. But this has no bearing on what permissions SAM is adding. Am I missing something?

chmarti commented 5 months ago

Hard to believe this is still an issue and isn't being fixed.

@DjebbZ we ended up writing a PowerShell script that runs after the deployment. It uses aws lambda remove-permission to get rid of all of the permissions SAM created and aws lambda add-permission to add one wildcard permission to replace them. On subsequent deployments SAM will not recreate the permissions.

This solution works ok but during the initial deployment it will fail with "The final policy size is bigger than the limit" if you have too many endpoints in your lambda. When that happens you run the script and retry the deployment and it succeeds.
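For readers who prefer Python to PowerShell, the same idea might be sketched with boto3 as follows. This is not the script described above, only an illustration under assumptions: client is expected to behave like boto3.client("lambda"), only statements whose principal is apigateway.amazonaws.com are removed, and the function name consolidate_permissions and the default statement ID are hypothetical.

```python
import json

API_GATEWAY_PRINCIPAL = "apigateway.amazonaws.com"


def consolidate_permissions(client, function_name, source_arn,
                            statement_id="ApiGatewayInvokeWildcard"):
    """Replace per-endpoint API Gateway permissions with one wildcard permission.

    `client` is assumed to behave like boto3.client("lambda"). Returns the
    list of statement IDs that were removed.
    """
    policy = json.loads(client.get_policy(FunctionName=function_name)["Policy"])
    removed = []
    for statement in policy.get("Statement", []):
        principal = statement.get("Principal", {})
        # Leave statements for other services (S3, SNS, ...) untouched.
        if isinstance(principal, dict) and principal.get("Service") == API_GATEWAY_PRINCIPAL:
            client.remove_permission(FunctionName=function_name,
                                     StatementId=statement["Sid"])
            removed.append(statement["Sid"])
    # Add a single permission covering every stage, method, and path of the API.
    client.add_permission(
        FunctionName=function_name,
        StatementId=statement_id,
        Action="lambda:InvokeFunction",
        Principal=API_GATEWAY_PRINCIPAL,
        SourceArn=source_arn,
    )
    return removed
```

It could be called with a source ARN of the form arn:aws:execute-api:REGION:ACCOUNT_ID:API_ID/*/*/*, the same shape used in the AWS::Lambda::Permission example earlier in the thread. Note that between the removals and the final add_permission call, API Gateway briefly has no permission to invoke the function, so this is better run outside peak traffic.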

If I could figure out how to get SAM to create the single wildcard permission during the initial deployment rather than trying to create one for every endpoint this would be solved. I've seen people mention you can do this with IAM roles but I've never seen a full working example SAM config to do it.