Clarity Request: Unexpected "Not Authorized" with IAM and Transformer v2

danrivett commented 2 years ago

Which Category is your question related to?

Amplify CLI Version

7.6.22

Summary of Problem

Since moving to the v2 Transformer we're now seeing our Lambdas which use IAM to access the AppSync API fail with:

Not Authorized to access XXX on type Mutation

It appears unrelated to the documented deny-by-default change.

Description of Problem

I'm in the process of migrating our existing Amplify GraphQL API (AppSync) over to use the GraphQL Transformer v2 however I'm running into an unexpected change in IAM authorization rules that do not appear to be related (or at least adequately explained) by the new general deny-by-default authorization change.

Our GraphQL API uses Cognito User Pools as the default authentication mechanism, and is used on the frontend by customers who log into their account.

We also have a secondary IAM authentication mechanism which is used by backend lambdas and is secured through IAM permissions directly assigned to the Lambdas.

We have several GraphQL models such as the following:

type User
  @model
  @auth(
    rules: [
      { allow: owner, ownerField: "cognitoUserId" }
      { allow: private, provider: iam } # Allow CRUD access from the backend
    ]
  ) {
  id: ID!
  cognitoUserId: ID!
  cognitoIdentityId: ID!
  firstName: String!
  lastName: String!
  phone: AWSPhone!
  email: AWSEmail!
  ...
}

On v1 of the GraphQL Transformer, this works great. However on v2, we're seeing:

{
  "path": [
    "updateUser"
  ],
  "data": null,
  "errorType": "Unauthorized",
  "errorInfo": null,
  "locations": [{
    "line": 2,
    "column": 3,
    "sourceName": null
  }],
  "message": "Not Authorized to access updateUser on type Mutation"
}

I don't believe this is explained by the new deny-by-default change, and I verified this by also explicitly listing the operations:

{ allow: private, provider: iam, operations: [create, read, update, delete] } # Allow CRUD access from the backend

What I am seeing is the generated Mutation.updateUser.auth.1.res.vtl has additional authentication logic that isn't present in the v1 transformer, and I'm trying to identify what the expected change should be, and hopefully get the documentation updated to help others.

Comparison of v1 Transformer to v2

The key change I've observed is that in v1's Mutation.updateUser.req.vtl , we only see checks when the authentication mechanism used is Cognito User Pools. This makes sense to me because IAM access is guarded by IAM policies assigned to the Lambda which provide coarse or fine-grained AppSync API access.

In v1's Mutation.updateUser.req.vtl, we only see:

#if( $authMode == "userPools" )

We do not see any logic with:

#if( $authMode == "iam" )

However in v2's Mutation.updateUser.auth.1.res.vtl, I'm now seeing a separate block for when IAM is being used:

#if( $util.authType() == "IAM Authorization" )
  #set( $adminRoles = ["AdminQueriesxxxxx-dan"] )
  #foreach( $adminRole in $adminRoles )
    #if( $ctx.identity.userArn.contains($adminRole) && $ctx.identity.userArn != $ctx.stash.authRole && $ctx.identity.userArn != $ctx.stash.unauthRole )
      #return($util.toJson({}))
    #end
  #end
  #if( ($ctx.identity.userArn == $ctx.stash.authRole) || ($ctx.identity.cognitoIdentityPoolId == "us-west-2:xxxxx" && $ctx.identity.cognitoIdentityAuthType == "authenticated") )
    #set( $isAuthorized = true )
  #end
#end

It's this block in particular that is interesting to me:

  #if( ($ctx.identity.userArn == $ctx.stash.authRole) || ($ctx.identity.cognitoIdentityPoolId == "us-west-2:xxxxx" && $ctx.identity.cognitoIdentityAuthType == "authenticated") )
    #set( $isAuthorized = true )
  #end

This is doesn't evaluate to true and so isAuthorized isn't set to true and so the error above is returned.

Looking at the context.identity object being created the for the IAM access from the lambda I see something like:

{
  "accountId": "XXX",
  "cognitoIdentityAuthProvider": null,
  "cognitoIdentityAuthType": null,
  "cognitoIdentityId": null,
  "cognitoIdentityPoolId": null,
  "sourceIp": ["XXX"],
  "userArn": "arn:aws:sts::XXX:assumed-role/appsync-user-created-handler-dan-us-west-2-lambdaRole/appsync-user-created-handler",
  "username": "XXX:appsync-user-created-handler"
}

Notice that userArn value which is the role assumed by the Lambda that was generated by our IaC framework - the Serverless Framework in our case - which defined the IAM permission to invoke this AppSync GraphQL endpoint. It doesn't match $ctx.stash.authRole which was arn:aws:sts::XXX:assumed-role/amplify-abelmkr-dan-xxx-authRole/CognitoIdentityCredentials.

Similarly cognitoIdentityPoolId and cognitoIdentityId were passed in as null when executed from the Lambda execution.

Understanding the Change and What Migration Steps are Required

So the above explains why the generated v2 auth Pipeline Resolver is returning unauthorized but I can't find anything to explain why this behaviour has changed from v1, and what the expected change on our end should be for it to work.

It seems like the Resolver is requiring all the Lambdas using IAM to assume that authRole, but I'm not sure the best way to do that. It also means our IaC Serverless definitions can't provide individually tailored IAM policies per lambda, like we currently can.

Desired Outcome

Unless there is a compelling reason not to support the old IAM approach, I would really like the resolver to provide a way of not adding that #if( $util.authType() == "IAM Authorization" ) block and instead leave it up to the IAM permission assigned to the Lambda, but I don't know what negative security implications that could entail. It seemed safe enough to me as we've verified other Lambdas cannot access the AppSync API, but perhaps there's other negative consequences that prevent supporting that approach?

Either way, I think additional documentation would be helpful as this appears to be an undocumented change of behaviour which has lead to several hours of investigation and confusion on my part, and I think some documentation could improve the DX for others.

danrivett commented 2 years ago

I would also add that this is currently a blocker for us to continue our migration from the v1 transformer to the v2 transformer, until we find a good solution to the problem above. We would like to complete the migration if we can though.

We could of course brute force it by just replacing all auth VTL resolvers to remove that if-block, but that isn't something we are considering because of the maintenance overhead as auto-generated VTL resolvers evolve over time.

rafaelfaria commented 2 years ago

I am also experiencing the same thing. Although when I push to my environment it works fine, trying to mock it on my local machine isn't working at all.

Someone suggested on another thread to use custom-roles.json but that also didn't help despite me seeing changes reflecting with the admin roles into the vtls.

sundersc commented 2 years ago

Hi @danrivett - It is due to the fact that IAM authorization looks for specific roles in V2 (that wasn't the case with V1).

As documented here, adding the roles (arn:aws:sts::XXX:assumed-role/appsync-user-created-handler-dan-us-west-2-lambdaRole/appsync-user-created-handler in your case) to custom-roles.json file (then amplify push) should give the necessary access. Let me know in case of any issues. https://docs.amplify.aws/cli/graphql/authorization-rules/#use-iam-authorization-within-the-appsync-console

danrivett commented 2 years ago

Thanks for reading the issue and replying @sundersc. Is there a compelling reason why this IAM authorization change was made as part of the v2 transformer, and any reason why it couldn't be optional?

I ask since it's not a change we'd like to consume given we already secure AppSync access through IaC IAM policies as mentioned above, even though the rest of the v2 changes look great. I'd hate for us to be blocked from migrating by this.

Regarding the option to add roles to custom-roles.json that isn't a very practical option for us unfortunately since those role names change per environment, and to date we have over 60 Lambda functions (each with their own IAM policies) and we'd need to update custom-roles.json each time we create a new Lambda that accesses AppSync.

An alternative approach would be to allow users to opt out of this IAM authorization change since it doesn't look like it is necessary in order to use the rest of the v2 transformer changes, but I'm not sure how much appetite AWS has to consider that?

sundersc commented 2 years ago

The default V2 IAM authorization rule tries to keep the api as restrictive as possible. However I understand that it is not an ideal solution for your setup. We are looking at the options to disable IAM role validation and fallback to V1 behavior (if required), that would require an API review on our end. Marking this as feature request.

danrivett commented 2 years ago

Thanks @sundersc I appreciate that. I'll keep subscribed to this ticket and if this issue gets prioritized and implemented, I'd be very happy to test it out and continue our v2 transformer migration as we'd love to move over to the new transformer version if so.

shriram192 commented 2 years ago

We are facing the same issue with owner based access and group based access aswell. This is specific to update mutations. Seems like an issue with pipeline resolvers for the update action. Was any update made to this recently?

type Farmer @model
@auth( rules: [ { allow: owner, operations: [create, update, read] }, { allow: groups, groups: ["Admin"], operations: [read] } ]) { identityId: String profileImg: String name: String! mobile: AWSPhone! @primaryKey billing: Shipping shipping: [Shipping] email: String fb: String google:String cart: [CartItem] wishList: [String] }

We are getting Unauthorized in the mutation - "Not Authorized to access updateFarmer on type Mutation" We need the resolution urgently for this as our system is already in production environment.

Please support as soon as possible

Ilya93 commented 2 years ago

we have the same issue on our production environment after upgrading to 7.6.22

type BroadcastLiveData @auth( rules: [ { allow: groups, groupsField: "editors", operations: [update] } { allow: public, provider: iam, operations: [read] } { allow: private, operations: [read] } ] ) @model(subscriptions: { level: public }) { id: ID! communicationState: AWSJSON editors: [String] }

We are getting "Not Authorized to access updateBroadcastLiveData on type Mutation"

edit: it was fixed as soon as I changed: { allow: groups, groupsField: "editors", operations: [update] } to this: { allow: groups, groupsField: "editors" }

shriram192 commented 2 years ago

we cannot remove the update

This is the intended functionality. ( GraphQL transformer is not working as intended. ) https://docs.amplify.aws/cli/migration/transformer-migration/#authorization-rule-changes

Prior to this migration, when customers used owner-based authorization @auth(rules: [{allow: owner, operations: [read, update, delete]}]), the “operations” fields were used to deny others access to the listed operations. In this example: “others can’t read, update, or delete.” With the new GraphQL Transformer, given the new deny-by-default paradigm, the owner-based authorization’s operation now specifies what owners are allowed to do. The same example above now means: “Owners can read, update, and delete”

danrivett commented 2 years ago

Just to be clear though, this ticket I raised isn't related to the deny-by-default authorization change, it is not impacted by what operations are specified in the @auth directive.

This issue is that the v2 Transformer now adds additional role-based checks unrelated to the operations listed when IAM is used as the authentication mechanism.

I just want to be clear about what this ticket was created to address. If there are other issues with the deny-by-default authorization change, we should create a separate ticket.

shriram192 commented 2 years ago

I think the issue we are facing is specifically for the update operation with all auth types, to be more specific this problem started a few hours ago. We can raise a separate ticket for this aswell.

sundersc commented 2 years ago

@Ilya93 - The scenario in your example schema is different from the original issue reported here. Would you open a new issue so that it gets tracked?

sundersc commented 2 years ago

Regarding the option to add roles to custom-roles.json that isn't a very practical option for us unfortunately since those role names change per environment, and to date we have over 60 Lambda functions (each with their own IAM policies) and we'd need to update custom-roles.json each time we create a new Lambda that accesses AppSync.

@danrivett - Could you please clarify on the below?

Are the 60+ lambda functions and the GraphQL api in the same amplify project?
Do you have any lambda (or other AWS resources) outside your amplify project that needs to have access to the GraphQL api which uses IAM authorization?

danrivett commented 2 years ago

@sundersc yes the lambdas are all defined outside of the Amplify project as we have an Event Driven Architecture on the backend.

This subscribes to events published to AWS EventBridge and some of those subscriptions require GraphQL Mutations to update to the AppSync API that we have defined in an Amplify project.

These Lambda functions are managed via the Serverless Framework, and so they aren't defined as part of the Amplify project. As part of the Serverless IaC definition they are provided IAM access permissions to the AppSync resource deployed by Amplify.

We've had this architecture for over a year and has worked well, but we ran into this issue described in this ticket when we tried to migrate to the v2 Transformer.

sundersc commented 2 years ago

@danrivett - How are you signing the GraphQL request from Lambda outside amplify project? Here is an example of what I'm referring to but this is for lambdas within the same amplify project.

danrivett commented 2 years ago

@sundersc we are using the aws-appsync package and the following code that we have in an internal reusable library:

import type { NormalizedCacheObject } from 'apollo-cache-inmemory';
import type { ApolloClientOptions } from 'apollo-client';
import AWSAppSyncClient, { AUTH_TYPE, AWSAppSyncClientOptions } from 'aws-appsync';
import AWS from 'aws-sdk/global';
import fetch from 'node-fetch';

const { APPSYNC_ENDPOINT, AWS_REGION } = process.env;

// The following resolves an error thrown by the underlying Apollo client:
//   Invariant Violation: fetch is not found globally and no fetcher passed
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const myGlobal: any = globalThis;
if (!myGlobal.fetch) {
  myGlobal.fetch = fetch;
}

const { credentials } = AWS.config;

if (!credentials) {
  throw new Error('No AWS.config.credentials is available; this is required.');
}

const appSyncConfig: AWSAppSyncClientOptions = {
  url: APPSYNC_ENDPOINT,
  region: AWS_REGION,
  auth: {
    type: AUTH_TYPE.AWS_IAM,
    credentials,
  },
  disableOffline: true,
};

const apolloClientOptions: Partial<ApolloClientOptions<NormalizedCacheObject>> = {
  defaultOptions: {
    query: {
      fetchPolicy: 'network-only', // important to make sure we get up-to-date results
      errorPolicy: 'all', // Helps log out errors returned from the AppSync GraphQL server
    },
  },
};

export const appSyncClient = new AWSAppSyncClient(appSyncConfig, apolloClientOptions);

This makes the AppSync interaction from Lambda very simple as it just needs to issue appSyncClient.query() or appSyncClient.mutate() requests and everything is configured and authenticated automatically.

We would rather not use the heavy-weight aws-appsync package, but the DX of using it is much simpler, as the above just works because the credentials field is populated on the AWS.config automatically by AWS when invoking the Lambda.

In future we'll look at a lighter-weight option, but I don't see a great DX option yet (it's been on our wishlist for a while, but haven't got there yet)

danrivett commented 2 years ago

Thinking about possible solutions a little bit more, in case it's helpful, I thought of a couple of possibilities:

This is based on looking at the amplify-graphql-auth-transformer source code here.

I see a custom AuthStrategy listed as an allowed value. I'm not sure if it's currently used when iam is set as the AuthProvider, but if not, potentially we could specify something like:

@auth(
  rules: [
    { allow: custom, provider: iam } # Allow CRUD access from the backend
  ]
)

Specifying that would mean this particular iamCheck() function would not be invoked by mutation resolver generators.

Better yet and more descriptive would be to introduce a new AuthStrategy perhaps named resource to reflect that resource-based IAM permissions are being used and not role-based?

@auth(
  rules: [
    { allow: resource, provider: iam } # Allow CRUD access from the backend
  ]
)

sundersc commented 2 years ago

@danrivett - Thanks for the details. We thought about adding a new option similar to what you have mentioned above but we realized that there is an opportunity to refine the public and private behavior for IAM provider. We will have more details in the coming weeks.

However I just realized that there is an escape hatch which may solve the problem in your scenario. Set the adminRoleNames in custom-roles.json as shown below. This will make sure that the VTL allow access to all the Lambda execution roles for the given accountId.

{
  "adminRoleNames": ["arn:aws:sts::<AccountIdHere>:assumed-role"]
}

sundersc commented 2 years ago

Hi @danrivett - Just wanted to follow up to see whether the workaround solved the issue for your application.

danrivett commented 2 years ago

Hi @sundersc. Sorry for not replying. Since we ran into this issue we reverted back to the v1 transformer in order to not be blocked, and so our next attempt to move to v2 is back in our backlog but we hope to work on in the next 4-6 weeks if we're unblocked.

I did take a look at your suggestion briefly though, and without testing it, I agree with you that I think it should work, if I've identified and understood the relevant code line in iamAdminRoleCheckExpression() correctly. Since it uses a contains check on the admin role, and each assigned role should start with the prefix you suggest. Thank you for that.

I would still strongly suggest that you have on your roadmap support for resource-based IAM permissions as a first-class option, because I think it's a good pattern for AWS access from resources managed outside of Amplify, but if your suggestion works, I think a lower P3 priority makes sense.

Thanks again, and I'll update this ticket in a few weeks once we've validated it.

DivonC commented 2 years ago

Hi @sundersc and everyone else experiencing this issue. Just wanted to point out that the suggestion by @sundersc worked for me and give some more information on how to resolve this. Navigate to amplify/backend/api//custom-roles.json. (Create the custom-roles.json file if it doesn't exist). Then add the following as @sundersc mentioned.

{
  "adminRoleNames": ["arn:aws:sts::<AccountIdHere>:assumed-role"]
}

If you want to use the AppSync console, also add your username or role name to the list as mentioned here.

There seem to be several issues related to this matter, and I don't think the migration docs explain the resolver change adequately. I think the docs should explain that models that use the IAM authorization strategy may deny access to lambda functions that exist outside of the amplify project if the function uses resource-based policies to access the API. This is because these models now perform a check to ensure that either

the role accessing the API is the same authRole created in the amplify project,
the role has been given permission to the API using the Amplify CLI (for example, by using amplify update function to provide a function access to the API), or
the role has been added to the custom-roles.json file as described above.

Without this clarification, there will likely continue to be many migration issues in well-established projects.

przemekblasiak commented 2 years ago

What solved it for me was adding my Lambda's role name to custom-roles.json per @sundersc 's workaround suggestion. Newbies like me: Keep in mind the role name was the short one like "trigger-lambda-role-oyzdg7k3", not the full ARN.

Extra notes: In my case, I wanted a single Lambda to be able to use the GraphQL API to update data in my Amplify project, while not being a part of the Amplify setup. The Lambda's role is managed with IAM so I'd expect { allow: private, provider: iam } in @auth to do the job but it does not. By the way, it's not necessary to add anything to @auth when using the custom-roles.json workaround.

Pickleboyonline commented 2 years ago

Seems like Amplify has a bug that causes $adminRoles to use the wrong environment's lambda's ARNs. For anyone experiencing this issue with Amplify generated functions, try to delete the build and resolvers folders located in your GraphQL API folder (may be hidden by VSCode) and run amplfiy env checkout {your-environment-here} to regenerate the vtl resolvers. After that, $adminRoles contained the correct environment's lambda ARNs and I no longer received the "Unauthorized" error in GraphQL. Note: I do not have the build or resolvers folder tracked in my git repo.

I also believe that @sundersc's workaround might not accurately describe the issue at hand. From my interpretation of the custom-roles.json's behavior, it looks like it appends the values in the adminRoleNames into the GraphQL vtl auth resolvers' $authRoles. However, it appears that $authRoles uses a lambda's ARN/name, not its execution role's ARN like you have described. @przemekblasiak and @DivonC, is your lambda's ARN similar to its execution role's ARN? Perhaps that's why it worked for you.

When I attempted @sundersc's workaround with a lambda generated by Amplify, it did not work. I believe it's because amplify generates lambda IAM execution role names that differ from lambda's name. (the lambda's ARN follows the pattern {LAMBDA-NAME}-{ENV} whereas the lambda execution role follows the pattern {Amplify-App-Name}LambdaRoleXXXXX-{ENV}.

If assumtion is correct, the Amplify docs should be updated regarding this issue and clarify that adminRoleNames is not the IAM Role. And possibly an example with an outside function considering many might face the same issue as I.

Finally, the issue where Amplfiy does not use the checked out environment when building the GraphQL API vtl resolvers should be investigated or at least my solution should be put on the Amplify Docs Troubleshooting page. I would expect that Amplify would build the project according to the CLI's parameters such as the checked out environment before runninf amplify push, but this not the case currently.

przemekblasiak commented 2 years ago

[…] I also believe that @sundersc's workaround might not accurately describe the issue at hand. From my interpretation of the custom-roles.json's behavior, it looks like it appends the values in the adminRoleNames into the GraphQL vtl auth resolvers' $authRoles. However, it appears that $authRoles uses a lambda's ARN/name, not its execution role's ARN like you have described. @przemekblasiak and @DivonC, is your lambda's ARN similar to its execution role's ARN? Perhaps that's why it worked for you. […]

@Pickleboyonline In my case, the lambda's ARN is different than the execution role's ARN and name. I've provided the role's name in the custom-roles.json file.

hisham commented 2 years ago

+1 - also ran into this when upgrading my project. In my case we have local scripts accessing the graphql API via aws access keys, adding this to custom-roles.json resolved the issue:

{
  "adminRoleNames": ["arn:aws:iam::<AccountId>:user"]
}

vincent38wargnier commented 1 year ago

Hi, I had the same issue in transformer v1, and now I have it with transformer v2 too. I removed, then amplify pushed, and recreated the table and it worked. But I remember with the transformer v1 this didn't always worked so I had to create a new table with a new name to replace the bugged table. :/ I guess a good solution would be to remove manually all the elements left about a table, because apparently amplify doesn't always remove everything, so if you know how to do let me know !

DanieleMoschiniMac commented 1 year ago

Hi, i'm waiting for updates, this problem makes me crazy

sundersc commented 1 year ago

@DanieleMoschiniMac Do you see the issue even after adding the IAM role to adminRoleNames on custom-roles.json file as mentioned here?

DanieleMoschiniMac commented 1 year ago

It worked, but the problem still there

hoan-pham-duy commented 1 year ago

Got the same problem,

accessing

Hi, I got the same problem, but after add: { "adminRoleNames": ["arn:aws:iam:::user"] } I can't run query at both console and CLI. Can you check that it stills work?

Lekha30 commented 1 year ago

here

Is there way to do this with AWS-sdk V3?

renebrandel commented 11 months ago

Hi folks - just to close the loop on this. We've since added additional documentation on how to enable access to the GraphQL API from a Lambda function or the AWS AppSync:

For future, we've also now brought the Amplify GraphQL API to CDK as a construct, in which you can explicitly pass in the Lambda execution to programmatically ensure access stays "in sync" between different environments.

samueldominguez commented 10 months ago

Using CDK v3 and I tried adding custom-roles.json to no avail. Turns out the issue was not adding a /* to the ARN, that seems to fix it:

lambdaAppSyncRole.addToPolicy(new PolicyStatement({
      actions: ['appsync:GraphQL'],
      resources: [`${config.APPSYNC_API_ARN}/*`],
}));

stemwinder commented 2 months ago

@danrivett - Thanks for the details. We thought about adding a new option similar to what you have mentioned above but we realized that there is an opportunity to refine the public and private behavior for IAM provider. We will have more details in the coming weeks.

However I just realized that there is an escape hatch which may solve the problem in your scenario. Set the adminRoleNames in custom-roles.json as shown below. This will make sure that the VTL allow access to all the Lambda execution roles for the given accountId.
{
  "adminRoleNames": ["arn:aws:sts::<AccountIdHere>:assumed-role"]
}

Is there any strategy to add custom roles per environment? We have dev and prod environments in separate accounts.

stemwinder commented 2 months ago

@danrivett - Thanks for the details. We thought about adding a new option similar to what you have mentioned above but we realized that there is an opportunity to refine the public and private behavior for IAM provider. We will have more details in the coming weeks. However I just realized that there is an escape hatch which may solve the problem in your scenario. Set the adminRoleNames in custom-roles.json as shown below. This will make sure that the VTL allow access to all the Lambda execution roles for the given accountId.
{
  "adminRoleNames": ["arn:aws:sts::<AccountIdHere>:assumed-role"]
}
Is there any strategy to add custom roles per environment? We have dev and prod environments in separate accounts.

Update: The VTL resolver context appears to include the current AWS Account ID. Initial testing with arn:aws:sts::${context.identity.accountId}:assumed-role appears to work as expected.

sundersc commented 2 months ago

With Amplify Gen2, you can use IAM authorization using the below snippet. Your API access will be completely controlled by IAM, no additional configuration is needed.

// amplify/data/resources.ts
import { defineData } from '@aws-amplify/backend';
import { schema } from './schema';

export const data = defineData({
  schema,
  authorizationModes: {
    defaultAuthorizationMode: 'iam',    // <--- IAM Authorization mode
  },
});

aws-amplify / amplify-category-api