aws-amplify / amplify-category-api

The AWS Amplify CLI is a toolchain for simplifying serverless web and mobile development. This plugin provides functionality for the API category, allowing for the creation and management of GraphQL and REST based backends for your amplify project.
https://docs.amplify.aws/
Apache License 2.0

Amplify / AppSync with graphql and Cognito - Audit User Actions #404

Open malcomm opened 5 years ago

malcomm commented 5 years ago

Which Category is your question related to? Amplify-cli / Appsync

What AWS Services are you utilizing? api / graphql, hosting, DynamoDB, cognito

Provide additional details e.g. code snippets

I asked a similar question here:

https://forums.aws.amazon.com/thread.jspa?threadID=305061&tstart=0

But I feel like I'm not really getting a great answer specific to my application's setup. I'm using Amplify/AppSync and trying to figure out a good way to handle this generically. I have a handful of resources that I need to audit.

I am trying to find an effective way to track changes performed using AWS AppSync. By audit or track, what I mean by that is:

I have a need to track what a user does in the system. So at a minimum:

- who (username)
- what (mutation)
- when
- where (source IP or other info)
- why (this is more than likely going to be entered by the user)

I'm sure there is more, but if I can get that, it would be huge.

The suggestion is to add pipeline resolvers ... but I'm having trouble figuring out how to manage that with my schema.graphql. Some questions:

- Do I just add files to my resolvers & stacks?
- How do I just add my pipeline and still utilize the standard resolvers?
- Is there a way to write a generic pipeline resolver that will handle all mutations in the system?

I have a vague idea of how to implement this (documentation is not clear), but I'm also thinking that this is the wrong tool. It feels like this kind of thing should be audited at a different level.

Any help is greatly appreciated.

kaustavghosh06 commented 5 years ago

@malcomm You can attach a Lambda function to a DynamoDB stream and perform your event-based business logic in that Lambda function. The api category exports the stream ARN of every table created by a @model directive, which you can use to subscribe to changes on those tables.

You could use pipeline functions as well to execute the publish logic from within AppSync, but until pipeline functions are fully supported via the api category, this would require custom resources (resolvers and stacks).
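
For reference, here is a minimal sketch of what such a stream-triggered function could look like (TypeScript, using the aws-lambda type definitions); the logging shown is illustrative, not part of the api category output:

import { DynamoDBStreamHandler } from 'aws-lambda';

// Invoked with a batch of change records from the table's stream.
export const handler: DynamoDBStreamHandler = async (event) => {
  for (const record of event.Records) {
    // eventName is INSERT, MODIFY, or REMOVE
    console.log(`${record.eventName} on ${record.eventSourceARN}`);
    // NewImage/OldImage hold the item in DynamoDB attribute-value format
    console.log(JSON.stringify(record.dynamodb?.NewImage));
  }
};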

malcomm commented 5 years ago

@kaustavghosh06 - thanks for the suggestion. I've added a simple DynamoDB trigger that hooks up to a Lambda function I added. This looks great; however, there's one key thing missing: the user. I was thinking that the user identity for the mutation would be on the context, but I'm only seeing this:

2019-07-26T00:28:46.244Z    a4847658-23f8-48d9-9aea-2160636490d8    Context: { callbackWaitsForEmptyEventLoop: [Getter/Setter],
done: [Function: done],
succeed: [Function: succeed],
fail: [Function: fail],
logGroupName: '/aws/lambda/LogTableChange',
logStreamName: '2019/07/26/[$LATEST]279a5cd878594842ba6a4b7b70d1e13b',
functionName: 'LogTableChange',
memoryLimitInMB: '128',
functionVersion: '$LATEST',
getRemainingTimeInMillis: [Function: getRemainingTimeInMillis],
invokeid: 'a4847658-23f8-48d9-9aea-2160636490d8',
awsRequestId: 'a4847658-23f8-48d9-9aea-2160636490d8',
invokedFunctionArn: 'arn:aws:lambda:us-west-2:626912780862:function:LogTableChange' }

And I'm not seeing the user identity on the event object either.

I see Lambda documentation that indicates this method can be used to create a permanent audit trail of write activity in your table:

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.Lambda.html

How can this be a full audit trail if I don't log the username and the source (the identity)?

Am I doing something wrong? Do I need to enable something to get the identity information over to Lambda?

malcomm commented 5 years ago

Any updates on this? This is kind of a game changer for me, because I need the ability to audit a user's actions.

At this point, I think I'm just gonna have to roll my own.

jkeys-ecg-nmsu commented 5 years ago

@malcomm off the top of my head, context is the wrong place to look. That's your Node environment context, not the context of your event. You want to look in the first positional argument, the event object.

When you say identity do you need the sub or the cognito:username attribute?

malcomm commented 5 years ago

@jkeys-ecg-nmsu - I've looked in both the context and the event and I'm not finding anything. The sub would be great, but the cognito:username is the bare minimum. At this point I just need something to identify the user (IP address, cognito:username, MAC address ... etc.)

Thanks.

malcomm commented 5 years ago

Any help on this? I'm really blocked at this point and it is going to impact a production date.

It looks like there's no easy way to get the user identity information using a DynamoDB trigger. Can I add a hook somewhere that will basically wrap the normal mutation call but send the logs somewhere they can be viewed?

kaustavghosh06 commented 5 years ago

@malcomm Are you using the GraphQL transformer to spin up your AppSync API? If yes, the transformer automatically adds an owner field to your model, which can help you track the user.

malcomm commented 5 years ago

@kaustavghosh06 - Yes I am using the GraphQL transformer to manage the AppSync API. When you say automatically, that sounds very nice, but I don't see an owner field anywhere in the data (logs or table).

Amplify version:

> amplify --version
1.7.6

I can't use the latest because of aws-amplify/amplify-cli#922. Not sure if this is even related to the version of amplify-cli.

Do I have to do anything special to get this owner field to show up?

kaustavghosh06 commented 5 years ago

@malcomm I take that back. You can have a mutation that adds an owner field from your client app and have auth rules for that field/model. You can define something like the following:

mutation CreateDraft {
    createDraft(input: { title: "A new draft" }) {
        id
        title
        owner
    }
}

{
    "data": {
        "createDraft": {
            "id": "...",
            "title": "A new draft",
            "owner": "someuser@my-domain.com"
        }
    }
}
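
As a rough sketch of what that client-side call could look like (Amplify JS; the createDraft import path and the createAuditedDraft helper are hypothetical):

import { API, Auth, graphqlOperation } from 'aws-amplify';
// createDraft is the codegen'd mutation document; adjust the import path to your project
import { createDraft } from './graphql/mutations';

async function createAuditedDraft(title: string) {
  // Read the signed-in user so the owner field can be sent along with the mutation
  const user = await Auth.currentAuthenticatedUser();
  return API.graphql(graphqlOperation(createDraft, {
    input: { title, owner: user.username },
  }));
}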

malcomm commented 5 years ago

@kaustavghosh06 - Well ... I could add a whole identity field and put all kinds of info in there, but ... doesn't that rely on the client to set the correct thing? The way I see it, that would be a client-only solution, which is very prone to error and can be modified by a malicious attacker.

I guess what I'm saying is that I think I need a server-side solution that is secure and happens automatically. If we could add an @identity directive on our models and it would automatically put the identity information (username, IP address, maybe session info ... etc.) into a field, that would be great. That way, when I get the event in my Lambda trigger, all of that would be there.

Another way would be to have this information automatically forwarded and placed on the event to be consumed downstream.

I'm open to whatever works.

Thank you.

kaustavghosh06 commented 5 years ago

@malcomm What does your schema look like? And the function you've mentioned out here - aws-amplify/amplify-category-api#404 - that's the trigger function and not a @function resolver, correct?

malcomm commented 5 years ago

@kaustavghosh06 - Correct, that information is coming from the DynamoDB Lambda (via the trigger).

Here's a section of my schema:

type StudyEncounter
  @model
  @auth(rules: [
    { allow: groups, groups: ["admin"] },
    { allow: groups, groupsField: "groupsCanAccess" }
  ])
  @key(name: "StudyEncounterSubjectId", fields: ["studyEncounterSubjectId"], queryField: "encounterSubjectId")
{
  id: ID!,
  studyEncounterSubjectId: ID!,
  ...
  groupsCanAccess: [String]
}

I currently do not have an owner or a field for identity.

kaustavghosh06 commented 5 years ago

@malcomm Did you check out the context object available in the resolver? You can modify your auto-generated resolver to add the user information to DDB. You can find the reference for the context object available in a resolver here - https://docs.aws.amazon.com/appsync/latest/devguide/resolver-context-reference.html

The identity portion of the context object ($ctx.identity) in the resolver has the following information:

{
    "sub" : "uuid",
    "issuer" : "string",
    "username" : "string"
    "claims" : { ... },
    "sourceIp" : ["x.x.x.x"],
    "defaultAuthStrategy" : "string"
}

malcomm commented 5 years ago

@kaustavghosh06 - If we're talking about the Lambda function that's configured on the DynamoDB table, I looked at the event and context object and neither had user identity.

But something you wrote got me thinking. In my schema, could I define an identity field and use an @function to grab the user's identity info?

kaustavghosh06 commented 5 years ago

No, I’m talking about the actual VTL resolver and the context object available to it.

malcomm commented 5 years ago

I don't have a custom VTL resolver for this. Honestly, it's unclear how to add one that works with the framework. Any help on that?

kaustavghosh06 commented 5 years ago

The GraphQL transformer auto-generates the resolvers for you based on your schema, but if you want something custom - like your use case - you can override the auto-generated resolvers located in your amplify/backend/api/<api-name>/build/resolvers directory. Refer to this documentation - https://aws-amplify.github.io/docs/cli-toolchain/graphql#custom-resolvers - to learn more about using/implementing custom resolvers.

kaustavghosh06 commented 5 years ago

Also, overwriting your auto-generated resolver according to this doc - https://aws-amplify.github.io/docs/cli-toolchain/graphql#overwriting-resolvers should help in your case.

malcomm commented 5 years ago

@kaustavghosh06 - OK, so just to be sure I'm doing this right: for my resource StudyEncounter, I would need to add a new field for the user or identity, and then I would need to put the following two files into amplify/backend/api/<api-name>/resolvers:

  1. Mutation.createStudyEncounter.req.vtl
  2. Mutation.updateStudyEncounter.req.vtl

I am assuming that I copy both of those files from amplify/backend/api/<api-name>/build/resolvers and modify accordingly? Basically set the identity field to what I want?

kaustavghosh06 commented 5 years ago

Correct.

malcomm commented 5 years ago

@kaustavghosh06 - OK, so I'm no expert at VTL, I admit that (first time with it, actually). But I'm rather confused by the results I'm getting. I copied over Mutation.updateStudyEncounter.req.vtl and I'm trying to figure out where to set my new field called identity. I look at the section where updatedAt and __typename are being set and I'm thinking that's a good place to start. So I do something like this:

...
## Automatically set the updatedAt timestamp. **
$util.qr($context.args.input.put("updatedAt", $util.defaultIfNull($ctx.args.input.updatedAt, $util.time.nowISO8601())))
$util.qr($context.args.input.put("__typename", "StudyEncounter"))
$util.qr($context.args.input.put("identity", $ctx.identity))
...

No matter what I do, identity ends up being NULL. I tried $context.identity and even the entire $ctx ... all NULL.

Am I just doing this wrong? Why is $ctx NULL?

And just to be sure, my custom resolver is being utilized, because I'm getting this error:

ERROR Error: Uncaught (in promise): Object: {"data":{"updateStudyEncounter":null},"errors":[{"path":["updateStudyEncounter"],"data":null,"errorType":"MappingTemplate","errorInfo":null,"locations":[{"line":2,"column":3,"sourceName":null}],"message":"Expected JSON object for attribute value '$[update][expressionValues][:identity]' but got 'NULL' instead."}]}
    at resolvePromise (zone.js:852)
    at zone.js:762
    at rejected (tslib.es6.js:69)
    at ZoneDelegate.push../node_modules/zone.js/dist/zone.js.ZoneDelegate.invoke (zone.js:391)
    at Object.onInvoke (core.js:26769)
    at ZoneDelegate.push../node_modules/zone.js/dist/zone.js.ZoneDelegate.invoke (zone.js:390)
    at Zone.push../node_modules/zone.js/dist/zone.js.Zone.run (zone.js:150)
    at zone.js:910
    at ZoneDelegate.push../node_modules/zone.js/dist/zone.js.ZoneDelegate.invokeTask (zone.js:423)
    at Object.onInvokeTask (core.js:26760)

malcomm commented 5 years ago

OK, I think I've got it ... my editor was clobbering closing parentheses ... basically I was missing a ")" and the error message made it look like I was just getting NULLs.

kaustavghosh06 commented 5 years ago

@malcomm Glad you figured that out. Did you get all the identity info that you needed?

malcomm commented 5 years ago

Also of note ... I had to do this:

$util.qr($context.args.input.put("identity", $util.toJson($ctx.identity)))

Without the $util.toJson call ... nothing works. I guess the put is only able to handle a String.

Also, I was trying to store this data as AWSJSON. I was probably doing something wrong, but the documentation is not great and I could not get that to work very well at all. I tried $util.toJson and things got strange when pulled out of the DB. I also tried this:

$util.qr($context.args.input.put("identity", $util.dynamodb.toDynamoDBJson($ctx.identity)))

That came back null ... anyway, I think after many, many hours I finally have the identity being stored on a single table .... so very painful. Honestly ... this is just something that should be handled, but yeah ....
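
For anyone following along, a minimal sketch of reading that stored value back out in the stream-triggered Lambda (TypeScript; the identity attribute name matches the field set in the resolver above, the rest is illustrative):

import { DynamoDBStreamHandler } from 'aws-lambda';

export const handler: DynamoDBStreamHandler = async (event) => {
  for (const record of event.Records) {
    // $util.toJson($ctx.identity) stores the identity as a JSON string attribute
    const identityJson = record.dynamodb?.NewImage?.identity?.S;
    if (identityJson) {
      const identity = JSON.parse(identityJson);
      console.log('mutation by', identity.username, 'from', identity.sourceIp);
    }
  }
};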

kaustavghosh06 commented 5 years ago

@malcomm If you protect your model with an auth rule with owner authorization, we auto-populate the table with user info, but since you didn't have it, that's why you had to use custom resolvers and deal with VTL. We're soon releasing local testing for your AppSync APIs and resolvers as a part of the Amplify CLI to make it easy to debug your APIs - including your VTL resolver code.

malcomm commented 5 years ago

@kaustavghosh06 - I got the data ... this is far from ideal, but it might work for my solution.

My 2 cents: something needs to change here to help people out. Trying to audit a user's actions should be very, very easy. What I have now is a hacked-up bandaid that might work ... I mean, taking a step back from all this, I very much doubt I am the only one that needs to be able to audit a user's actions. I would really like to see something first-class to support this.

mikeparisstuff commented 5 years ago

@malcomm You are always able to use AWS CloudTrail to audit API calls made against your AWS account.

Otherwise, we are working towards making it easier to add pipeline functionality to API projects that would enable use cases like this. In the future the goal is to be able to create a function named "Audit" and then make it easy to compose that function into any mutation that you want to audit. Do you agree that generalized support for pipeline functions would help in this situation?

Also, to clear up any confusion about the VTL mentioned above, here is some more explanation. This is a simplified version of the default createX mutation resolver.

## We can add identity information by setting the key in the input **
## This works because we call $util.dynamodb.toMapValuesJson($ctx.args.input) below. **
## This will store all information contained in the JWT as a Map  **
## in a single DynamoDB attribute named identity **
$util.qr($ctx.args.input.put("identity", $ctx.identity.claims))
{
  "version": "2017-02-28",
  "operation": "PutItem",
  "key": {
    "id": $util.dynamodb.toDynamoDBJson($util.autoId()),
  },
  "attributeValues": $util.dynamodb.toMapValuesJson($ctx.args.input),
  "condition": {
    "expression": "attribute_not_exists(#id)",
    "expressionNames": {
      "#id": "id",
    },
  },
}

Once you have added the new identity field, you can add support in the schema pretty easily as well.

type Identity {
    sub: String
    iss: String
    # etc.
}
type Post {
  id: ID!
  title: String!
  identity: Identity
}

You can then list Posts:

query listPosts {
  listPosts {
    items {
      id
      title
      identity {
        sub
      }
    }
  }
}

And get back results that look like:

{
  "id": "7a29036a-e3df-458b-b966-fa9e4d6d5ae4",
  "title": "Hello, world!",
  "identity": {
    "sub": "c24611ed-91ba-4a63-a591-74576a346be3"
  }
}

malcomm commented 5 years ago

@mikeparisstuff - I was looking at CloudTrail and at first it looked great, but it seems to be geared towards just auditing the administration ... not the actual use of the API. I could not find a way to make that work. Am I just missing a setting?

Just to be sure, this would need to log events (mutations in this case) to CloudTrail for users logged in via Cognito. I would be really happy if that was the case.

But honestly, CloudTrail is exactly what I am looking for; it gives me all the tools that I need. If this can be used for the general use of the API ... that would be so great. +1000 for this.

mikeparisstuff commented 5 years ago

@malcomm CloudTrail would be able to tell you when calls are made to your data sources, but the identity from CloudTrail will be that of the role that AppSync assumes when calling your DynamoDB table. In other words, they are not specific to your logged-in Cognito users, as you correctly called out.

I have updated my answer above to give a working example for what you are trying to do. Hopefully this helps clarify things.

malcomm commented 5 years ago

@mikeparisstuff - technically I understand your answer ... but stepping back again ... how does this make sense? I have to say, having an audit trail (CloudTrail) lose the context of what user performed the action ... I can't understand how that makes sense.

My 2 cents: AppSync should "stash" the user identity information to be used for all logging and for CloudTrail. Make this the "source identity" or something, because both are important.

mikeparisstuff commented 5 years ago

@malcomm I don't disagree with you that this is useful, but this lies a bit outside of the traditional flow when using CloudTrail. CloudTrail keeps an audit log of all activities performed against your AWS resources and, in general, requests are signed with a SigV4 signature, which CloudTrail uses to pull identity information from.

I have to say, having an audit trail (CloudTrail) lose the context of what user performed the action ... I can't understand how that makes sense.

The question here is what "the context of what user performed the action" actually is. From AppSync's perspective, it is aware of the Cognito User Pool, OIDC endpoint, etc. From CloudTrail's perspective, everything is a SigV4-signed call. When making a call to a data source, AppSync assumes a role in your account and is able to use that role to sign a request to send to DynamoDB on your behalf. CloudTrail is able to pick up on this and understands the "user" to be the IAM role that was used to actually sign and issue the request to AWS.

I will need to investigate whether it is possible to add custom identifying information to CloudTrail as you are requesting, but this would be a longer-term enhancement. In the meantime, you have the ability to save custom identification information, such as attributes in your JWTs, using resolvers in AppSync.

malcomm commented 5 years ago

@mikeparisstuff - thank you for looking into this. Not sure if this helps with the priority or not, but I'm looking at this:

https://docs.aws.amazon.com/appsync/latest/devguide/cloudtrail-logging.html

AWS AppSync is integrated with AWS CloudTrail, a service that provides a record of actions taken by a user, role, or an AWS service in AWS AppSync.

Per that, I would say that this violates the contract of that documentation. I would also say that it violates the KISS principle ... that is, as a customer of these services, I would expect CloudTrail to log the user that actually initiated the call.

malcomm commented 5 years ago

@kaustavghosh06 or @mikeparisstuff - I see that this got moved to a feature-request. I'm trying to plan for a production date and I'm trying to see if I need to put in place an interim fix for this or if these changes could be done before my date. Any idea on the time horizon for this feature?

malcomm commented 5 years ago

Also, I'm looking at the information that is in CloudTrail and I don't seem to be seeing the events that pertain to AppSync/DynamoDB mutations. From my Lambda that's handling the DynamoDB stream, I'm seeing this:

2019-08-01T21:22:01.172Z    99ead3e7-4f57-421e-8d18-77fa13312b6b    Received event:
{
    "Records": [
        {
            "eventID": "29dd1b8e3ef95319f7eedd4d8a54d3ba",
            "eventName": "MODIFY",
            "eventVersion": "1.1",
            "eventSource": "aws:dynamodb",
            "awsRegion": "us-west-2",
            "dynamodb": {

When I go over to CloudTrail, I'm not seeing this event ID (29dd1b8e3ef95319f7eedd4d8a54d3ba) anywhere. I also do not see any event related to AppSync/DynamoDB mutations at all.

I'm logged in as Administrator and using CloudTrail. I'm assuming that the admin account has access to all user information here?

malcomm commented 5 years ago

@kaustavghosh06 or @mikeparisstuff - any updates on this? Thank you

malcomm commented 5 years ago

@kaustavghosh06 / @mikeparisstuff - I know this is marked as a feature-request ... but isn't this more of a bug? (Because the system is not acting as it should?)

nateiler commented 5 years ago

@malcomm we're interested in doing something very similar. I'm curious to know where you ended up. My gut is telling me to wait until pipeline resolvers (aws-amplify/amplify-category-api#430) are available.

malcomm commented 5 years ago

@kaustavghosh06 / @mikeparisstuff - any updates on this?

azatoth commented 4 years ago

any further updates on this @mikeparisstuff ?

codecadwallader commented 4 years ago

+1 ... in general, it seems like auditing is a popular feature ask but not yet supported.

malcomm commented 4 years ago

@kaustavghosh06 / @mikeparisstuff - it's been over a year since I first submitted this, and I was hoping for any indication of whether or not this is going to get any support.

malcomm commented 3 years ago

@kaustavghosh06 / @mikeparisstuff - just putting another ping on this. Would love to know if this is going to get support or not?

jimjoes commented 3 years ago

+1

dror-laguna commented 3 years ago

+1

dror-laguna commented 3 years ago

Hi guys, we needed this feature and couldn't wait for it, so we implemented our own transformer. Firehose can be very useful in this case, and for much more. It's open source, so feel free to use it and contribute. https://github.com/LaugnaHealth/graphql-firehose-transformer

tgjorgoski commented 3 years ago

@dror-laguna, that is great. Do you know if it will work if you are already using Lambda resolvers for some fields?

dror-laguna commented 3 years ago

@dror-laguna, that is great. Do you know if it will work if you are already using Lambda resolvers for some fields?

Hi @tgjorgoski, yes, it should work with a @function resolver. We tested it, but we don't use it much ... so I can't be 100% sure.

kukodev commented 3 years ago

@malcomm did you end up implementing this on your own? I've had several attempts over the last year but never managed to work out a proper and scalable solution.

Did the Amplify team communicate in any way regarding this in other threads, maybe?

dror-laguna commented 3 years ago

@malcomm did you end up implementing this on your own? I've had several attempts over the last year but never managed to work out a proper and scalable solution.

Did the Amplify team communicate in any way regarding this in other threads, maybe?

@konradkukier2 we implemented and open-sourced it: aws-amplify/amplify-category-api#404

kukodev commented 3 years ago

@dror-laguna Thanks a lot! I just saw it today and we're already planning time to give it a try in the next sprints. Fingers crossed :crossed_fingers: