aws-amplify / amplify-category-api

The AWS Amplify CLI is a toolchain for simplifying serverless web and mobile development. This plugin provides functionality for the API category, allowing for the creation and management of GraphQL and REST based backends for your amplify project.
https://docs.amplify.aws/
Apache License 2.0

How to on delete cascade? #273

Open sakhmedbayev opened 4 years ago

sakhmedbayev commented 4 years ago

Which Category is your question related to? API, database

I have one @model type that uses several many-to-many connections via a joining model, as described, for instance, here.

Now, when I delete a record that belongs to that @model (e.g., Post in the linked example), I have to "manually" delete all records in the joining model. This is very cumbersome, as there is no way to batch delete them (at least as far as I know).

I wonder if there is a way to make my life easier :-) What would be a recommended way to handle it?

akshbhu commented 4 years ago

Hi @sakhmedbayev

Currently we don't support cascading delete.

sakhmedbayev commented 4 years ago

> Hi @sakhmedbayev
>
> Currently we don't support cascading delete.

Why not? :-) I think it would be a great feature to add to the Amplify stack.

akshbhu commented 4 years ago

Hi @sakhmedbayev

I have added this as an enhancement; once it is prioritized we will work on it. Feel free to 👍 so this gets more visibility. You can also open a PR for your use case and discuss it with the team.

rudyhadoux commented 4 years ago

You can use on-delete subscriptions in the front end and handle the cleanup there.

RossWilliams commented 4 years ago

Cascade delete would be much better implemented with a DynamoDB stream lambda. If there is not a new item in the event, you know it has been deleted and can delete related models. Client-side would not be robust.
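To make the idea concrete, here is a rough sketch of such a stream handler (hypothetical code, assuming the table's partition key is `id`; `deleteRelatedModels` stands in for whatever cleanup your own schema needs):

```javascript
// Sketch: a DynamoDB stream handler that reacts only to deletions.
// Pure helper: collect the ids of items removed in this batch of records.
// Stream images arrive in DynamoDB JSON, hence the `.S` accessor.
function removedIds(event) {
  return event.Records
    .filter((record) => record.eventName === 'REMOVE')
    .map((record) => record.dynamodb.Keys.id.S);
}

// Lambda entry point; wire it up with `exports.handler = handler;`.
const handler = async (event) => {
  const ids = removedIds(event);
  // For each deleted item, clean up its related join-table records, e.g.:
  // await Promise.all(ids.map((id) => deleteRelatedModels(id)));
  return ids;
};
```

Filtering on `eventName === 'REMOVE'` is what distinguishes a deletion from an insert or update in the stream batch.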

rudyhadoux commented 4 years ago

An automated back-end solution is better yes.

ivankokus commented 4 years ago

Any modern database has cascade delete, so adding it should be a priority.

dragosiordachioaia commented 3 years ago

This would indeed be cool to have. I ended up doing this myself with a Lambda subscribed to a DynamoDB stream, where I semi-replicate the relationships between records (the ones defined in the schema with @connection), so that whenever a record gets deleted, all the connected records which should be removed go with it. The Lambda interacts with DynamoDB directly, which makes it really fast. The only painful part about it is that I have to update that Lambda every time these relationships change in the schema.

evanmcd commented 3 years ago

@dragosiordachioaia if that's something you could share, I sure would appreciate seeing what that looks like.

loganpowell commented 3 years ago

I'm doing this client-side at the moment. I just worry that, given enough time, someone will be halfway through cleaning up an n:n relation and lose connectivity. I know this will then cause non-nullable field errors when I query those records via GraphQL.

lseemann commented 3 years ago

Like @dragosiordachioaia I ended up using a Lambda for this. It works pretty well.

Here's how I've recently set it up for us.

We have a Company type that has several many-to-many relationships to various entities. For example, each company may have many attorneys. AttorneyAssignment is the bridge between Company and Attorney.

I've created a Lambda called dynamoTrigger. It’s configured to respond to changes in several Dynamo tables, including Company.

In dynamoTrigger/src/index.js:

    const AWS = require('aws-sdk');

    exports.handler = async (event) => {
      // Convert each stream record's DynamoDB JSON into plain objects.
      const records = event.Records.map((record) => ({
        new: AWS.DynamoDB.Converter.unmarshall(record.dynamodb.NewImage || {}),
        old: AWS.DynamoDB.Converter.unmarshall(record.dynamodb.OldImage || {}),
      }));

      await cleanupConnections(records);
    };

Elsewhere, cleanupConnections is defined as:

async function cleanupConnections(records) {
  const companyIDs = records
    .filter((record) => record.old.__typename === 'Company')
    .filter((record) => !record.new.id)
    .map((record) => record.old)
    .map((company) => company.id);

  await Promise.all([
    // For each company, find all their remaining connections and delete them.
    ...companyIDs.map(async (companyID) => {
      let companyResponse;
      try {
        companyResponse = await gqlListCompanyConnections(companyID);
      } catch (e) { console.log(e); }

      if (companyResponse) {
        const attorneys = companyResponse.listAttorneyAssignments.items || [];
        const paralegals = companyResponse.listParalegalAssignments.items || [];

        await Promise.all([
          ...attorneys.map(async (employer) => gqlDeleteItem(
            'AttorneyAssignment', employer.id,
          )),
          ...paralegals.map(async (employer) => gqlDeleteItem(
            'ParalegalAssignment', employer.id,
          )),
        ]);
      }

      return true;
    }),

  ]);
}

This relies on a few custom queries.

First, gqlListCompanyConnections finds all the related records for the Company that is being deleted.

  const query = /* GraphQL */ `
    query ListCompanyConnections($companyID: ID = "") {
      listAttorneyAssignments(filter: { companyID: { eq: $companyID } }) {
        items {
          id
        }
      }
      listParalegalAssignments(filter: { companyID: { eq: $companyID } }) {
        items {
          id
        }
      }
    }
  `;

Then I have a helper function to delete an item of any type:

async function gqlDeleteItem(type, id) {
  const query = /* GraphQL */ `
    mutation DeleteItem($id: ID = "") {
      delete${type}(input: { id: $id }) {
        id
      }
    }
  `;

  const variables = {
    id,
  };

  const operationName = 'DeleteItem';

  const rs = await callGraphQL(query, operationName, variables);

  return rs;
}

As @dragosiordachioaia notes, it’s a pain to stay on top of changing relationships. The really painful part is that there's currently no way to use the CLI to change which models should trigger a Lambda. Triggers can only be set when a Lambda is created.* When I've needed to add a new model, I've resorted to the following steps:

  1. Copy my Lambda’s src to a temp directory
  2. Delete my Lambda
  3. Re-create my Lambda with the CLI, carefully selecting all the old configuration and including my new model as one of the triggers
  4. Restore the src I'd stashed in Step 1.

loganpowell commented 3 years ago

@lseemann do you think it would be advisable to just make a lambda resolver that does this?

lseemann commented 3 years ago

Possibly? I confess that resolvers are a part of Amplify I'm not yet adept in, but I'd love to know more about what you have in mind. All I know is that doing it in the client should probably be avoided, for the reason you describe but also because what happens if a record somehow gets deleted outside of the client, such as through the graphQL browser or even directly in Dynamo?

loganpowell commented 3 years ago

I'm working on this atm and going with the DynamoDB stream trigger as per your example. Quick question: are you also using a trigger to create your AttorneyAssignments upon the "INSERT" event?

loganpowell commented 3 years ago

Haha, I spent hours trying to use a DynamoDB lambda to clean up the edge connections only to realize that the lambda is triggered after the deletion event and - therefore - I cannot query the edge connections by that ID any more :sweat_smile:

I ended up just using graphql aliases to do this, like so:


const linkDelete = async ({ id }, authMode?: GRAPHQL_AUTH_MODE) => {
    const { data: { getEdge } } = await API.graphql({
        query: queries.getEdge,
        variables: { id },
        authMode,
    })
    if (!getEdge) {
        console.warn("No Edge found with this id:", id)
        return
    }
    const { nodes: { items } } = getEdge
    if (!items.length) {
        console.warn("No items found for this Edge:", id)
        return
    }
    const [ from, to ] = items.map(({ id }) => id)
    const mutation = /* GraphQL */ `
        mutation {
            edge: deleteEdge(input: { 
                id: "${id}"
            }) { id }
            edgeNodeFrom: deleteEdgeNode(input: { 
                id: "${from}"
            }) { id } 
            edgeNodeTo: deleteEdgeNode(input: {
                 id: "${to}"
            }) { id }
        }
    `

    const results = await CRUD({
        query: mutation,
        variables: {},
        authMode,
    })

    return results
}

lseemann commented 3 years ago

> Haha, I spent hours trying to use a DynamoDB lambda to clean up the edge connections only to realize that the lambda is triggered after the deletion event and - therefore - I cannot query the edge connections by that ID any more 😅

Oh, man, I should have pointed that out. I think I lost the same hours before I made the same realization.

I think your GraphQL chops are a little beyond mine, but I like your thinking. Thanks for sharing. I'm going to study it a bit more to understand it better.

In my example, since my nodes don't exist any more, I'm using ListCompanyConnections to find the vestigial edges. It looks like you're doing the same thing, but in a less manual fashion?

lseemann commented 3 years ago

> I'm working on this atm and going with the DynamoDB stream trigger as per your example. Quick question: are you also using a trigger to create your AttorneyAssignments upon the "INSERT" event?

No, they're being created manually. A Company and an Attorney are created independently, and then a graphQL mutation in the client creates an AttorneyAssignment to link them as needed.

loganpowell commented 3 years ago

Gotcha, yep. I think it accomplishes the same thing, since the concern was a half-fulfilled mutation; with the aliases, the deletions are sent in the same AppSync API call (and are thus all handled server-side). There may be issues on the server side, but I'm going to overlook those and hope that AWS doesn't fail me.

loganpowell commented 3 years ago

btw, if you have an unknown number of connections, you can extrapolate the example above by concatenating template strings that have an alias incremented by the index. E.g., instead of edgeNodeFrom:, use something like edgeNode${index}:
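As a rough sketch of that idea (the helper name `buildEdgeDeleteMutation` is made up for illustration; `deleteEdge`/`deleteEdgeNode` follow the mutation in my earlier comment):

```javascript
// Hypothetical helper: build one aliased mutation that deletes an Edge and
// any number of EdgeNodes in a single AppSync request. Each node deletion
// gets a unique alias (edgeNode0, edgeNode1, ...) keyed by its index.
function buildEdgeDeleteMutation(edgeId, nodeIds) {
  const nodeDeletes = nodeIds
    .map((nodeId, index) =>
      `edgeNode${index}: deleteEdgeNode(input: { id: "${nodeId}" }) { id }`)
    .join('\n            ');
  return /* GraphQL */ `
        mutation {
            edge: deleteEdge(input: { id: "${edgeId}" }) { id }
            ${nodeDeletes}
        }
    `;
}
```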

evankirkiles commented 2 years ago

I needed cascade deletion in a project of mine and initially set out with the DynamoDB Lambda Trigger approach. But I had some concerns which ultimately led to my choosing a different method:

  1. The Lambda trigger is invoked on all update events to the table, not just on REMOVE events. That's a lot of extra, unnecessary invocations.
  2. Scoping the GraphQL requests in the Lambda trigger to respect @auth model directives without direct access to the Cognito user calling the delete mutation seemed super difficult. Just using an all-access Lambda IAM Execution role is very heavy-handed in my opinion and could lead to security issues with more complicated relationships / auth rules.
  3. Most importantly for me: I'm using rtk-query to consolidate and cache Amplify API requests. This requires being able to correctly invalidate cached requests for affected entries. The Lambda DynamoDB trigger approach doesn't allow returning the cascade-deleted entries to the client; in fact, it doesn't allow returning anything to the client. So to invalidate the cache I'd have to predict the cascade-deleted entries with a deeply nested query beforehand. No good.

So instead of using a Lambda trigger based on DynamoDB updates, I decided to just run the cascade deletion from a serverless express Lambda function behind an endpoint in a REST API with authorization based on the Cognito user pool configured for my AWS Amplify app. By accessing the Authorization header in the request that invokes the Lambda function connected to the endpoint, I have the user's ID JWT token and can funnel it through to GraphQL requests in the Lambda to assume that user's identity when running queries and mutations.
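For illustration, the token pass-through might look something like this (a sketch, not an Amplify API; all names are placeholders, and Node 18+'s built-in fetch is assumed):

```javascript
// Sketch: forward the caller's Cognito ID token to AppSync so that queries
// and mutations run under that user's own @auth rules.

// Pure helper: strip the "Bearer " prefix from an Express request's
// Authorization header (Express lowercases header names).
function bearerToken(req) {
  const header = req.headers.authorization || '';
  return header.replace(/^Bearer\s+/i, '');
}

// AppSync APIs configured for Cognito User Pool auth accept the user's
// JWT directly in the Authorization header of a GraphQL POST.
async function callAppSyncAsUser(endpoint, jwt, query, variables) {
  const res = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: jwt },
    body: JSON.stringify({ query, variables }),
  });
  return res.json();
}
```

Inside the Express route handler you would call `callAppSyncAsUser(endpoint, bearerToken(req), query, variables)` for each deletion in the cascade.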

This solves pretty much all of the problems I had above:

  1. We now only begin the cascade deletion when a user explicitly makes a request to the REST API. So we have very clear and limited invocation conditions for the Lambda trigger.
  2. As the GraphQL requests are signed as if they were the authenticated Cognito user who invoked the Lambda function, we don't have to worry about complex IAM roles on the Lambda function.
  3. The response takes a bit longer than a normal delete on the GraphQL API, but we now prevent any race conditions that could potentially re-query incomplete data in the middle of the cascade deletion. Furthermore, the Lambda function returns arrays of the ids of all affected entries, so invalidating the cache is trivial.

Implementation of the above is fairly straightforward and within the bounds of documented AWS Amplify, except for configuring the REST API's Cognito authorizer, which requires some CDK overriding. I wrote a couple of posts outlining the process below:

  [Part 1] - Building an identity-assuming GraphQL client in a Lambda layer
  [Part 2] - Building the cascade deletion serverless express Lambda function
  [Part 3] - Building a REST API for the Lambda functions and accessing the endpoints from the client

hisham commented 2 years ago

+1. Related to https://github.com/aws-amplify/amplify-category-api/issues/623

ChristopherGabba commented 1 month ago

Big +1 here still -- looking for this with Amplify Gen 2. I would love it if flagging a field as required, like below, automatically told the backend to delete the entire object when that value becomes invalid.

Alternatively, a new modifier such as autoDeleteParentUponInvalidation could be added.

  Friendship: a
    .model({
      id: a.id().required(),
      receiverId: a.id().required(), // option #1: making it required forces the object to delete when it becomes invalid
      receiver: a.belongsTo("User", "receiverId"),
      senderId: a.id().required().autoDeleteParentUponInvalidation(), // option #2: tell the backend to do this directly
      sender: a.belongsTo("User", "senderId"),
      status: a.ref("FriendStatus").required(),
      owners: a.string().array(),
    })

This would literally save me hundreds of lines of code in my current project, with how many queries and deletes I have to run prior to deleting a User object.