aws-amplify / amplify-js

A declarative JavaScript library for application development using cloud services.
https://docs.amplify.aws/lib/q/platform/js
Apache License 2.0
9.43k stars · 2.13k forks

DataStore/Conflict Detection/AppSync - ALLOW UNDELETE OF DELETED ITEMS (RECYCLE BIN) #6103

Open sacrampton opened 4 years ago

sacrampton commented 4 years ago

Describe the bug
With conflict detection enabled, the automatically generated schema.graphql adds the following fields to each type. It should also expose the _ttl value on each type so that you can see when items are going to be removed from DynamoDB:

_version: Int!
_deleted: Boolean
_lastChangedAt: AWSTimestamp!

Only _version is added to the input types (i.e. Create/Update/Delete*Input), where you can provide a _version. It is not added to any of the inputs that let you list/search fields, so, for example, you can't list all records that are not deleted. If I have 20 records in my database and 5 have _deleted = true, then all my queries will return all 20 records until the _ttl expires and those records are removed. We need to be able to filter for _deleted = false (or not exists), but at the moment these fields are not exposed.

Also, if I want to un-delete a record, I need the resolver to let me set _deleted to false and _ttl to null, but at this time I can't access anything other than _version as an input.

Amplify CLI Version

4.19.0

To Reproduce
Try to query for _deleted = true.

Expected behavior
Query for all records (list and searchable) where _deleted ne true. Update _deleted to false and delete the _ttl field to undelete.

Additional context

I can implement this by modifying the schema, but that is a lot of work to build and maintain on my part, and this is a pretty basic function that I think everyone needs, so I'm listing this as something that should be implemented by the Amplify team.
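Until _deleted is exposed in the generated filter inputs, one general client-side workaround is to post-filter query results after they arrive. Below is a minimal sketch (the item shape and helper name are illustrative, not generated Amplify code); note that the deleted records are still transferred over the wire, so this does not help with pagination or payload size.

```typescript
// Sketch: drop soft-deleted items from a list-query result on the client.
// The SyncedItem shape is an assumption for illustration; Amplify adds a
// _deleted flag to models synced with conflict detection enabled.
interface SyncedItem {
  id: string;
  _deleted?: boolean | null;
}

function excludeDeleted<T extends SyncedItem>(items: T[]): T[] {
  // _deleted may be absent, null, or false on live records.
  return items.filter((item) => item._deleted !== true);
}

const records: SyncedItem[] = [
  { id: 'a' },
  { id: 'b', _deleted: true },
  { id: 'c', _deleted: false },
];
console.log(excludeDeleted(records).map((r) => r.id)); // [ 'a', 'c' ]
```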

sacrampton commented 4 years ago

@UnleashedMind & @SwaySway - we don't seem to be making much progress with this issue the way I wrote it, so I'm going to have another go at explaining it.

For the purposes of this example, let's say I have a table/type called Asset. When I call the deleteAsset resolver, it sets _deleted = true and sets _ttl to an integer timestamp 30 days from the time of deletion. Of course, it also increments the _version and _lastChangedAt values.

With the record "flagged" as deleted, and held there for 30 days before the _ttl fires and removes it from DynamoDB, I want to be able to UNDELETE the record. In DynamoDB all I need to do is delete the _ttl attribute and the record won't be expired. But I want to do this through AppSync.
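The DynamoDB-level operation described here can be sketched as a plain UpdateItem parameter object that removes the soft-delete markers (table name, key shape, and helper name are assumptions for illustration; in a real setup this object would be passed to the AWS SDK's UpdateItemCommand against a custom data source, bypassing the AppSync-managed resolvers):

```typescript
// Sketch: build UpdateItem parameters that clear the soft-delete markers.
// REMOVE deletes the attributes entirely, which stops the DynamoDB TTL
// process from ever expiring the item. All names here are illustrative.
interface UndeleteParams {
  TableName: string;
  Key: { id: { S: string } };
  UpdateExpression: string;
  ExpressionAttributeNames: Record<string, string>;
}

function buildUndeleteParams(tableName: string, id: string): UndeleteParams {
  return {
    TableName: tableName,
    Key: { id: { S: id } },
    UpdateExpression: 'REMOVE #ttl, #deleted',
    ExpressionAttributeNames: { '#ttl': '_ttl', '#deleted': '_deleted' },
  };
}

const params = buildUndeleteParams('Asset-dev', '8ce079c6-1db2-48c1-a5a6-e60c9d22daab');
console.log(params.UpdateExpression); // REMOVE #ttl, #deleted
```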

So the first thing I need to be able to do is see the contents of the Recycle Bin, i.e. listAssets(filter: {_deleted: {eq: true}}). However, listAssets takes its input from ModelAssetFilterInput, and the _deleted field is not part of the ModelAssetFilterInput definition, so it's impossible to filter for _deleted = true.

So let's say we do know the id of the record in the "recycle bin" that we want to undelete; then we should be able to run:

updateAsset(input: {
  id: "8ce079c6-1db2-48c1-a5a6-e60c9d22daab"
  _version: 11
  _deleted: null
  _ttl: null
})

To be able to do this, the definition of UpdateAssetInput must include _deleted and _ttl - but it doesn't. Even if I edit the definitions manually, I can't get it to work so that I can null the _deleted and _ttl fields.

sacrampton commented 4 years ago

Hi @SwaySway & @UnleashedMind - I understand you guys are busy, but I created this issue a month ago today and haven't had any feedback on it at all, so I was wondering if you could take a look. Thanks.

Mentioum commented 4 years ago

Also keen to know how we are supposed to filter out _deleted items. I also feel it should be added to the ModelAssetFilterInput definition.

sacrampton commented 4 years ago

If you are using @searchable, deleting a record removes it from ElasticSearch, so any search there will return only live (non-deleted) records. But being able to find the _deleted records in DynamoDB and then undelete them (_deleted: null, _ttl: null) remains a challenge.

undefobj commented 4 years ago

Hi @sacrampton I took a read of this issue and I understand you're looking for capabilities to undelete items. The soft delete implemented in AppSync is for consistency guarantees and not to be controlled by the client, which is also why _lastChangedAt is not in the current input. The only reason that _version is in the update input is for the client to provide its current version, but that is still controlled in the service. When we developed this feature in AppSync it was to allow administrators to recover from erroneous deletions in collaborative applications.

To open this up for a "Recycle Bin"-like feature, we'll have to investigate further and look at the consistency implications throughout the system. For now I'm going to mark this as a feature request and measure the interest from other community members, as we have a large backlog of DataStore requests. In the meantime, if you need this, I believe you should be able to address it with a custom resolver.

undefobj commented 4 years ago

Transferring to JS repo for tracking.

sacrampton commented 4 years ago

Hi @undefobj - thanks for getting back to me on this - appreciate it.

I've been trying to do this in custom resolvers, but it fails when I try to null the _deleted/_ttl fields through AppSync. With _ttl it says the field can't be modified through the client.

undefobj commented 4 years ago

> Hi @undefobj - thanks for getting back to me on this - appreciate it.
>
> I've been trying to do this in custom resolvers, but it fails when I try to null the _deleted/_ttl fields through AppSync. With _ttl it says the field can't be modified through the client.

You're most likely going to need to use a regular DynamoDB resolver with a custom data source. The AppSync resolvers constrain what can be done from a client for good reason: mistakes can lead to data loss and consistency issues.

My $.02: I'd think a little more through your business use case for such a feature. If you're looking to recover from mistakes as an admin, I'd probably recommend a different API for interacting with the records as an admin.

jocarrillo commented 4 years ago

How should we filter deleted records out of list queries? I'm using AppSync, and the issue is that Model*FilterInput does not accept the _deleted field as one of the filters...

renegoretzka commented 4 years ago

Just my 2 cents: it would be enough if Amplify (or rather the API) simply didn't return those deleted database entries by default. It really makes no sense to still receive them in the API call and handle them with a filter (which, as @jocarrillo has said, isn't even supported by the FilterInput, making this even worse). I would really like to keep using Auto Merge and still be able to filter out, or never receive, those _deleted: true entries in the response.

ramakrishnan commented 3 years ago

@undefobj Just to add a use case I am going through. Through DataStore I delete a document:

const [original] = await DataStore.query(this.MODEL, conditions)
return DataStore.delete(original)

This action immediately removes the document from IndexedDB, but unfortunately the backend VTL logic rejects the delete. How do I get the document inserted back into IndexedDB?

One more use case, during create:

const newDoc = await DataStore.save(new this.MODEL(input))
await DataStore.save(newDoc)

Ideal behaviour would be to mark the document as soft-deleted and stop calling the mutation.

juanlucky commented 3 years ago

I encounter the same issue: soft-deleted items are returned by default from a simple list GraphQL query, and there is no way to filter them out. This could have a huge impact on performance as the number of unwanted soft-deleted items grows.

Could we have a way to opt out of this soft-delete capability if we don't want it? Or could we configure the default value for the base table TTL to reduce the time these soft-deleted items remain in the base table? I could not find a simple way to change this default 30-day value, as the base table is auto-generated by the Amplify CLI.

jcbdev commented 3 years ago

Soft deletes in the base table also seem totally unnecessary when every change already exists in the AmplifyDataStore table. Surely that is enough to drive the sync mechanism alone? (And I'm sure it used to be, because I don't remember the _deleted field previously.)

The more I think about this, the more it completely breaks AppSync as a GraphQL API and turns it into an offline-first-only product (DataStore only). This only makes sense if you never query the API directly but always query the local replica (which does not present the deleted items). So in my case, where I have more than just a mobile app, this means hand-cranking a separate API with its own underlying data source.

For example, say you have a frequently changing table where items are deleted often. Currently none of the Input objects for an index query contain the _deleted field.

Thus if I wanted to query an index like "actionsByUser", I would HAVE to page through all of the deleted actions before I could get to the ones I need, because there is no way to filter on _deleted in the auto-generated Input objects (ModelActionFilterInput). I could perhaps add _deleted to all indexes, but then the composite key generation mechanism doesn't work if the _deleted field doesn't exist when the document is inserted into DynamoDB, which forces me to add _deleted = false to all logic across the system. This not only creates "code damage" across the system but reduces the possibility that this API could be presented to any party other than the Amplify client, as consumers have ever-increasing complexity to deal with. It self-limits the usefulness of the Amplify platform and increases the need to build other APIs to wrap around and hide these implementation details (a GraphQL API to wrap the API, perhaps 😆).

All of these issues are created by pushing too many artifacts of the sync mechanism through the types and queries themselves, instead of a separated event-store-style sync like the AmplifyDataStore table already implemented. The cognitive load of using the API directly is too high, so you could not expose the API to a third party, for example. Another example of how this pushes complexity and cognitive load deep into the whole platform is the @searchable case above, where you now have data stores out of sync with each other because one has deleted the item and the other has not. Yet another is having to retrieve the current _version first before running an update mutation, or the update is rejected.

Personally, I would separate the "main API" from the "sync API" automatically rather than squashing the two concerns together and thus always having these issues. The VTL templates could use TransactWriteItems across the two tables to prevent the sync table and the entity table getting out of sync with each other. I'm not 100% sure what the answer is, but it feels like the current implementation severely limits the potential of the entire platform.

danrivett commented 3 years ago

I do agree that the DataStore approach is too homogeneous, in that it is designed to be used exclusively as the interaction mechanism for the whole AppSync service.

This causes us quite a lot of pain because, of course, we have backend interactions with the data that use regular GraphQL queries and mutations, and so they have to know about DataStore fields such as the _deleted flag and manually post-process the GraphQL results before the backend application can use the data, to avoid processing logically deleted records.

I do think this is something to genuinely reconsider, @undefobj, as it also speaks to other designs you currently have in flight, such as local vs. cloud models and which models are syncable. Both would be great to design as separately configurable per client, without assuming there is just one client per AppSync API.

sacrampton commented 3 years ago

Hi @danrivett @jcbdev - thought I'd take a second to share how I hacked this to work.

Firstly, for my soft delete I've added a field called "isDeleted" to all my tables/models, updated the resolvers to create it as false by default, and I delete in my application by updating it to true. Then, after a nominated period (say 30 days), a Lambda function performs the actual delete.

But I also created a custom schema file with all of the Model*FilterInputs and added the following:

input ModelAssetFilterInput {
  _deleted: ModelBooleanInput
  _lastChangedAt: ModelIntInput
  _version: ModelIntInput 
}

This means I can query DynamoDB for those records that have neither _deleted nor isDeleted set to true:

query listAssets {
  AssetPlant(
    assetPlantId: "0d6acebc-xxxx-xxxx-xxxx-6b9e32f879a8"
    filter: {
      _deleted: { ne: true }
      isDeleted: { ne: true }
    }
    limit: 10
  ) {
    items { id name _version _deleted isDeleted }
  }
}

Whilst it would be better if we didn't have to make a science project out of this, it at least works for me.
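On the client, the two exclusion conditions above can be wrapped in a small helper that ANDs them onto any existing filter, so every list query excludes both Amplify's _deleted flag and the custom isDeleted field. A sketch, assuming the filter shape follows the generated Model*FilterInput conventions (the helper name and typing are illustrative):

```typescript
// Sketch: merge the soft-delete exclusions into a caller-supplied filter.
// FilterValue/ModelFilter are loose illustrative types, not Amplify types.
type FilterValue = { eq?: unknown; ne?: unknown };
type ModelFilter = { [key: string]: FilterValue | ModelFilter[] | undefined };

function withNotDeleted(filter?: ModelFilter): ModelFilter {
  const notDeleted: ModelFilter = {
    _deleted: { ne: true },
    isDeleted: { ne: true },
  };
  // If the caller already has a filter, combine with an explicit `and`.
  return filter ? { and: [filter, notDeleted] } : notDeleted;
}

console.log(JSON.stringify(withNotDeleted({ name: { eq: 'Pump' } })));
```

The helper keeps the exclusion logic in one place instead of repeating `_deleted: {ne: true}` in every query across the codebase.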

fernandesGabriel commented 3 years ago

Has anyone here tried creating a DynamoDB trigger to clean up the table as soon as the _deleted flag is set?

acusti commented 3 years ago

Like others in this thread, I am querying a DataStore/AppSync API from a Lambda and I want to filter out deleted items, e.g. { filter: { _deleted: { ne: true } } }. Currently, trying to add that as-is to my GraphQL variables results in an error: The variables input contains a field name '_deleted' that is not defined for input object type 'Model<TypeName>FilterInput'. Is there any reason why _deleted: ModelBooleanInput can't be added to the generated Model<TypeName>FilterInput inputs by default? It seems like a filter that would be useful to many.

The other simple solution I see that would be reasonable is making it possible to add _deleted: Boolean explicitly as a field on any @model type in schema.graphql, the same way I can add createdAt: AWSDateTime! and updatedAt: AWSDateTime!. As it stands, trying to add _deleted results in The type 'TypeName' has declared a field with a non unique name '_deleted'.

There are a lot of compelling comments in this thread that suggest ambitious, large changes to the whole way DataStore/AppSync work. However, to address the pain point that has been raised most often in this thread, I think both of the above suggestions would work, and both seem to be safe, simple, non-breaking changes.


For others with this issue who are looking for a workaround, I was able to use @sacrampton's solution successfully, though it wasn't obvious, so below is a detailed explanation of what I did. Quick note: below, I use TypeName as a placeholder for the actual type for which I want to filter out _deleted items, e.g. Post for the blog example.

  1. find and open this file: amplify/#current-cloud-backend/api/<your-api-name>/build/schema.graphql
  2. search for input ModelTypeNameFilterInput in that file and copy the entire block
  3. open amplify/backend/api/<your-api-name>/schema.graphql and paste that into the bottom of the file (or wherever you want it)
  4. add _deleted: ModelBooleanInput to that input block
  5. paste in all of the other Model<Type>Input inputs and enums that are referenced in ModelTypeNameFilterInput. Here's what I had to add:
    
    input ModelStringInput {
      ne: String
      eq: String
      le: String
      lt: String
      ge: String
      gt: String
      contains: String
      notContains: String
      between: [String]
      beginsWith: String
      attributeExists: Boolean
      attributeType: ModelAttributeTypes
      size: ModelSizeInput
    }

input ModelIDInput { ne: ID eq: ID le: ID lt: ID ge: ID gt: ID contains: ID notContains: ID between: [ID] beginsWith: ID attributeExists: Boolean attributeType: ModelAttributeTypes size: ModelSizeInput }

input ModelIntInput { ne: Int eq: Int le: Int lt: Int ge: Int gt: Int between: [Int] attributeExists: Boolean attributeType: ModelAttributeTypes }

input ModelFloatInput { ne: Float eq: Float le: Float lt: Float ge: Float gt: Float between: [Float] attributeExists: Boolean attributeType: ModelAttributeTypes }

input ModelBooleanInput { ne: Boolean eq: Boolean attributeExists: Boolean attributeType: ModelAttributeTypes }

input ModelSizeInput { ne: Int eq: Int le: Int lt: Int ge: Int gt: Int between: [Int] }

enum ModelAttributeTypes { binary binarySet bool list map number numberSet string stringSet _null }

tcollins590 commented 2 years ago

I would like to see this implemented so that, when using GraphQL with a DataStore backend, it's possible to filter on _deleted server-side.

Any updates from the AWS team?

torgeadelin commented 2 years ago

Any update on this?

jgo80 commented 2 years ago

@acusti your workaround looked very promising - I could fire my mutation, however the undelete did not take place. I think there is more to it, as fields like the DynamoDB TTL also need to be nullified. Did this work for you somehow?

My Mutation:

mutation Undelete {
  updateTodo(input: {id: "a5c29725-eb8a-490d-af9d-0beedacd836a", _version:2, _deleted: false}) {
    id
    name
    _deleted
    _version
  }
}

Note: Replacing _deleted: false with _deleted: null doesn't work either.

Result:

{
  "data": {
    "updateTodo": {
      "id": "a5c29725-eb8a-490d-af9d-0beedacd836a",
      "name": "Test",
      "_deleted": true,
      "_version": 3
    }
  }
}

hackmajoris commented 2 years ago

Here we are, waiting for updates...

sacrampton commented 2 years ago

Hi Everyone - I see traffic on this issue coming through from time to time, so I thought I'd write up the workaround we implemented, which seems to be working really well for us.

In every model we have added two fields: isDeleted: Boolean and hardDeleteTime: AWSDateTime. We specify for each customer what the hard-delete time should be in days. So when we delete a record in our app, we set isDeleted: true and calculate hardDeleteTime as the current time plus the hard-delete period for that customer. We have updated our resolvers to exclude any records with isDeleted: true. But at any time it's easy to go back, set isDeleted: false and hardDeleteTime: null, and effectively undelete the record. We use ElasticSearch a lot, and in this scenario the isDeleted: true record is still available in ElasticSearch.

Then we have a Lambda function that fires once we reach the hardDeleteTime and performs the AppSync delete operation, which sets _deleted: true and sets the TTL in DynamoDB. As soon as _deleted is set to true, AppSync deletes the record from ElasticSearch. At that point we call it a hard delete - we are not getting it back. However, when we set isDeleted = true, that is very much a soft delete that we can reverse at any time.

We have been using this effectively in GraphQL (DynamoDB & ElasticSearch) & DataStore for about a year now.
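The retention arithmetic in this scheme is straightforward; below is a sketch of computing hardDeleteTime as an AWSDateTime (ISO-8601) string a configurable number of days after the soft delete. The per-customer retention map, the default of 30 days, and the function name are assumptions for illustration, not part of the scheme described above.

```typescript
// Sketch: per-customer retention lookup (values here are made up).
const RETENTION_DAYS: Record<string, number> = {
  'customer-a': 30,
  'customer-b': 90,
};

// Compute hardDeleteTime = soft-delete time + retention period, as an
// ISO-8601 string suitable for an AWSDateTime field.
function hardDeleteTime(customerId: string, deletedAt: Date): string {
  const days = RETENTION_DAYS[customerId] ?? 30; // assumed default
  const t = new Date(deletedAt.getTime() + days * 24 * 60 * 60 * 1000);
  return t.toISOString();
}

console.log(hardDeleteTime('customer-a', new Date('2024-01-01T00:00:00Z')));
// 2024-01-31T00:00:00.000Z
```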

danrivett commented 2 years ago

Thanks @sacrampton, that's great detail. One question: how do you trigger your garbage-collection Lambda to fire when hardDeleteTime is reached?

Do you have a daily trigger and query DynamoDB by a GSI on that field, or is there another way to trigger the lambda to delete particular records?

Thanks.

sacrampton commented 2 years ago

Hi @danrivett - I've sent a note to my developer to confirm and will post back the results here.

However, I think we just have a daily task that finds any hardDeleteTime less than the current time and deletes those records. We use a lot of ElasticSearch, so that query is very easy in our custom ElasticSearch resolvers. For example, we have a range operator in our ElasticSearch resolvers where we can query for hardDeleteTime lte currentTime. Custom ElasticSearch resolvers make this stuff pretty easy:

{ "range" : { "${userRangeSingle.field}" : { "${userRangeSingle.operator}" : "${userRangeSingle.value}"}}}

sacrampton commented 2 years ago

Hi @danrivett - just confirmed with the team. When the isDeleted field is set to true we are setting the hardDeleteTime.

We then run a daily cron to find the records that need to be deleted. We query DynamoDB rather than ElasticSearch, using the sync query like this:

query garbageCollector {
  syncAssets(
    filter: {
      isDeleted: { eq: true }
      _deleted: { ne: true }
      hardDeleteTime: { lt: currentTime }
    }
    nextToken: nextToken
    limit: 1000
  ) {
    items { id _version }
  }
}

danrivett commented 2 years ago

Thanks @sacrampton for taking the time to reach out and find out these extra details - I really appreciate it. It's great to understand how you've solved this in practice and not just in theory.

This is something we are considering implementing ourselves, so great to understand how you've already successfully implemented this.

LoveriusB commented 1 year ago

So it's been nearly 3 years now. Still no update? I just need to update the _deleted and _ttl fields... that doesn't seem like much when you compare it to the catastrophic changes that have been brought to DataStore recently...

Bohemus307 commented 1 year ago

Any word on this? I don't see the addition of numerous fields and infrastructure as an actual fix.

faheemfactor commented 1 year ago

Also, it is labeled as a feature request even though it is a bug. Any updates???

@chrisbonifacio