Closed setaman closed 3 years ago
I would argue for the "simple soft delete option", as it seems to me the lower-effort and still suitable solution. I would also expect fewer side effects with asset management etc.
Nevertheless, the archiving approach is also valid, but it could be added as an additional feature in the future (when use cases require it or we run into database sizes that couldn't be handled otherwise). I am thinking of a button for admins like "archive datasets"; we would then iterate over all data sources with a deletedAt stamp and transfer them to a separate archive database.
Measured in implementation effort, both solutions are quite simple thanks to our technical infrastructure. The concepts differ most in their semantics. I'm not completely happy with option 1 (very common in the software world), as it slightly violates our API's semantics. It would call a PATCH instead of DELETE, and I don't want to produce a "delete" event on a PATCH. The other way around, I don't want to make a DELETE request when nothing is actually deleted. These are just the little things that make software engineers go crazy.
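A minimal sketch of the "archive datasets" sweep mentioned above, assuming plain in-memory dictionaries stand in for the live and the archive database. The names `archive_deleted`, `live`, and `archive` are hypothetical and not part of DIVA.

```python
from typing import Any, Dict

Entity = Dict[str, Any]


def archive_deleted(live: Dict[str, Entity], archive: Dict[str, Entity]) -> int:
    """Move every entity that carries a deletedAt stamp into the archive store.

    Both stores are dicts keyed by entity id (an assumption for this sketch;
    real stores would be database collections). Returns the number moved.
    """
    # Collect the ids first so the dict is not mutated while iterating over it.
    to_move = [eid for eid, entity in live.items() if entity.get("deletedAt")]
    for eid in to_move:
        archive[eid] = live.pop(eid)
    return len(to_move)


# Example data with made-up ids and timestamps.
live = {
    "resource:1": {"title": "a.csv", "deletedAt": "2021-05-01T10:00:00Z"},
    "resource:2": {"title": "b.csv"},
}
archive: Dict[str, Entity] = {}
moved = archive_deleted(live, archive)
print(moved, sorted(live), sorted(archive))
```

Only entities that were already soft deleted are touched, so the sweep could be run on demand by an admin without affecting active data.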
If we want a simple, ready-to-go solution with soft delete, I would suggest the following:
- [...] `{ deleted: "<current date>"}`. On PATCH, only a subset of important metadata (e.g. `title`, `uniqueFingerprint`, `entityType` etc.) will be left there.
- [...] `update` event
- [...] `title`, `resourceType` etc., maybe the latest history entries, and offers the possibility to restore the entity
- [...] `deleted` presence, hard removes the entity from the index -> it's not searchable anymore
- [...] `id` linked entities still linked. We currently have relations only between one Asset and Resources/Assets/Users
- We should not just remove the entity from an Asset. It can lead to confusion when entities disappear just like that. Instead, we leave the link to the dead entity there and let the human decide whether to remove it or not.
- [...] `deleted` field on their own
- [...] `{ deleted: "<current date>"}` to restore it

@mspiekermann @DaTebe feel free to post your thoughts if there is something to complain about. I can start next week with this issue.
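The soft delete proposal above could look roughly like the following sketch. It assumes entities are plain dictionaries; the set of retained fields (including `id`), the function names, and the choice to restore by dropping the `deleted` field are assumptions made for illustration, not the agreed design.

```python
from datetime import datetime, timezone
from typing import Any, Dict

# Hypothetical set of fields that survive a soft delete. The thread names
# title, uniqueFingerprint and entityType as examples; id is an assumption.
KEPT_FIELDS = {"id", "title", "uniqueFingerprint", "entityType"}


def soft_delete(entity: Dict[str, Any]) -> Dict[str, Any]:
    """Return a reduced copy of the entity that carries a `deleted` timestamp."""
    reduced = {k: v for k, v in entity.items() if k in KEPT_FIELDS}
    reduced["deleted"] = datetime.now(timezone.utc).isoformat()
    return reduced


def restore(entity: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy without the `deleted` marker (assumed restore semantics)."""
    return {k: v for k, v in entity.items() if k != "deleted"}


def is_searchable(entity: Dict[str, Any]) -> bool:
    """The mere presence of `deleted` is treated as removal from the index."""
    return "deleted" not in entity


resource = {
    "id": "resource:1",
    "title": "a.csv",
    "uniqueFingerprint": "abc123",
    "entityType": "resource",
    "description": "dropped on soft delete",
}
deleted = soft_delete(resource)
restored = restore(deleted)
```

Note that a restored entity only gets back the retained subset of metadata; anything stripped on delete would have to come from somewhere else, such as history entries.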
The first thing we should do is adjust the management services to support possible deletion or archiving in a correct way. These changes should be easy to make.
Delete:
Archive:
Once we have done all of this, the more complex questions arise. Maybe we can already agree on the steps I described. If yes, we can go deeper into the rabbit hole.
@DaTebe all management services can already DELETE.
Hard delete is obviously the easiest solution and a good one, until an entity is accidentally deleted. Then the requirement for archiving will arise.
We can start with a normal hard delete and see what happens. But "Archive" or "Backup" is an important concept for a system like DIVA, so we should keep it in mind.
That sounds good. Don't get me wrong. We should implement both solutions in our backend. How we propagate it to our client needs to be discussed.
As discussed with @DaTebe, we start with hard delete and delete all possible traces of the entity (Histories, Search, Assets). Then we will incrementally add archive features as the need arises.
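As a rough illustration of "delete all possible traces", the sketch below removes an entity together with its history entries, its search index entry, and its asset links. The `Stores` class and its fields are hypothetical in-memory stand-ins, not the actual DIVA services.

```python
from typing import Any, Dict, List, Set


class Stores:
    """In-memory stand-ins for the places where an entity can leave traces."""

    def __init__(self) -> None:
        self.entities: Dict[str, Dict[str, Any]] = {}
        self.histories: List[Dict[str, Any]] = []   # each entry has "entityId"
        self.search_index: Set[str] = set()         # ids that are searchable
        self.asset_links: Dict[str, Set[str]] = {}  # asset id -> linked ids


def hard_delete(stores: Stores, entity_id: str) -> bool:
    """Remove the entity and every trace of it. Returns False if it is unknown."""
    if stores.entities.pop(entity_id, None) is None:
        return False
    stores.histories = [h for h in stores.histories if h["entityId"] != entity_id]
    stores.search_index.discard(entity_id)
    # Unlink the entity from every asset that references it.
    for linked in stores.asset_links.values():
        linked.discard(entity_id)
    # If the entity was itself an asset, drop its own link set as well.
    stores.asset_links.pop(entity_id, None)
    return True


stores = Stores()
stores.entities["resource:1"] = {"title": "a.csv"}
stores.histories.append({"entityId": "resource:1", "change": "created"})
stores.search_index.add("resource:1")
stores.asset_links["asset:1"] = {"resource:1", "resource:2"}
deleted_ok = hard_delete(stores, "resource:1")
```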
Branch 72-Delete-concept-for-Entities created for issue: Delete concept for Entities and assigned to null
@DaTebe hard delete is implemented in #80
@setaman what info is used in the dsc adapter to reference a resource?
@DaTebe the resource object holds the `offerId`, `ruleId` etc. under `dsc.offer`. That data is used to update offers on DSC.
But all these problems, including those with MinIO, would disappear with the archive feature in the next iteration on this issue. So for now we can leave this as is.
No, we can not ignore it. How do we delete the unreferenced data in our second "archive" iteration?
We could send an "archive" event. On this event the resource will be removed everywhere except from the original collection. Services would be able to read the required data either from the original collection or from the archive, depending on implementation details. After this, the hard delete would not be an issue.
For now, I can duplicate the DSC info to another collection with the corresponding resource `id`, and then on delete read it by resource id.
There was no way to store our uuid inside the dsc, correct?
correct
Another hacky solution would be to send the entire metadata as an event. But we need to remember the max message size of 1MB...
Edit: we could also look into dedicated databases with key values to map from uuid to whatever the service needs to identify the resource.
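The lookup idea from the comments above (duplicating the DSC info keyed by resource id, or using a dedicated key-value store) can be sketched as follows. `DscReferenceStore` and its methods are hypothetical names, a dict stands in for the key-value database, and the ids are placeholder values.

```python
from typing import Dict, Optional


class DscReferenceStore:
    """Maps a resource id to the identifiers the DSC adapter needs."""

    def __init__(self) -> None:
        self._refs: Dict[str, Dict[str, str]] = {}

    def remember(self, resource_id: str, offer_id: str, rule_id: str) -> None:
        """Store the DSC identifiers when the offer is created or updated."""
        self._refs[resource_id] = {"offerId": offer_id, "ruleId": rule_id}

    def forget(self, resource_id: str) -> Optional[Dict[str, str]]:
        """Remove and return the stored reference, or None if none exists."""
        return self._refs.pop(resource_id, None)


store = DscReferenceStore()
store.remember("resource:uuid-1", offer_id="offer-42", rule_id="rule-7")

# On a delete event only the resource id is known; the adapter looks up
# the DSC identifiers it needs for its own cleanup.
ref = store.forget("resource:uuid-1")
```

With such a mapping the adapter no longer depends on the resource document still existing at the time the delete event arrives.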
Here is another one: DSC allows us to add some additional properties. We could put the resource id on the offer. But DSC's API does not provide a way to filter the offers, so one would have to go through all the offers to find the needed one.
These are all kinds of hacky solutions that we don't really want to implement.
I've tested our implementation according to our specification. Everything worked fine for me. Absolutely no hiccups.
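To make the cost of that approach concrete, here is a sketch of the full scan it would require. The offer shape and the property names `additional` and `divaResourceId` are made up for illustration; no real DSC API call is shown.

```python
from typing import Any, Dict, Iterable, Optional


def find_offer_by_resource_id(
    offers: Iterable[Dict[str, Any]], resource_id: str
) -> Optional[Dict[str, Any]]:
    """Linear scan: return the first offer whose extra property matches."""
    for offer in offers:
        if offer.get("additional", {}).get("divaResourceId") == resource_id:
            return offer
    return None


# Hypothetical list of offers as it might be fetched from DSC in one go.
offers = [
    {"offerId": "offer-1", "additional": {"divaResourceId": "resource:a"}},
    {"offerId": "offer-2"},
    {"offerId": "offer-3", "additional": {"divaResourceId": "resource:b"}},
]
match = find_offer_by_resource_id(offers, "resource:b")
```

Every delete would have to fetch and walk the complete offer list, so the work grows linearly with the number of offers.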
Is your feature request related to a problem? Please describe.
DIVA currently has no official concept for deletion or deactivation of created entities (e.g. Resources, Assets, Users etc.). The only way is to hard delete the entities directly, using the corresponding Service API or through the database.
Describe the solution you'd like
So we need to provide a concept for deletion/deactivation/archiving. Actually, you don't want to have a hard delete at all, because deleted data is gone and cannot be recovered. And we certainly should not give such power to our users without proper role management. Soft delete would be the better alternative. But there are also problems and opportunities here.

Simple soft delete option: Mark the entity as deleted and leave it in the original database/collection/index
- [...] `entity` schema with `deletedAt` time stamp field that marks an entity as deleted
- [...] `deletedAt`
- [...] `deleted` history entry

Disadvantages:
- Unique key violation. For example, it would not be possible to import the same file again if a soft-deleted resource (with the same `uniqueFingerprint`) exists for this file.
- Soft-deleted entities have to be filtered out for all possible requests

[...] `archive`)
- [...] `archive` database:
  - `id` - original id of the entity
  - `archivedAt` - time stamp of the archiving
  - `actorId` - `id` of the user/service that archived the entity
  - `payload` - JSON-stringified metadata that we want to archive
- [...] `deleted` history entry

Describe alternatives you've considered
The described flows are not mandatory; we can make some tweaks depending on concrete requirements and wishes. In addition to soft archive, I would suggest a disable or read-only option that keeps the entity visible but deactivates any kind of editing. So we should probably have two options:
It would also make sense to keep the history entries, or at least the few latest ones.
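The archive record described in the solution section (`id`, `archivedAt`, `actorId`, `payload`) could be built roughly like this. It is a sketch under the assumption that entities are JSON-serializable dictionaries; the function names and example ids are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def to_archive_record(entity: Dict[str, Any], actor_id: str) -> Dict[str, str]:
    """Wrap an entity into an archive record with a JSON-stringified payload."""
    return {
        "id": entity["id"],                                   # original id
        "archivedAt": datetime.now(timezone.utc).isoformat(),
        "actorId": actor_id,                                  # user or service
        "payload": json.dumps(entity, sort_keys=True),
    }


def from_archive_record(record: Dict[str, str]) -> Dict[str, Any]:
    """Recover the archived metadata from the payload."""
    return json.loads(record["payload"])


entity = {"id": "resource:1", "title": "a.csv", "entityType": "resource"}
record = to_archive_record(entity, actor_id="user:admin")
recovered = from_archive_record(record)
```

Keeping the payload as an opaque string means the archive does not need to know each entity schema, at the cost of not being able to query archived fields directly.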