boostercloud / booster

Booster Framework
https://www.boosterframework.com
Apache License 2.0

Enable event data deletion #551

Open adrian-lorenzo opened 3 years ago

adrian-lorenzo commented 3 years ago

Feature Request

Description

Currently, Booster supports the deletion of read model data, which removes the latest projection of your entity/aggregate. However, even if your API prevents the projection from being recreated (for example, through soft-deletion attributes that disable projection creation), the underlying data can still be obtained by anyone with access to the DynamoDB instance who reduces the relevant events in the event store.

Due to requirements that applications may encounter (security policies or "the right to be forgotten"), it would be nice to have a feature that enables event deletion following a certain policy. Some of the policies that may be needed could be:

Possible Solution

Following my own criteria and the Booster principles, the solution should not destroy the advantages that event-sourcing and CQRS offer. 



The first potential solution is hard deletion of the events from the event store. This may seem simple and effective, but it could be difficult to implement consistently for every use case: the event store, by its own nature, is not meant to be restructured, so deleting from it affects its reliability.



The second potential solution is a crypto-shredding strategy. The idea is to use a set of encryption keys to encrypt the data, and then delete the relevant encryption key when the data needs to be forgotten, rendering the event data useless and unreadable.



This strategy requires a Key Management Service (KMS) that manages the needed Customer Master Keys (CMKs) and Data Keys (more information about the concepts here). The event processor would use the KMS to encrypt/decrypt data as needed and to request key deletion.
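To illustrate the envelope-encryption pattern behind this proposal (this is a hypothetical sketch, not Booster's actual implementation: the KMS and its CMK are simulated here by a local master key, using Node's built-in crypto module):

```typescript
import * as crypto from 'crypto'

// Simulated KMS: a single master key (the CMK) wraps per-entity data keys,
// so only the wrapped form of each data key ever needs to be persisted.
const masterKey = crypto.randomBytes(32)

function wrapDataKey(dataKey: Buffer): Buffer {
  const iv = crypto.randomBytes(12)
  const cipher = crypto.createCipheriv('aes-256-gcm', masterKey, iv)
  const wrapped = Buffer.concat([cipher.update(dataKey), cipher.final()])
  // Store IV + auth tag + ciphertext together
  return Buffer.concat([iv, cipher.getAuthTag(), wrapped])
}

function unwrapDataKey(blob: Buffer): Buffer {
  const iv = blob.subarray(0, 12)
  const tag = blob.subarray(12, 28)
  const wrapped = blob.subarray(28)
  const decipher = crypto.createDecipheriv('aes-256-gcm', masterKey, iv)
  decipher.setAuthTag(tag)
  return Buffer.concat([decipher.update(wrapped), decipher.final()])
}

// Each entity gets its own data key; events are encrypted with it,
// and only the wrapped key is stored alongside them.
const entityDataKey = crypto.randomBytes(32)
const storedWrappedKey = wrapDataKey(entityDataKey)
const recovered = unwrapDataKey(storedWrappedKey)
```

With this layout, a single CMK can protect any number of per-entity data keys, which is what makes the cost constraint discussed below tractable.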

Currently, AWS offers a KMS service that has all these features and can be used with the AWS Encryption SDK and the Amazon DynamoDB Encryption Client. These could be the infrastructure components needed to implement the feature with the AWS provider using the crypto-shredding strategy.

However, we should take the following constraint into account: each CMK created with the KMS service costs $1 per month, which would take Booster out of the free tier. It also means that using one CMK per user/entity would be very expensive. The solution to this problem could be a system that uses a smaller set of CMKs, which is already proposed as a future feature in the AWS Encryption SDK specification.



Some of the strategies proposed in that issue to solve the problem are:



Further research and work are needed to find a suitable solution to this problem. Let's discuss it and see if we can create something effective and reliable.

Additional information

adrian-lorenzo commented 3 years ago

Proposal for event deletion: Crypto shredding using data keys generated by a single CMK managed by AWS KMS

New infrastructure resources needed

Amazon Key Management Service

Pricing

Solution

API proposal

It should be possible to mark entities as Deleteable. This could be achieved with a new decorator, or with a parameter on the existing decorators for this component:

// Option 1: a dedicated decorator
@Entity
@Deleteable
export class Counter { ... }

// Option 2: a parameter on the existing decorator
@Entity({ deleteable: true })
export class Counter { ... }

The API should then expose a new function through the Booster interface, called, for example, deleteEntity, with the same parameters as the Booster.fetchEntitySnapshot function:

function deleteEntity<TEntity extends EntityInterface>(entityName: Class<TEntity>, entityID: UUID): void

This function should delete all the information related to the entity that is stored in the event store: all snapshots and events with the same entityId. It should also delete the entity's projected read model if it exists, which is functionality that is already implemented.
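The intended semantics of deleteEntity can be sketched with a hypothetical in-memory store (the StoreEntry shape, eventStore array, and readModels map below are illustrative stand-ins for the DynamoDB tables, not Booster APIs):

```typescript
// Hypothetical in-memory model: every event and snapshot sharing an
// entityId is removed, as is the projected read model if present.
type StoreEntry = { entityId: string; kind: 'event' | 'snapshot'; value: unknown }

const eventStore: StoreEntry[] = [
  { entityId: 'counter-1', kind: 'event', value: { delta: 1 } },
  { entityId: 'counter-1', kind: 'snapshot', value: { count: 1 } },
  { entityId: 'counter-2', kind: 'event', value: { delta: 5 } },
]
const readModels = new Map<string, unknown>([['counter-1', { count: 1 }]])

function deleteEntity(entityId: string): void {
  // Drop every event and snapshot belonging to the entity...
  for (let i = eventStore.length - 1; i >= 0; i--) {
    if (eventStore[i].entityId === entityId) eventStore.splice(i, 1)
  }
  // ...and its projected read model, if it exists.
  readModels.delete(entityId)
}

deleteEntity('counter-1')
```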

The conditions for an entity to be deleted using deleteEntity must be the following:

Functionality implementation

On deploy, as part of the AWS provider event stack, Booster must instantiate a Customer Master Key as a project resource, using the Amazon Key Management Service, and add its ID to the CloudFormation outputs so it can be used by the stack.

Also, Booster must instantiate a DynamoDB table, called for example secret-store, which will contain entries with the following data:

The events-adapter, which is responsible for storing the events in DynamoDB using the storeEvents function, should encrypt the personal data stored in them if the entity is marked as Deleteable. The personal data is located in the following attributes of the event-store entries:

To encrypt the events, the following pseudo-algorithm could serve as a baseline:

// The EventEnvelope type refers to an entry in the event store,
// which could be an event or a snapshot
function encrypt(event: EventEnvelope, keyManagementService: KeyManagementService): EventEnvelope {
    return encryptEventData(
        event,
        keyManagementService.decryptDataKey(
            getDataKey(event.entityID) ?? createDataKey(event.entityID)
        )
    )
}

Every time those events/snapshots need to be processed by another operation, they must be decrypted; this affects the processes that use the readEntityEventsSince and readEntityLatestSnapshot functions of the events-adapter. To decrypt them, the following pseudo-algorithm could be used:

function decrypt(event: EventEnvelope, keyManagementService: KeyManagementService): EventEnvelope {
    const dataKey = getDataKey(event.entityID)

    if (dataKey === undefined) {
        throw new DataKeyNotFound()
    }

    return decryptEventData(
        event,
        keyManagementService.decryptDataKey(
            dataKey
        )
    )
}

Finally, when the data needs to be removed from the events, it is only necessary to delete the data key related to it.

However, to have a real reference of the event state (deleted or not) without trying to decrypt a potentially deleted event, a new isDeleted flag could be added, marking the event as deleted. This would break the immutability of the events, but could lead to a more efficient and pragmatic approach.

function deleteEntity(entityID: UUID): void {
    deleteDataKey(entityID)
}

// Optional: could enhance fetching operations
function deleteEvent(event: EventEnvelope): EventEnvelope {
    return {
        ...event,
        isDeleted: true
    }
}

The API actions needed to implement these features using AWS KMS are the following:

Optional improvements

To make sure that the sensitive data saved in the secret-store is secure, some enhancements could be integrated. One of the best and easiest to integrate is server-side encryption at rest, provided by DynamoDB, with an AWS-owned CMK at no extra charge.

Things that should be taken into account:

javiertoledo commented 3 years ago

Thanks for the thorough proposal! I'd definitely go for crypto shredding.

Just as you explained it, it makes a lot of sense to me, but I think we don't really need to use AWS' KMS. We can use a well-known encryption algorithm like AES and store the keys in that secret-store table, as you suggested.
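The KMS-free variant suggested here can be sketched as follows (a hypothetical illustration: the Map stands in for the secret-store DynamoDB table, and AES-256-GCM is one reasonable choice of well-known algorithm):

```typescript
import * as crypto from 'crypto'

// Per-entity AES-256-GCM keys live in a "secret-store" (a Map standing in
// for the DynamoDB table). Deleting an entity's key makes every event and
// snapshot encrypted with it permanently unreadable: crypto shredding.
const secretStore = new Map<string, Buffer>()

function keyFor(entityId: string): Buffer {
  let key = secretStore.get(entityId)
  if (!key) {
    key = crypto.randomBytes(32)
    secretStore.set(entityId, key)
  }
  return key
}

function encryptPayload(entityId: string, payload: string): Buffer {
  const iv = crypto.randomBytes(12)
  const cipher = crypto.createCipheriv('aes-256-gcm', keyFor(entityId), iv)
  const body = Buffer.concat([cipher.update(payload, 'utf8'), cipher.final()])
  return Buffer.concat([iv, cipher.getAuthTag(), body])
}

function decryptPayload(entityId: string, blob: Buffer): string {
  const key = secretStore.get(entityId)
  if (!key) throw new Error('Data key deleted: event has been shredded')
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, blob.subarray(0, 12))
  decipher.setAuthTag(blob.subarray(12, 28))
  return Buffer.concat([decipher.update(blob.subarray(28)), decipher.final()]).toString('utf8')
}

const stored = encryptPayload('counter-1', '{"count":42}')
const roundTrip = decryptPayload('counter-1', stored)

// Shredding: delete the key and the stored ciphertext becomes useless.
secretStore.delete('counter-1')
let shredded = false
try { decryptPayload('counter-1', stored) } catch { shredded = true }
```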

I think that we should focus, at least for a first version, on entities (as aggregations of events). We have to make sure that both the entity snapshots and all the events belonging to a specific entity are encrypted with the entity's key. That way, deleting the key will render both the snapshots and the events unusable without needing to loop through the events flagging them as deleted or anything else; they'll just become unreadable.

Regarding the fields, I think that it's fine for a first version to crypto-shred the whole event or snapshot object, but we should also consider that not all fields have the same level of confidentiality requirements, and for some applications, it could make sense to delete sensitive information, but keep metadata that could be used for future statistical analysis or other uses.

I like the DSLs you proposed, especially the one that uses a separate decorator. If we want to support keeping partial data, I'd rather use a whitelist approach, because forgetting to delete a field that should be protected is usually worse than forgetting to keep a non-confidential one. We could do something like this:

@Entity
@Deleteable({ keepFields: ['field1', 'field2'] })
export class Counter { ... }

This could become challenging, though: an entity's fields are calculated from the events' fields, so declaring them in the entity could have no effect on the events, requiring us to provide a second decorator to keep some data from events:

@Event
@OnDelete({ keepFields: ['field1'] })
export class Event1 { ... }

@Event
@OnDelete({ keepFields: ['field2'] })
export class Event2 { ... }
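How the keepFields whitelist could behave at encryption time can be sketched like this (splitByKeepFields is a hypothetical helper, not part of Booster; whitelisted fields stay in plaintext, everything else would be encrypted with the entity's key):

```typescript
// Split a payload into plaintext fields to keep and a confidential
// remainder that would be encrypted (and thus shredded with the key).
function splitByKeepFields(
  payload: Record<string, unknown>,
  keepFields: string[]
): { kept: Record<string, unknown>; confidential: Record<string, unknown> } {
  const kept: Record<string, unknown> = {}
  const confidential: Record<string, unknown> = {}
  for (const [field, value] of Object.entries(payload)) {
    if (keepFields.includes(field)) kept[field] = value
    else confidential[field] = value
  }
  return { kept, confidential }
}

const event = { field1: 'metadata', email: 'user@example.com', name: 'Jane' }
const { kept, confidential } = splitByKeepFields(event, ['field1'])
```

The whitelist direction means that a field a developer forgets to mention defaults to confidential, which matches the safety argument above.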
adrian-lorenzo commented 3 years ago

First of all, thank you so much @javiertoledo! Your ideas make a lot of sense!

Yes, focusing on entities rather than on events makes a lot of sense. There may be some use cases where personal data needs to be stored in events that are not aggregated, but I think that is a minority for now. Using an entity key, as you say, is the way to go.

It could make sense to allow partial deletion of the aggregate information through a keepFields attribute that keeps some of the attributes from the events/snapshots. However, I feel that implementing this feature could be complex, and therefore better left for later iterations, because the "aggregator" logic would now need to take into account which fields are encrypted and what to do with a partially deleted entity: an entity that does not meet the class specification.

Regarding the idea of adding an isDeleted flag to the event-store entries, I have been thinking that it could be a better idea to create a deleted-entities table with only one column (entityId) that lists all the entities that have already been deleted, so the aggregator can check the table before trying to reduce an entity that is deleted.
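The deleted-entities check could work roughly like this (a minimal sketch: the Set stands in for the proposed table, and reduceEntity is a toy stand-in for the aggregator's reduction logic):

```typescript
// Before reducing, the aggregator consults the deleted-entities table and
// skips shredded entities, avoiding a doomed decryption attempt.
const deletedEntities = new Set<string>(['counter-1'])

function reduceEntity(entityId: string, deltas: number[]): number | undefined {
  if (deletedEntities.has(entityId)) return undefined // shredded: skip reduction
  return deltas.reduce((acc, delta) => acc + delta, 0)
}

const deleted = reduceEntity('counter-1', [1, 2, 3])
const alive = reduceEntity('counter-2', [1, 2, 3])
```

A single-column table keeps the check to one cheap lookup per reduction, at the cost of one extra write when an entity is deleted.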