decentralized-identity / confidential-storage

Confidential Storage Specification and Implementation
https://identity.foundation/confidential-storage/
Apache License 2.0

Criteria for deciding the layering and transformations/operations in each layer #51

Closed venu2062 closed 3 years ago

venu2062 commented 4 years ago

arch layer capabilities.pdf

OR13 commented 4 years ago

@venu2062 would you mind uploading an image and text and not a docx.... I don't want to encourage people to submit executables in issues....no offense Microsoft :)

venu2062 commented 4 years ago

@OR13 How do I do that? Copy and paste is not working.

venu2062 commented 4 years ago

What should a layer be capable/aware of?

Criteria:
• Data Abstraction: Application (Object/File) vs Storage (Shard/BLOB)
• Scope: Multiple Providers vs Single Provider
• Client: Specialization, Control (Privacy, Integrity), Awareness (Broader)
• Server: Commonality, Consistency (Prevent duplication)

Operations/Transformations/Filters:
• Tamper proofing (digital signature): client of an interface
• Encryption: client/server of an interface
• Replication: client/server of an interface (across providers vs storage redundancy)
• Versioning: server of an interface (conflict resolution vs history of changes)
• Authorization: server of an interface (Application vs Storage)
• Search: server of an interface (Application vs Storage)

creatornader commented 4 years ago

Screenshot from @venu2062's document:

[Screenshot: 2020-05-21 (2)]

venu2062 commented 4 years ago

Example:

Layer 1: Storage Layer (BLOB storage organized into Vaults)

Basic capabilities

Is there a need?

OR13 commented 4 years ago

@venu2062 are you asking if there is a need to apply authorization at the blob level?

I think generally, if data can be pulled from the server, regardless of the level of abstraction, there is a need to apply authorization, although I question if we need to expose interfaces lower than:

"give me an encrypted stream/blob/document based on this id and meta data"

I'm not sure about groupings.

I think that there is no need to consider multiple storage providers yet...

We should assume some simple storage interface, like key / value storage on filesystem / local storage first.

Certainly a server could decide to use multiple storage providers, but I think we need to address the simple case of 1 storage provider, 1 server, 1 client first.
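A minimal sketch of that simple case (one storage provider, one server, one client), assuming an in-memory key/value store. The names `KeyValueStore`, `Server`, `write_blob`, and `read_blob` are hypothetical and do not come from the spec; the point is only that the server exposes nothing lower than "give me an encrypted blob for this id".

```python
from typing import Dict, Optional


class KeyValueStore:
    """Hypothetical single storage provider: opaque bytes keyed by id."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, blob_id: str, ciphertext: bytes) -> None:
        self._blobs[blob_id] = ciphertext

    def get(self, blob_id: str) -> Optional[bytes]:
        return self._blobs.get(blob_id)


class Server:
    """Hypothetical server bound to exactly one storage provider.

    It never sees plaintext; it only stores and returns what the
    client has already encrypted.
    """

    def __init__(self, store: KeyValueStore) -> None:
        self._store = store

    def write_blob(self, blob_id: str, ciphertext: bytes) -> None:
        self._store.put(blob_id, ciphertext)

    def read_blob(self, blob_id: str) -> bytes:
        ciphertext = self._store.get(blob_id)
        if ciphertext is None:
            raise KeyError(f"no blob with id {blob_id!r}")
        return ciphertext
```

Authorization is deliberately left out of this sketch; it only fixes the shape of the interface between client, server, and provider.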

venu2062 commented 4 years ago

@OR13 I am not asking any questions. I am suggesting criteria and framework for arriving at, validating and defending architecture decisions. What is outlined here may be implicitly understood/assumed but having it explicitly defined helps in communicating the decisions.

One of the issues I have been facing is that I have no way of evaluating the layering being proposed during the calls.

venu2062 commented 4 years ago

@OR13 The first part of the document outlines a set of criteria and a list of operations/transformations to be considered – more transformation such as compression may be added without loss of its usefulness.

The picture tries to put it all together:

1) There are two object spaces: one determined by the applications and the other by storage characteristics.
• The application object space has a broader scope and awareness.
• The storage space is specific to one storage provider, as data is already converted/formatted to fit the requirements of that provider (for example, shard size).

2) Each object space is divided into server and client spaces. Here the client is part of the next higher layer, as illustrated in the picture.
• The client side is generally used if a feature requires specialization by each client or more control at a higher layer (privacy).
• The server side should be used if consistency or a common way of performing something is needed.

3) Most data operations/transformations can be applied in both spaces but the decision should be made using the criteria and the objectives of the operation/transformation.

Example: Replication

If replication is to support multiple providers, it should be part of the application object space. Further, if replication is specific to each client, it should be performed on the client side; if it is common to all clients, it should be performed on the server side. Conversely, if replication is to provide storage redundancy, it should be done in the storage space. An assumption may be made that this is already part of the database/store to be used and hence is not needed separately, but I prefer to state such assumptions explicitly.

Similar reasoning can be applied to encryption

4) Some operations may need to be modified to fit the space in which they are implemented

Example: Authorization

It is meaningful to have authorization in the application object space for each object, by each application and/or user. The same does not work well in the storage space: storage objects (BLOBs) are not directly administered, and if they inherit authorization from the parent application object, the authorization metadata can enable correlation. It may also cause integrity issues if the authorization metadata becomes inconsistent across the shards/BLOBs. So it is more meaningful to have authorization defined by object containers/vaults.
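One way to picture authorization defined per vault rather than per BLOB is the sketch below. It assumes an in-memory storage layer; `Vault`, `StorageLayer`, and their methods are hypothetical names used only for illustration. The access list lives on the vault, and the stored BLOBs carry no authorization or grouping metadata of their own.

```python
from typing import Dict, Set


class Vault:
    """Hypothetical container: the only place authorization is recorded."""

    def __init__(self, owner: str) -> None:
        self.authorized: Set[str] = {owner}
        # BLOBs are plain bytes with no per-BLOB ACL attached.
        self.blobs: Dict[str, bytes] = {}


class StorageLayer:
    """Hypothetical lowest layer with vault-level authorization."""

    def __init__(self) -> None:
        self._vaults: Dict[str, Vault] = {}

    def create_vault(self, vault_id: str, owner: str) -> None:
        self._vaults[vault_id] = Vault(owner)

    def _check(self, vault_id: str, requester: str) -> Vault:
        vault = self._vaults[vault_id]
        if requester not in vault.authorized:
            raise PermissionError(f"{requester} may not access {vault_id}")
        return vault

    def grant(self, vault_id: str, requester: str, grantee: str) -> None:
        self._check(vault_id, requester).authorized.add(grantee)

    def put_blob(self, vault_id: str, requester: str,
                 blob_id: str, data: bytes) -> None:
        self._check(vault_id, requester).blobs[blob_id] = data

    def get_blob(self, vault_id: str, requester: str, blob_id: str) -> bytes:
        return self._check(vault_id, requester).blobs[blob_id]
```

Because the check is made once per vault, there is no per-BLOB authorization metadata that could drift out of sync across shards or be used to correlate them.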

Hopefully, that clarifies the intent.

venu2062 commented 4 years ago

@OR13 The last part of the document is an example to outline the lowest layer characteristics (both capabilities and awareness):
• Capable of storing and retrieving shards/BLOBs independent of each other
• Should prevent correlation of BLOBs/shards: no common authorization or other grouping metadata (limit awareness)
• Authorization by container of objects/vaults
• No need to be aware of other storage providers at this layer (limit awareness)

This attempts to answer my question about awareness from one of the calls.

agropper commented 4 years ago

Venu,

How do your thoughts compare with https://datashards.net/ ?


venu2062 commented 4 years ago

Hi Adrian,

At a glance it looks like encrypted IPFS. My problem with IPFS has always been that it is difficult to guarantee persistence (the D, durability, in ACID) unless the storage providers in turn guarantee it. If it is used with pre-designated storage providers, it may be equivalent to what is being proposed here, with some additional interface abstractions.

Venu

OR13 commented 4 years ago

@venu2062 I left a comment here which is related: https://github.com/decentralized-identity/secure-data-store/issues/41#issuecomment-635511903

I think it is consistent with your suggested approach.

To me it seems like we are both discussing mostly sub layers of layer 1...

I feel like we are almost ready for a PR that tries to make these more explicit in the spec.

venu2062 commented 4 years ago

@or13 I will read it offline and comment.

venu2062 commented 4 years ago

@OR13 I am not talking about just layer 1 but the spectrum of all transformations and operations that are needed between an application using SDS and the underlying storage. I used layer 1 as an example to illustrate the idea.

I also used bank safe deposit boxes as a metaphor to explain SSI but it only goes so far. In SDS case, the storage and vault authorization are relevant but the access to keys, recovery of keys, etc. are outside our scope (key management, recovery, back doors for 3rd party access, etc. are to be handled by DKMS, I am assuming). As an example, equivalent of drilling a safe deposit box for 3rd party access or to recover from loss of keys is not relevant to SDS.

I don't want to prescribe specific layering but suggest a way to reason and defend the architectural decisions - I can prescribe a layering if that helps.

1) We can draw some parallels with the OSI network layers: the bottom three OSI layers deal with a single network link, while the top four layers deal with end-to-end communication. I think that we have such a demarcation here, before and after sharding, because sharding is specific to a storage provider.

2) As we build layers, we should consider the possibility of multiple specializations of higher level layers. Such a possibility can potentially be used to circumvent security. For example, authorization provided at some intermediate layer can be worked around by building a parallel layer that bypasses it, unless additional security measures such as encryption are used. This necessitates that vault level authorization be at the lowest layer, so your Layer A (Layer 1 in my picture) should include vault level authorization. This layer is also specific to one storage provider and should include all relevant features and functions. Layer B may be in one or more layers and should deal not only with the logical data model but also with all functions that need to be aware of more than one storage provider.

Layer C cannot be a separate layer of SDS, as it can be worked around. My suggestion is to use encryption (similar to the envelope idea of DIDComm) to deal with authorization at the logical data model. Even if Layer C is implemented to provide authorization, it cannot be relied on in the personal data context and needs to be reinforced in the client/application. So, if we want to provide it, we should think of specific use cases where it can be useful and make it optional for cases where it is not needed.

OR13 commented 4 years ago

Some takeaways from your response:

  1. What does "also all functions that need to be aware of more than one storage provider." mean? Does having more than one storage provider even make sense for a single server?... or is it better to have multiple servers each bound to a single storage provider?

  2. Should we just rely on encryption and let anyone request any cipher text (I would suggest the answer is no)...

encryption and authorization need to be separate layers, if we are taking a security in depth approach...

We could decide to just use Github/IPFS and encrypted content to handle authorization, and say: if you can decrypt, you are authorized...

That's fundamentally weaker than restricting access to cipher text by leveraging digital signatures AND relying on encryption to protect content.

If we don't implement an authorization layer, we are suggesting that encryption is sufficient, when we know that it has a shelf life, and that if you don't need the plain text, you should not have access to the cipher text....

Now if we want to offer a proxy server that leaks ciphertext without authorization for a specific use case, or that decrypts cipher text and provides plaintext without authorization for a specific use case, I am in favor of handling that.... in another service.... but such a service is IMO not the service we are discussing in this working group.
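The "restrict access to cipher text AND encrypt" position can be sketched as a server that refuses to hand out ciphertext unless the request is authenticated. This is only an illustration under loud assumptions: Python's standard library has no asymmetric signatures, so an HMAC tag over the blob id stands in for the digital signature mentioned above, and `sign_request`, `CiphertextServer`, and its methods are hypothetical names. There is no replay protection here.

```python
import hashlib
import hmac
from typing import Dict


def sign_request(key: bytes, blob_id: str) -> bytes:
    """Stand-in for a digital signature: HMAC-SHA256 over the blob id."""
    return hmac.new(key, blob_id.encode("utf-8"), hashlib.sha256).digest()


class CiphertextServer:
    """Hypothetical server: ciphertext is released only to known clients."""

    def __init__(self) -> None:
        self._client_keys: Dict[str, bytes] = {}
        self._blobs: Dict[str, bytes] = {}

    def register_client(self, client_id: str, key: bytes) -> None:
        self._client_keys[client_id] = key

    def store(self, blob_id: str, ciphertext: bytes) -> None:
        self._blobs[blob_id] = ciphertext

    def fetch(self, client_id: str, blob_id: str, tag: bytes) -> bytes:
        key = self._client_keys.get(client_id)
        # Authorization layer: reject before any ciphertext leaves the server.
        if key is None or not hmac.compare_digest(
                tag, sign_request(key, blob_id)):
            raise PermissionError("request not authorized")
        # Encryption layer: what is returned is still only ciphertext.
        return self._blobs[blob_id]
```

Even a caller who could eventually break the encryption never obtains the ciphertext without passing the first check, which is the layered argument made in the comment above.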

venu2062 commented 4 years ago

> What does "also all functions that need to be aware of more than one storage provider." mean? Does having more than one storage provider even make sense for a single server?... or is it better to have multiple servers each bound to a single storage provider?

This should include replication, compression and anything else common across providers. I would suggest that Layer1 be specific to each provider but layer 2 or higher be able to handle multiple storage providers. This also goes to specialization vs common/consistent functionality. If replication is to be a specialized function it should be left to the application. If support of multiple storage providers is to be part of SDS, it should be part of the one of the layers in the logical data space.

venu2062 commented 4 years ago

> Should we just rely on encryption and let anyone request any cipher text (I would suggest the answer is no)...

Redundancy in security is not going to hurt, I suppose.

But if such an authorization is to be supported, bypassing it by directly accessing the lower level layer should be prevented. Otherwise, it creates a false sense of security - can be damaging in case a higher level layer does not reinforce it.

This can be done by having the layer supporting authorization authenticated at the lower level or the data encrypted by the authorizing layer so that a parallel service cannot bypass authorization.
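The first option (the lower layer authenticates the layer that performs authorization) might look like the sketch below. All names are hypothetical, and a shared secret token is assumed as the authentication mechanism purely for illustration. A parallel service that was never registered cannot read through the storage interface directly.

```python
import hmac
import secrets
from typing import Dict, Set


class StorageLayer:
    """Hypothetical lower layer: serves only authenticated upper layers."""

    def __init__(self) -> None:
        self._layer_tokens: Dict[str, bytes] = {}
        self._blobs: Dict[str, bytes] = {}

    def register_layer(self, layer_id: str) -> bytes:
        token = secrets.token_bytes(32)
        self._layer_tokens[layer_id] = token
        return token

    def _authenticate(self, layer_id: str, token: bytes) -> None:
        expected = self._layer_tokens.get(layer_id)
        if expected is None or not hmac.compare_digest(expected, token):
            raise PermissionError("unknown or unauthenticated layer")

    def put(self, layer_id: str, token: bytes,
            blob_id: str, data: bytes) -> None:
        self._authenticate(layer_id, token)
        self._blobs[blob_id] = data

    def get(self, layer_id: str, token: bytes, blob_id: str) -> bytes:
        self._authenticate(layer_id, token)
        return self._blobs[blob_id]


class LogicalLayer:
    """Hypothetical authorizing layer holding the storage credential."""

    def __init__(self, storage: StorageLayer, layer_id: str) -> None:
        self._storage = storage
        self._layer_id = layer_id
        self._token = storage.register_layer(layer_id)
        self._acl: Dict[str, Set[str]] = {}

    def write(self, user: str, object_id: str, data: bytes) -> None:
        # The first writer becomes the initial authorized user.
        if user not in self._acl.setdefault(object_id, {user}):
            raise PermissionError(f"{user} may not write {object_id}")
        self._storage.put(self._layer_id, self._token, object_id, data)

    def share(self, owner: str, object_id: str, grantee: str) -> None:
        if owner not in self._acl.get(object_id, set()):
            raise PermissionError(f"{owner} may not share {object_id}")
        self._acl[object_id].add(grantee)

    def read(self, user: str, object_id: str) -> bytes:
        if user not in self._acl.get(object_id, set()):
            raise PermissionError(f"{user} may not read {object_id}")
        return self._storage.get(self._layer_id, self._token, object_id)
```

The second option, encrypting the data in the authorizing layer, is not shown; it would protect the content even if the storage interface were reachable by other services.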

OR13 commented 4 years ago

Agreed. A related question: how is authorization reflected in a data model, and is that data model also stored in storage?

I think we are assuming 2 scenarios here...

  1. Honest server operator (enforced authorization, does not tamper with data model)
  2. Dishonest server operator (does not enforce authorization, attempts to tamper with data model)

Case 1 is accounted for with proper software development, but critically, you must TRUST the software vendor, like you trust Dropbox / S3 / Google Drive today. If you don't trust the server operator, then you can't rely on them for authorization enforcement, and since they already have no control over encryption, you should pretty much switch vendors: authorization enforcement is literally their only job other than dumb storage of encrypted data, which is also a solved problem.

Case 2 is out of our control, I can implement a compatible IPFS HTTP API and hand back data that does not actually match content requested... I can even use IPFS under the API layer, and just lie at the interface boundary... Tampering is detectable on the client (HMAC), however there is no way for the client to know if the server operator is handing out their ciphertext to the attacker / lawful intercept requestor...
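Client-side tamper detection of the kind mentioned above can be sketched with HMAC-SHA256 from the Python standard library. The helper names `seal` and `open_sealed` are hypothetical, and the sketch assumes the MAC key never leaves the client and that blob ids contain no NUL byte. The tag covers the blob id as well as the ciphertext, so a server that returns a different (but validly sealed) blob is also caught.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def _tag(mac_key: bytes, blob_id: str, ciphertext: bytes) -> bytes:
    mac = hmac.new(mac_key, digestmod=hashlib.sha256)
    mac.update(blob_id.encode("utf-8"))
    mac.update(b"\x00")  # separator; assumes ids contain no NUL byte
    mac.update(ciphertext)
    return mac.digest()


def seal(mac_key: bytes, blob_id: str, ciphertext: bytes) -> bytes:
    """Prefix the ciphertext with a tag before uploading it."""
    return _tag(mac_key, blob_id, ciphertext) + ciphertext


def open_sealed(mac_key: bytes, blob_id: str, sealed: bytes) -> bytes:
    """Verify the tag on whatever the server returned for blob_id."""
    tag, ciphertext = sealed[:TAG_LEN], sealed[TAG_LEN:]
    if not hmac.compare_digest(tag, _tag(mac_key, blob_id, ciphertext)):
        raise ValueError("blob was tampered with or does not match its id")
    return ciphertext
```

This only detects modification or substitution. As the comment notes, it tells the client nothing about whether the operator has also handed the ciphertext to someone else.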

venu2062 commented 4 years ago

I am not making any assumptions about the honesty of a service provider or lack thereof. Whether to trust a provider or not is the responsibility of the application/client. But if they trust a provider and the provider is honest, we should ensure that the trust contract cannot be bypassed by a dishonest actor.

The point is that a higher level layer can be built to access the lower level interface and thus may bypass authorization.

Let's say SDS has two layers: storage layer and logical layer. Storage layer has vault level authorization and logical layer provides authorization for the application level data.

Let's assume that storage layer is deployed at one or more cloud providers: AWS, Azure, etc. This provides storage of data and vault authorization by user.

Let's assume that MS hosts logical layer where users can control access to the application objects. This in turn can store the data on AWS or Azure or both as required by a vault owner(user).

Let's say that I have a vault created via the MS logical layer in AWS physical layer. I wanted to give you access to one of the credentials I stored in the vault. So, I gave you access to the credential in logical layer which also gives you access to my vault in the storage layer.

You can potentially write a logical layer that can access the storage layer at AWS via its interface and read my entire vault. This assumes that client does not reinforce security using encryption because it/he/she depended on the server to enforce authorization.

OR13 commented 4 years ago

What action needs to be taken to move this forward?

OR13 commented 4 years ago

Maybe a Pull Request to add language about layers.

OR13 commented 3 years ago

pending close, no PR, maybe no interest in clarifying in spec.

venu2062 commented 3 years ago

When I originally proposed it, the layering was to be a starting point for arriving at an architecture considering sharding, encryption, authorization with positive control in the hands of the user.

Now that the architecture is finalized, I am not sure what purpose putting this into the spec serves.

I am also not working in SSI or Blockchain any longer. With Covid related budget changes, I had to move to an area with more market traction.

Please close the issue.