decentralized-identity / confidential-storage

Confidential Storage Specification and Implementation
https://identity.foundation/confidential-storage/
Apache License 2.0

Will EDV and ADV protocols co-exist at the same layer? #131

Open agropper opened 3 years ago

agropper commented 3 years ago

ADV is for Authorized Data Vaults - where the storage provider has access to the data so they can re-encrypt it?

For example, in the ML use-case, the owner of a secure (Intel SGX) enclave in the cloud has the ability to re-encrypt data within the enclave. I would call that an example of an ADV. On the other hand, a cloud service implementing secure multi-party computation would be an example of an EDV.

Another example is storage as a directory. A directory that can be searched by authorized users by plaintext would be an ADV. A directory that holds only hashed metadata would be an EDV.

OR13 commented 3 years ago

At the same layer? probably not, for most definitions of "layer"...

Granting authorization to plaintext and a serialization format / APIs for replicatable ciphertexts are two entirely different things.

That being said, you can grant authorized access to systems that use encryption as the authorization mechanism, and you can grant authorization to systems that use encryption as well as "complex fine-grained permission schemes".... One is more complex than the other, but they are not incompatible.

agropper commented 3 years ago

@OR13 I truly don't understand your comment. You might either explain your idea for "most definitions of layer" and then list which layer has ADV and which has EDV. Or you might explain how 'they are not incompatible' relates to any layer scheme.

OR13 commented 3 years ago

Layers:

  • Storage
  • Encryption
  • Authorization

This is how pretty much every system including Dropbox / Drive / filesystems works.

agropper commented 3 years ago

Layers:

  • Storage
  • Encryption
  • Authorization

OK - now we can probably agree that there is a Server (the storage) and various clients of some sort that might access the server at one layer or another.

Can you propose a naming for one or more clients and relate those names to where the encryption happens?

OR13 commented 3 years ago

@agropper sure

Case 1 Trusted Service Provider

The "Provider Client" runs on my computer, it is authorized to upload content to my "Provider Cloud Account", encryption happens only in the cloud and "Provider" does not expose an API for working with anything but plaintext.

The Provider is trusted, and when breached 100% of my data is stolen, because although they use encryption at rest, they control all the keys to access it.

The Provider only exposes an "authorized" API, no lower-level APIs. Once authorization is broken, data is stolen.

Case 2 Semi-Trusted Service Provider

The "Provider Client" runs on my computer, it is authorized to upload content to my "Provider Cloud Account", encryption happens ONLY on my computer and the "Provider" only exposes an API for working with ciphertext.

The Provider is semi-trusted, and when breached 100% of my encrypted data is stolen, but because the decryption keys are with me on my PC, the attacker can't do anything with it.

The Provider exposes an "authorized cloud storage" API, and a "client side encryption api".

Defeating authorization of the storage provider does not result in client plaintext compromise.
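The Case 2 split can be sketched in code. This is a toy illustration, not an implementation: `CloudStore` and `ProviderClient` are hypothetical names, and the XOR keystream "cipher" stands in for a real authenticated cipher such as AES-GCM. The point is structural: the key is created on the client and never leaves it, and the provider only ever stores opaque bytes.

```python
import hashlib
import os

def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher for illustration ONLY -- a real client would
    use an authenticated cipher such as AES-GCM."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        out.extend(block)
        counter += 1
    return bytes(b ^ k for b, k in zip(data, out))

class CloudStore:
    """Hypothetical Case 2 provider: stores opaque bytes, holds no keys."""
    def __init__(self):
        self._blobs = {}

    def put(self, blob_id: str, blob: bytes) -> None:
        self._blobs[blob_id] = blob

    def get(self, blob_id: str) -> bytes:
        return self._blobs[blob_id]

class ProviderClient:
    """Runs on the customer's PC; encryption happens ONLY here."""
    def __init__(self, store: CloudStore):
        self.store = store
        self.key = os.urandom(32)  # never leaves the client

    def upload(self, blob_id: str, plaintext: bytes) -> None:
        nonce = os.urandom(16)
        self.store.put(blob_id, nonce + keystream_xor(self.key, nonce, plaintext))

    def download(self, blob_id: str) -> bytes:
        blob = self.store.get(blob_id)
        nonce, ciphertext = blob[:16], blob[16:]
        return keystream_xor(self.key, nonce, ciphertext)
```

An attacker who breaches `CloudStore` gets only ciphertext; defeating the provider's authorization layer does not reveal plaintext, which is exactly the Case 2 property described above.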

Summary

Case 1 is kinda the status quo... a service provider is compromised, and all the customer data is stolen... The customer trusted the service provider with encryption AND authorization... big mistake.

Case 2 is EDVs / Hubs... an improvement... the service provider is compromised, but the customer data is still safe because the service provider never had remote access to the customer data... the service provider's client had local access to the data.

Case 2 can still be defeated, if the provider's developer account is compromised and a bad build of the provider client is pushed which steals the data directly from the customer PC, not from their cloud account.

However, what if the provider didn't control the client? Now the attacker needs some other way to get local access to every user's PC... that's much more expensive, and more likely to be detected.

In short, encryption and authorization are like food and water.

You need both to survive, and the attacker can kill you by poisoning either, but much faster if the attacker can poison both at the same time.

Splitting up where you get food and where you get water from helps because now the attacker needs to poison both sources independently to do the maximum damage to you... it's extra work; attackers are lazy, they will hunt Case 1 type service providers and avoid hunting Case 2 service providers... making Case 2 service providers a better choice.

agropper commented 3 years ago

@OR13 I agree, and thank you for naming the roles in a consistent manner.

Now let's do:

Case 3 Typical Service Provider

The Provider is a lab (we can do streaming devices and searchable directories later if need-be) that gets a result from a machine they own.

The Provider Client runs on my computer and can authenticate with the Provider as well as control keys that can sign access authorizations that the Provider will honor. Authentication and authorization both require registration before the lab accepts a sample for testing on their machine.

The Provider seeks to minimize their breach risk and decides to outsource the long-term storage of the result to a cloud service. Upon completion of the test, the Provider transfers the result to the confidential service and deletes it from the machine itself. The Provider has legal record-retention requirements as well as an obligation to provide secure access to their customer.

The Provider Client may or may not be prepared to handle the laboratory result itself (it's a genome) and the Provider may or may not be able to sign a result in a way that can survive third-party holders. So the Provider Client transfers an encryption key for the Provider to use when they move the data to their outsourced confidential store. The Provider deletes their plaintext copy of the result as soon as the transfer is complete. The Provider deals with their record retention requirements by sending a copy of the result to Ferrous Mountain for off-line storage. They may or may not encrypt the cold storage data.

The Provider Client needs to send the lab result to Endpoint X. The Provider Client has a key to decrypt the data and they expect the Provider to honor a signed request that includes Endpoint X. Endpoint X might be confidential storage also controlled by the Provider Client or it might be Bob.

What happens now?

OR13 commented 3 years ago

"Ferrous Mountain" :)

As far as I can tell, Case 3 is actually "Case 2" with a new entity that consumes "Case 2 Provider as a service".

Essentially:

Patient PC with Lab Client <-> Lab Provider <-> Case 2 Storage Provider (Ferrous Mountain)

In this case, "Lab Provider" just happens to be using a 3rd party storage service that implements a standard interface.

If either "Lab Provider" or "Case 2 Storage Provider" is breached, the data remains encrypted / protected.

One source of confusion with "client / server" is that some servers are clients.... for example... "Lab Provider" is a service which has a client for "Ferrous Mountain" storage.... it is that client that handles "backup" before the plaintext is deleted.

Patient PC Lab Client is connected to "Lab Provider", not to "Ferrous Mountain"... or at least that's how I am reading what you wrote... it's possible that Patient PC might have a client which talks directly to both "Lab Provider" and "Ferrous Mountain"... depends on how we want that to be exposed / what legal agreements would exist between the user and the service provider.

agropper commented 3 years ago

@OR13 But what about Bob?

OR13 commented 3 years ago

lol :)

Assume the labs are associated with Alice, Bob can receive them from either "Lab Provider" or "Ferrous Mountain" depending on the eula / legal contracts that were signed.

Alice would be the only person who could authorize either provider to share with Bob.

agropper commented 3 years ago

Alice's raw genome might be 100 GB. The Provider Client is optimized for managing endpoints and keys, not re-encrypting medical records. Bob's client, whatever it is, knows what to do with a genome or other medical record component.

If you can propose a solution for Case 3 given our naming convention and layers, we will likely be done.


OR13 commented 3 years ago

Alice delegates plaintext access to her genome to the "Lab Provider" to re-encrypt for Bob, and trusts them not to run off with the data :)

agropper commented 3 years ago

So, the lab store is encrypted with a symmetric key that Alice can rotate after the Provider re-encrypts for Bob and the Provider throws away the new key so that only Alice has the current key?


OR13 commented 3 years ago

Anyone with plaintext and trust in another party can encrypt for that party.

If Alice can't afford to manage 100 TB of data, she will need to trust someone who can, like the Lab.

The Lab can encrypt that data for anyone Alice wants... (the lab can also be compromised and the data will be stolen).

You can't have it both ways... once Alice lets the lab see her data / encrypt for other parties you are back in Case 1,

which is pretty much the status quo...

agropper commented 3 years ago

It seems we agree. So we need to acknowledge that labs, hospitals, dashboard cameras, and searchable directories (Case 3) are part of the layer definition.

Our shared goal, I assume, is to encourage encrypted storage at rest for all Providers. The question becomes whether keys are symmetric or PKI and how each role protects the Provider Client(s) they control.

In the real world, some Provider Clients are controlled by the Provider (Case 3) and some are controlled by Alice (Case 2) or by Bob. GNAP seems to have figured this out https://datatracker.ietf.org/doc/draft-ietf-gnap-core-protocol/. They try to interact with the client of the Resource Owner at the same API as the client of the Requesting Party as much as they can. I think this means that the RO and the RQ are interfacing with the Provider at the same layer.

Based on this discussion, I'm hoping someone proposes a layering that includes GNAP. Call the layers whatever you want. Have as many other layers above or below the one that supports GNAP. A volunteer?


OR13 commented 3 years ago

@agropper GNAP is not stable enough to be considered seriously for anything other than the structured scopes part of this IMO.

I would suggest we focus GNAP support at that layer.

We should define "EDV / HUB Authorization Capabilities" as "GNAP Structured Scopes", with the understanding that they will work with HTTP Signature today (EDVs and GNAP both already support this).

Anything more than this is likely to be building on a "dream for the future"... which won't be possible for us to actually implement any time soon.

Since GNAP structured scopes are JSON and only require "type", this is mostly about defining examples of ZCap JSON for EDVs and Hubs.
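To make this concrete, here is a hypothetical sketch of a ZCap expressed as a GNAP structured scope. Only the "type" field is required by GNAP's access structure; every other field name and value here is illustrative, not taken from either spec:

```python
import json

# Hypothetical ZCap expressed as a GNAP structured scope.
# GNAP access entries are JSON objects requiring only "type";
# all other fields below are illustrative placeholders.
zcap_scope = {
    "type": "edv-capability",                       # required by GNAP
    "invocationTarget": "https://edv.example/docs/doc-123",  # illustrative
    "allowedAction": ["read"],                      # illustrative attenuation
    "invoker": "did:example:alice#key-1",           # illustrative
}

# Such a scope would travel inside a GNAP-style access request:
request_body = json.dumps({"access_token": {"access": [zcap_scope]}}, indent=2)
```

The point is that the capability rides inside GNAP's JSON scope structure unchanged, so an EDV can interpret it without taking the rest of the GNAP protocol as a dependency.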

we were making progress on this here: https://github.com/decentralized-identity/secure-data-store/issues/113

agropper commented 3 years ago

I disagree. I have a decade of experience with authorization protocols in the wild (OAuth2, UMA2 and many many related profiling efforts). Ignoring GNAP protocol work at this juncture is a waste of time.

The layer framing may be the problem here. When EDVs exist, people will use them, of course. But we are not dealing with the SSI adoption issues and will miss the zero-trust architecture opportunity if we ignore the GNAP protocol work.


OR13 commented 3 years ago

@agropper I am not suggesting we ignore GNAP.

I am suggesting that we not take an unfinished spec as a core dependency... which we are prohibited from doing already by the charter.

I am suggesting that we look at support for GNAP in two ways:

  • HTTP Signature
  • GNAP Structured Scopes

I would like to see PRs that define structured scopes and HTTP signatures and that comment on GNAP.

agropper commented 3 years ago

Is ZCAP-LD an acceptable spec under the charter?

I'm mostly concerned about the layering approach. If the charter forces us to put authorization in a separate layer then we may have an interoperability problem. I also think the discussion around hubs, whatever they are, is going to be more difficult if we don't have a clear authorization protocol.


OR13 commented 3 years ago

ZCAP-LD is a data model, not a protocol, and as discussed previously it's compatible with both GNAP and HTTP Signatures.

We have to separate the structure of authorization from protocols for obtaining it, or we are tying ourselves to undefined behavior and walking blindly into a holy war we can avoid, or at least insulate ourselves from.

GNAP is ONE solution to obtaining authorizations... and it's compatible with my proposal.

If we are trying to say that GNAP is the ONLY way to obtain authorizations, I would expect there to be a lot of opposition to that.

That being said, I am not opposed to having that debate... I would suggest we move it to a very specific issue, something like:

PROPOSAL: GNAP MUST be supported by conformant implementers of EDVs and Hubs.

The proposal I am currently trying to gather support for is:

PROPOSAL: Authorization Scopes MUST be structured as ZCAPS encoded as GNAP structured scopes and conformant implementers MUST support HTTP Signature invocations of them.

^ note this does not preclude deeper integration with GNAP, it just doesn't require it.
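A rough sketch of what an HTTP Signature invocation of such a capability might look like. This is a simplified, draft-cavage-style illustration: it uses an HMAC shared key for brevity (real EDV deployments would use asymmetric keys), and the header names, key IDs, and paths are hypothetical, not taken from the spec:

```python
import base64
import hashlib
import hmac

def sign_request(key: bytes, key_id: str, method: str, path: str,
                 headers: dict) -> str:
    """Simplified draft-cavage-style HTTP Signature header value.
    Real implementations use asymmetric keys and the exact
    canonicalization rules from the HTTP Signatures drafts."""
    covered = ["(request-target)"] + [h.lower() for h in sorted(headers)]
    lines = [f"(request-target): {method.lower()} {path}"]
    lines += [f"{h.lower()}: {headers[h]}" for h in sorted(headers)]
    signing_string = "\n".join(lines).encode()
    sig = base64.b64encode(
        hmac.new(key, signing_string, hashlib.sha256).digest()).decode()
    return (f'keyId="{key_id}",algorithm="hmac-sha256",'
            f'headers="{" ".join(covered)}",signature="{sig}"')

# Invoke a capability against a hypothetical EDV document endpoint:
header = sign_request(
    key=b"shared-secret-for-illustration",   # illustrative only
    key_id="did:example:alice#key-1",        # hypothetical key identifier
    method="GET",
    path="/edvs/edv-1/docs/doc-123",         # hypothetical identifiers
    headers={"capability-invocation": 'zcap id="urn:zcap:example"'},
)
```

The capability (the ZCap / structured scope) says what is authorized; the HTTP Signature proves who is invoking it. Keeping those two pieces separate is what lets the proposal work today without depending on the rest of GNAP.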

agropper commented 3 years ago

I think https://github.com/decentralized-identity/confidential-storage/issues/36#issuecomment-731388832 is relevant to this issue.

agropper commented 3 years ago

There is a good argument to be made for a strict capabilities interface to an EDV where neither claims nor purpose play any role. After all, neither the requesting party's claims nor their intent is any of the EDV's business. To achieve this, we need to deal with attenuated delegation one way or another.

GNAP is one way to achieve attenuated delegation at an EDV. @OR13's second proposal implies there is a way to achieve attenuated delegation to an EDV by using only GNAP structured scopes and HTTP Signatures. Can you expand on that?
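One way to picture attenuated delegation, independent of which protocol delivers it: each delegation in a chain may only narrow its parent's authority, never widen it, and the verifier checks the whole chain back to the root. A minimal sketch, loosely modeled on the ZCAP-LD idea of capability chains (the structure and names are illustrative, not a conformant implementation):

```python
# Illustrative attenuated-delegation check, loosely modeled on ZCAP-LD.
# A child capability may only narrow (never widen) its parent's actions.

ROOT = {"id": "urn:zcap:root", "actions": {"read", "write"}, "parent": None}

def delegate(parent: dict, child_id: str, actions: set) -> dict:
    """Create a child capability; refuse any attempt to widen authority."""
    if not actions <= parent["actions"]:
        raise ValueError("delegation may only attenuate, never expand")
    return {"id": child_id, "actions": actions, "parent": parent}

def authorized(cap: dict, action: str) -> bool:
    """Walk the chain to the root, checking attenuation at every hop."""
    node = cap
    while node["parent"] is not None:
        if not node["actions"] <= node["parent"]["actions"]:
            return False
        node = node["parent"]
    return action in cap["actions"]

# Alice delegates read/write to the Lab; the Lab delegates read-only to Bob.
alice_to_lab = delegate(ROOT, "urn:zcap:lab", {"read", "write"})
lab_to_bob = delegate(alice_to_lab, "urn:zcap:bob", {"read"})
```

Under this picture, the EDV only evaluates the chain itself; the requesting party's claims and purpose never enter into the decision, which matches the "strict capabilities interface" described above.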