decentralized-identity / confidential-storage

Confidential Storage Specification and Implementation
https://identity.foundation/confidential-storage/
Apache License 2.0

Clarify: SDS sharing, is that read only - or read/write #49

Closed ewelton closed 3 years ago

ewelton commented 4 years ago

Just a question for clarification, in the context of the ecosystem - can other Agent/Wallets "write" to my SDS - or must they go "through" an Agent/Wallet?

Clearly "replication" of "my data" needs to write into my SDS - and I should be allowed to write into my SDS - but does it make sense for others to be able to write directly to the SDS rather than via some Agent mediation?

Note - Agent mediated write means "Here, I am a Lab and I am giving you your covid test results" and you put them in your wallet. Versus, "direct" access, which is when the Lab reaches into your back pocket and puts the results in your wallet without going through the whole ceremony of "handing you the results" first.

agropper commented 4 years ago

If my prospective employer uses an SDS and needs a letter of recommendation from my school, my Agent should decide to allow the direct write from school to employer's SDS.

Separately, my wallet may have nothing to do with this transaction. My wallet helps me control my Agent by providing a secure element with local biometrics. My agent might have secure element functionality via something like Intel SGX but still can't do biometrics in a privacy-preserving way.

ewelton commented 4 years ago

What I think needs to be clarified here is in section 1.2. It is not very easy to talk sensibly about the architecture since most of the architecture is hidden.

There are three elements in play here:

It is not possible for the EDV-server to encrypt data - only the EDV-client can do that. So if the school writes to your EDV-server it must do so in an unencrypted format - so why not just use drop-box or something like that?

If the data is to be encrypted as part of an EDV, then it must go through the EDV-client and Agent/Wallet complex. The spec does not make any obvious clarification about what the terms Agent/Wallet mean beyond a reference to user-agent or storage-agent. But the sense of an Agent (that has fiduciary responsibility, can engage in communications, and can - when cloud resident - respond to requests such as "please receive this letter from the school")

Bypassing the user's ability to encrypt data in the EDV means that I do not think that this SDS stack is the appropriate vehicle - simple email or public file sharing is fine, since the data can not be encrypted by me - and everything in my EDV should be encrypted by me - if it is not encrypted by me, but is encrypted by the school, it should reside on the school's EDV and I should have permission to read it.

I would suggest that this is normative - The act of "actor A giving actor B a digital artifact" MUST never bypass the EDV-client and/or Agent and MUST never go directly to the EDV-server

I could be talked down from that strong position, but I think the community would benefit from a clear articulation of the conditions under which EDV-client/Agent/Wallet component should be bypassed.

dlongley commented 4 years ago

@ewelton,

If the data is to be encrypted as part of an EDV, then it must go through the EDV-client and Agent/Wallet complex. The spec does not make any obvious clarification about what the terms Agent/Wallet mean beyond a reference to user-agent or storage-agent. But the sense of an Agent (that has fiduciary responsibility, can engage in communications, and can - when cloud resident - respond to requests such as "please receive this letter from the school")

I think "Agent/Wallet" is merely an example in that ecosystem diagram. All that is really needed to write to an EDV is an EDV-client. Again, I agree that we need more architecture diagrams, including those that make clear which pieces of software play which roles vs. only having high level ecosystem diagrams.

Bypassing the user's ability to encrypt data in the EDV means that I do not think that this SDS stack is the appropriate vehicle - simple email or public file sharing is fine, since the data can not be encrypted by me - and everything in my EDV should be encrypted by me - if it is not encrypted by me, but is encrypted by the school, it should reside on the school's EDV and I should have permission to read it.

It was not clear to me when reading this whether people are aware that encrypting would use the public portion of the recipient's key, not the private one. So, if you give the school the authority to write to your EDV, then the school could use an EDV-client to encrypt to your key and store the data in it -- and you'd be able to decrypt it later.

I should also note that there is a tendency for people to talk about "their EDV" in the singular; perhaps we should be saying "an EDV" or "one of my EDVs" to make it clear that there will likely be many different collections of data that have different trust boundaries. You may want to give party X the ability to read/write any document in EDV A but you wouldn't want to do that with EDV B. So considering trust boundaries when determining what should be stored where is important.

ewelton commented 4 years ago

@ewelton,

If the data is to be encrypted as part of an EDV, then it must go through the EDV-client and Agent/Wallet complex. The spec does not make any obvious clarification about what the terms Agent/Wallet mean beyond a reference to user-agent or storage-agent. But the sense of an Agent (that has fiduciary responsibility, can engage in communications, and can - when cloud resident - respond to requests such as "please receive this letter from the school")

I think "Agent/Wallet" is merely an example in that ecosystem diagram. All that is really needed to write to an EDV is an EDV-client. Again, I agree that we need more architecture diagrams, including those that make clear which pieces of software play which roles vs. only having high level ecosystem diagrams.

Yes - very specifically either the EDV-Client is in Storage, and a "device boundary" is introduced that grows to cut the middle of storage. Also - we could show "Agent->Storage" for the left hand Agent (which is talking to the other user's EDV client directly and bypassing their Agent - which is the part that is specialized for receiving messages.... and is why that just does not make sense to me) - please feel free to weigh in on #47 ;)

Bypassing the user's ability to encrypt data in the EDV means that I do not think that this SDS stack is the appropriate vehicle - simple email or public file sharing is fine, since the data can not be encrypted by me - and everything in my EDV should be encrypted by me - if it is not encrypted by me, but is encrypted by the school, it should reside on the school's EDV and I should have permission to read it.

It was not clear to me when reading this whether people are aware that encrypting would use the public portion of the recipient's key, not the private one. So, if you give the school the authority to write to your EDV, then the school could use an EDV-client to encrypt to your key and store the data in it -- and you'd be able to decrypt it later.

Yes - that is true. But I do not think I like that. Would the encryption be done the same way? Would I need to re-encrypt it using a different strategy if I were then to share it onwards?

I think I was not thinking in terms of "any old encryption" - e.g. encrypt it to me, and then store it so that it is encrypted - but I was thinking of the full stack of processing, indexing it, appropriate metadata, and preparing it for sharing and other mgmt. You could also encrypt it for me using the public key and stick it in drop-box or anywhere - the advantage, to me, of using an SDS is that I am using the other features of the SDS and that the data inside of it is fully controlled by me.

It is not clear to me that having different "classes" of data in my EDV buys me a lot - especially when I can access it on the provider's SDS. E.g. the point of "pushing it onto my SDS" is because I fully control it - and that would require going through the client.

Or - do I misunderstand - is there a way for the CEK/KEK/KAK dance to be done by other parties, such that I'm the only one who can decrypt the data?

I should also note that there is a tendency for people to talk about "their EDV" in the singular; perhaps we should be saying "an EDV" or "one of my EDVs" to make it clear that there will likely be many different collections of data that have different trust boundaries. You may want to give party X the ability to read/write any document in EDV A but you wouldn't want to do that with EDV B. So considering trust boundaries when determining what should be stored where is important.

I agree - I'm working on some candidate replacement diagrams that might highlight the idea of "a fit-for-purpose EDV" - but also they are "owned" by the controller right? I was a little confused about invoker and delegator in the description - or are we thinking about multi-owned environments with multiple keys in play?

Really am curious how the CEK/KAK/KEK dance is done in this case - i think the community as a whole might really benefit from walking through that.

dlongley commented 4 years ago

@ewelton,

Would the encryption be done the same way?

The encryption is always "done the same way", yes.

Would I need to re-encrypt it using a different strategy if I were then to share it onwards?

That depends on how the sharing is done: To enable others to decrypt you can either grant access to use one of the keys that can already decrypt (does not require re-encryption) or you can add new keys (requires re-encryption).

Or - do I misunderstand - is there a way for the CEK/KEK/KAK dance to be done by other parties, such that I'm the only one who can decrypt the data?

Yes, that is possible.

I agree - I'm working on some candidate replacement diagrams that might highlight the idea of "a fit-for-purpose EDV" - but also they are "owned" by the controller right? I was a little confused about invoker and delegator in the description - or are we thinking about multi-owned environments with multiple keys in play?

It's probably best not to think of EDVs as "owned", but rather, anyone with the authority to use them can do so. The controller is the root of authority for an EDV. The EDV configuration is used to generate the root capabilities. All authority must flow from there. So, for example, if you're using zcaps to implement authorization, this means that the first delegated capability in any capability chain must be delegated by a key controlled by the controller. If delegator is also specified, this also allows for delegation by a key controlled by a delegator.
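The rule about where a capability chain must start can be illustrated with a minimal Python sketch. It is not the zcap data model: `EdvConfig`, `Delegation`, and `chain_is_rooted` are hypothetical names, parties are plain strings, no signatures are verified, and the check that each later link is issued by the holder of the previous one is an assumption added for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EdvConfig:
    controller: str                  # root of authority for the EDV
    delegator: Optional[str] = None  # optional extra party allowed to start chains


@dataclass(frozen=True)
class Delegation:
    delegated_by: str  # who issued this capability
    delegated_to: str  # who may invoke it (and, in this sketch, delegate onward)


def chain_is_rooted(config: EdvConfig, chain: List[Delegation]) -> bool:
    """Return True if the chain's authority flows from the EDV configuration."""
    if not chain:
        return False
    roots = {config.controller}
    if config.delegator is not None:
        roots.add(config.delegator)
    # The first delegated capability must come from the controller (or delegator).
    if chain[0].delegated_by not in roots:
        return False
    # Assumption: each later link is issued by the holder of the previous link.
    return all(cur.delegated_by == prev.delegated_to
               for prev, cur in zip(chain, chain[1:]))
```

With `EdvConfig(controller="did:example:alice")`, a chain that starts with a delegation from `did:example:alice` passes, while one that starts from any other party is rejected.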

Really am curious how the CEK/KAK/KEK dance is done in this case - i think the community as a whole might really benefit from walking through that.

This dance is independent of the EDV configuration entirely. The EDV-server enforces authority to read/write encrypted data. The encryption piece happens in the client. The only part the EDV-server could potentially play would be in enforcing rules around which recipients may appear in the encrypted documents (it could refuse to store encrypted documents if the recipients do not match what is authorized in a capability or through some other authz mechanism).
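The optional server-side check on recipients might look roughly like the following Python sketch. The document shape (a `recipients` list whose entries carry a `kid`) and the function name are assumptions made for illustration, not the spec's wire format; rejecting a document with no recipients at all is also just one possible policy.

```python
from typing import Any, Dict, Iterable


def recipients_authorized(encrypted_doc: Dict[str, Any],
                          allowed_key_ids: Iterable[str]) -> bool:
    """True only if every recipient entry names an allowed key agreement key."""
    allowed = set(allowed_key_ids)
    recipients = encrypted_doc.get("recipients", [])
    if not recipients:
        return False  # policy choice in this sketch: refuse recipient-less documents
    return all(r.get("kid") in allowed for r in recipients)
```

The server never needs to decrypt anything for this check; it only inspects the key identifiers attached to the encrypted document.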

ewelton commented 4 years ago

@dlongley

I'm still not clear about something, so let's clear it up

so i'm excited by this idea - just figure there's something I'm missing in the crypto (and there is no clear pointer in the spec to quickly clear it up) - but then I read

Really am curious how the CEK/KAK/KEK dance is done in this case - i think the community as a whole might really benefit from walking through that.

This dance is independent of the EDV configuration entirely. The EDV-server enforces authority to read/write encrypted data. The encryption piece happens in the client. The only part the EDV-server could potentially play would be in enforcing rules around which recipients may appear in the encrypted documents.

Because if the encryption happens in the client, then what is the magic in the author's client, since none of their cryptographic material is required and since, by definition, in this case we are not accessing my client?

Here is what I understand - for User 1 writing a document into User 2's server:

  U1C - User 1 Client
  U1E - User 1 Server
  U2C - User 2 Client
  U2E - User 2 Server

Now - by definition, we are talking about U1C writing to U2E, such that only U2C can read the data. I just want to make sure that I could also use U1X - which is just something cobbled together using Postman and some encryption libraries, and that there is no cryptographic material from U1 involved at all.

This is one of the things that's confusing w/ 1.2 - sometimes it is important because the "client" can use "cryptographic secrets" but other times the "client" only means "not the server". It's like "sometimes the client must be Firefox" and other times it could be anything - i just want to make sure that we are explicit in calling out when "the client" means one vs. means the other.

Thanks for taking the time to walk through this - once we get 1.2 and 1.3 cleaned up, I think an appendix walking through the CEK/KAK/KEK dance is worthwhile. I can model it after the tango steps that someone tiled into the sidewalk outside of my airbnb when I was in Buenos Aires ;)

dlongley commented 4 years ago

@ewelton,

someone else can encrypt data such that I'm the only one that can decrypt it - I'm familiar with that in simple asymmetric encryption, but just to make it clear - where is that CEK generated if it is not generated by the source or by my client

The CEK is randomly generated by the party that performs the encryption. Of course, this means they can technically decrypt -- but of course they also have the cleartext already. What I was highlighting was they can encrypt the data such that the only other party that can decrypt it is the party that controls the recipient KAK (as well as any party that the KAK controller delegates access to).

Steps to encrypt:

  1. Generate a random CEK.
  2. Encrypt the data using the random CEK.
  3. Generate a random ephemeral key agreement key (KAK).
  4. Derive a secret using the ephemeral KAK's private material and the public material from the recipient's KAK.
  5. Derive a KEK from the secret and other local parameters.
  6. Wrap (encrypt) the CEK using the KEK.
  7. Attach the wrapped CEK, ephemeral KAK, and recipient KAK identifier to the encrypted data as a "recipient".

Then... you send the encrypted data to an EDV.
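The seven steps above can be sketched in Python using only the standard library. Several things here are assumptions rather than anything stated in the thread: the key agreement is X25519 (implemented with the RFC 7748 ladder, variable time, so for illustration only), the KEK is derived with a single SHA-256 call, and `seal`/`unseal` are toy stand-ins for a real authenticated cipher and key-wrap algorithm. The dictionary layout and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets

P = 2**255 - 19                    # Curve25519 field prime
A24 = 121665                       # curve constant from RFC 7748
BASE = (9).to_bytes(32, "little")  # X25519 base point


def x25519(scalar: bytes, u: bytes) -> bytes:
    """RFC 7748 Montgomery ladder. Variable time: illustration only."""
    k = bytearray(scalar)
    k[0] &= 248
    k[31] = (k[31] & 127) | 64
    k_int = int.from_bytes(k, "little")
    x1 = int.from_bytes(u, "little") & ((1 << 255) - 1)
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (k_int >> t) & 1
        if swap ^ bit:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        da, cb = (x3 - z3) * a % P, (x3 + z3) * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, z2 = x3, z3
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + i.to_bytes(8, "big")).digest()
        out += bytes(x ^ y for x, y in zip(data[i:i + 32], pad))
    return bytes(out)


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Toy authenticated encryption: nonce || ciphertext || HMAC tag."""
    nonce = secrets.token_bytes(16)
    body = nonce + _xor_stream(key, nonce, plaintext)
    return body + hmac.new(key, body, hashlib.sha256).digest()


def unseal(key: bytes, blob: bytes) -> bytes:
    body, tag = blob[:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(key, body, hashlib.sha256).digest()):
        raise ValueError("authentication failed")
    return _xor_stream(key, body[:16], body[16:])


def derive_kek(secret: bytes, epk: bytes, kid: str) -> bytes:
    # Step 5: mix in "other local parameters" (here: ephemeral key and key id).
    return hashlib.sha256(b"kek-demo" + secret + epk + kid.encode()).digest()


def encrypt_for(kid: str, recipient_public: bytes, data: bytes) -> dict:
    cek = secrets.token_bytes(32)                   # 1. random CEK
    ciphertext = seal(cek, data)                    # 2. encrypt data with the CEK
    eph_private = secrets.token_bytes(32)           # 3. random ephemeral KAK
    epk = x25519(eph_private, BASE)
    secret = x25519(eph_private, recipient_public)  # 4. derive shared secret
    kek = derive_kek(secret, epk, kid)              # 5. derive KEK
    wrapped = seal(kek, cek)                        # 6. wrap the CEK
    return {"ciphertext": ciphertext,               # 7. attach a "recipient"
            "recipients": [{"kid": kid, "epk": epk, "wrapped_cek": wrapped}]}


def decrypt_with(recipient_private: bytes, doc: dict) -> bytes:
    r = doc["recipients"][0]
    secret = x25519(recipient_private, r["epk"])
    cek = unseal(derive_kek(secret, r["epk"], r["kid"]), r["wrapped_cek"])
    return unseal(cek, doc["ciphertext"])
```

Note that `encrypt_for` takes only the recipient's public key and key identifier: the encrypting party contributes a fresh ephemeral key and nothing long-lived of its own. A real client would use vetted implementations of the key agreement, KDF, key wrap, and content cipher instead of these stand-ins.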

Now - by definition, we are talking about U1C writing to U2E, such that only U2C can read the data. I just want to make sure that I could also use U1X - which is just something cobbled together using Postman and some encryption libraries, and that there is no cryptographic material from U1 involved at all.

Yes, if I understand you correctly, you can also use "U1X". Hopefully that's clear from the steps outlined above.

ewelton commented 4 years ago

Excellent - I think that simple outline of steps would be great in the spec.

Is it correct to say this:

  1. the Sender picks a key and encrypts the asset using the key
  2. the Sender "safely" hands the key to the Recipient and puts it directly in some EDV
  3. the Recipient and Sender now "share" the same CEK

I think, as you point out, that, in the case of theoretical exposure to information the Sender has the information already, so the fact that they both could decrypt the information in the Recipient's selected EDV is not an immediate privacy threat.

However, I think that it is more complicated, less powerful, and requires development and support for features (including UX and EDV-server components) for a negative payoff relative to alternatives. Consider the differences between the above and

  1. the Sender encrypts a document using the Recipient's public key over some existing message channel (like DIDComm)
  2. the Recipient's Agent receives the Document and stores it in one of their EDVs - performing all indexing, metadata, receipt acknowledgement, etc.

I think, once we get section 1.2 and the use case document skeleton fixed in the spec, we can begin to run through these cases in detail - over various deployment topologies and under various consent management and liability tracing.

In so doing we will see that we have to work quite a bit harder in order to allow this specific pathway, for an overall negative payoff. It is only sensible if you ignore the context and deal with the issue abstractly.

For example, if I am communicating with the Sender, I have established a new channel for them. What is the value of me generating an authorization token, or configuring an authorization service, so that they can deposit the document directly through my storage provider's interface without notifying me when it happens? That has a UX and information management cost to me - one that is on par with "send me my letter" followed by me dragging and dropping it onto an EDV.

For unix junkies - ex is a file editor, so is sed - but many people feel that fancy "WYSIWYG" editors - so called "VIsual" editors or all sorts of fancy editors are preferable - even though the file edits they generate are identical. It's a monkeyspace issue.

For example - what if the Sender immediately posts the CEK on watch-my-cek.com? What if I want to generate an acknowledgement receipt? What if I want to "tag" the document with metadata and make sure it is indexed? What if I want to use an "auto indexer" - this is only possible if we "go through my client"

And that is where "it is done on the client" is confusing - my client and their client are different, and I might hire a service to run "my client" with "my storage" - maybe it's an "add-on" feature at the Agency that hosts one of my Agents or Cloud Wallets. In which case, I would prefer the 1-2 vs. the 1-2-3.

Summary

I also think that "write access" is not an all or nothing thing - especially if I pay-by-the-pound on my EDV-server. Giving someone write access to deposit something means that I have to authorize them to write, to write a specific thing, to write within some parameters (how much), to sculpt the access policy for time (how long is the write window open, is it a one-time thing, what if they send a revision 1 hour later, etc.)

That is a complex authorization policy - and one that I likely have already sculpted for my Agent, which is my digital twin and manages a lot of this already. I would suggest that the agent is a better venue for this sort of authorization policy sculpting, and the EDV system could be simpler, without sacrificing relevant power.
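Wherever such a policy ends up being enforced (Agent or EDV), its shape could be sketched roughly as follows in Python. `WriteGrant` and `authorize_write` are hypothetical names, and the fields simply mirror the dimensions listed above: who, which document, how much, what time window, and how many times.

```python
from dataclasses import dataclass


@dataclass
class WriteGrant:
    grantee: str               # who may write
    document_id: str           # the specific thing they may write
    max_bytes: int             # how much they may write
    not_before: float          # write window opens (seconds since epoch)
    not_after: float           # write window closes
    writes_remaining: int = 1  # 1 = one-time; raise it to allow revisions


def authorize_write(grant: WriteGrant, writer: str, document_id: str,
                    size: int, now: float) -> bool:
    """Check one write against the grant and consume a use if it is allowed."""
    allowed = (
        writer == grant.grantee
        and document_id == grant.document_id
        and size <= grant.max_bytes
        and grant.not_before <= now <= grant.not_after
        and grant.writes_remaining > 0
    )
    if allowed:
        grant.writes_remaining -= 1
    return allowed
```

A grant with `writes_remaining=1` covers the "one-time thing" case; accepting "a revision 1 hour later" would mean a larger count and a window that is still open.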

The next step in this is to figure out how to reflect the lessons above in the spec and identify follow on tasks.

agropper commented 4 years ago

Many of the SDS discussions make reference to an Agent / wallet separate from an SDS. In the Alice to Bob Use Case every actor has an agent (color in Fig. 1). Note that Fig. 1 does not show any data moves; it’s only concerned with control.

When the data held by the Store actor (Fig. 1) is encrypted, that content encryption key (CEK, #48) may or may not be accessible to the agent associated with the Store actor.

Use-cases where the CEK is not accessible to the Store actor require that actor to evaluate a request presented by some client, represented by an agent (of Alice, of Bob, or Bob’s Employer). Will the same standard apply to requests presented by all clients seeking to read or write into the Store server? This question captures the relationship between Issue #36 and #49.

dlongley commented 4 years ago

@ewelton,

  1. the Sender picks a key and encrypts the asset using the key
  2. the Sender "safely" hands the key to the Recipient and puts it directly in some EDV
  3. the Recipient and Sender now "share" the same CEK

The sender doesn't have to hand the key to the recipient directly at all, rather, they just put the encrypted document (which includes the wrapped/encrypted CEK) into an EDV. All the recipient needs to decrypt is access to the encrypted document (and the ability to use their KAK). I'm trying to highlight that there is no additional out-of-band sharing of keys/anything else necessary for the use case we're discussing.

  1. The sender picks a key and encrypts the data, representing it as an encrypted document.
  2. The sender puts the encrypted document in the recipient's EDV.
  3. The recipient can now retrieve the encrypted document and decrypt.
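The three-step flow above, seen from the storage side, can be sketched with a toy in-memory vault in Python. `InMemoryVault` and its methods are hypothetical; parties are plain strings standing in for whatever authorization mechanism is really used, and the stored bytes are treated as an opaque, already-encrypted document.

```python
from typing import Dict, Set


class InMemoryVault:
    """Toy EDV-server: stores opaque encrypted documents, checks authority only."""

    def __init__(self) -> None:
        self._docs: Dict[str, bytes] = {}
        self._writers: Set[str] = set()
        self._readers: Set[str] = set()

    def grant(self, party: str, *, read: bool = False, write: bool = False) -> None:
        if read:
            self._readers.add(party)
        if write:
            self._writers.add(party)

    def put(self, party: str, doc_id: str, encrypted_doc: bytes) -> None:
        if party not in self._writers:
            raise PermissionError(f"{party} may not write")
        self._docs[doc_id] = encrypted_doc  # the server never sees cleartext

    def get(self, party: str, doc_id: str) -> bytes:
        if party not in self._readers:
            raise PermissionError(f"{party} may not read")
        return self._docs[doc_id]
```

Granting the sender `write=True` only, and the recipient `read=True`, gives a write-only drop: the sender can deposit the encrypted document but cannot read anything back from the vault.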

However, I think that it is more complicated, less powerful, and requires development and support for features (including UX and EDV-server components) for a negative payoff relative to alternatives. Consider the differences between the above and

  1. the Sender encrypts a document using the Recipient's public key over some existing message channel (like DIDComm)
  2. the Recipient's Agent receives the Document and stores it in one of their EDVs - performing all indexing, metadata, receipt acknowledgement, etc.

These seem to just be different use cases to me. Note that both require "development and support for features". The former requires one software stack: EDVs. The latter requires two: DIDComm and EDVs. So in the latter case, one trade off is that the sender doesn't need EDV software but both parties need DIDComm.

There are other different use cases where you would want the sender to use their own EDV to store data and to grant zcaps to access it; this could be particularly helpful when you want to protect yourself against spam/deal with storage constraints.

I think what choices will be made depend on the circumstances/use cases here -- there are reasonable trade offs to consider.

agropper commented 4 years ago

I'm trying to highlight that there is no additional out-of-band sharing of keys/anything else necessary for the use case we're discussing.

  1. The sender picks a key and encrypts the data, representing it as an encrypted document.
  2. The sender puts the encrypted document in the recipient's EDV.
  3. The recipient can now retrieve the encrypted document and decrypt.

Does not mention the data subject or her agent. Is control or even transparency to the data subject entirely out-of-band?

dlongley commented 4 years ago

@agropper,

Does not mention the data subject or her agent. Is control or even transparency to the data subject entirely out-of-band?

Yes, those things, if they exist (they would for some use cases and not others), are at a different layer than the low-level EDV storage layer.

ewelton commented 4 years ago

Does not mention the data subject or her agent. Is control or even transparency to the data subject entirely out-of-band?

Yes, those things, if they exist (they would for some use cases and not others), are at a different layer than the low-level EDV storage layer.

I think one of the things I'm struggling with is that I am aware that they are at different levels, but it is exceptionally difficult not to get derailed in any discussion of how to relate the levels. This is due, in my opinion, to the utter lack of acceptable sections 1.2 and 1.3 (Issue #46, for starters, which we are still struggling to get recognized - way, way, way before issue #35)

I did not envision any "out of band" communication - i was imagining that the shared writing was only in the EDV. I guess I naively thought that the encrypted key was called out, as in writing (document,encrypted key) to the EDV and not just (document). I don't think they are different use cases as much as they highlight that there is one pathway that makes better contextual sense almost across the board - enough so that focusing on the Agent-bypass pathway is more distracting than anything. I think Adrian made a good point here:

Will the same standard apply to requests presented by all clients seeking to read or write into the Store server? This question captures the relationship between Issue #36 and #49.

and that relates to

I think what choices will be made depend on the circumstances/use cases here -- there are reasonable trade offs to consider.

I absolutely agree - the question is how to break out of "tech fixation" and include the context - for example: (summarizing my github issues)

we'll see. I really do appreciate the conversation and your support of this discussion and these questions.

OR13 commented 4 years ago

I don't think sharing implies there is a requesting party...

I can grant you the ability to access something, without notifying you or having you make a request.

msporny commented 4 years ago

We might want to break this down in "Authorization to Read" and "Authorization to Write".

agropper commented 4 years ago

@OR13 I think we're talking past each other. From the store's point of view, there's always a requesting party otherwise it might as well be off-line.

I'm struggling to see what "grant you" (where you is the requesting party in my world) has to do with it. You may be implying that the store is getting an update to its access control system. That gets into a layering question as to whether a requesting party is accessing the store's access control component or the storage component. Is it "turtles all the way down"?

OR13 commented 4 years ago

I think of EDVs as password managers, and not really authorization servers that sign claims...

It seems like the OAuth concept of an RP is more accurate for describing things like "requesting a VP", which may come from an EDV, but which I would think more likely would come from an agent of some kind....

In my mind, an EDV might be used by an OP to store claims about users... it's at a lower layer than RP / OIDC.

OR13 commented 4 years ago

Related to authorization discussion. Need to return after covering layers.

OR13 commented 4 years ago

You can do both? Pending close, as it's been discussed.