Clarifying relation of storage to agent/wallet relative to 1.4.1 and 1.4.2

decentralized-identity / confidential-storage

Confidential Storage Specification and Implementation

https://identity.foundation/confidential-storage/

Apache License 2.0

Clarifying relation of storage to agent/wallet relative to 1.4.1 and 1.4.2 #48

Closed ewelton closed 3 years ago

ewelton commented 4 years ago

I am wondering if we can clarify the role of Agent/Wallet (see ecosystem diagram) relative to 1.4.1 Privacy and multi-party encryption

If I interpret 1.4.1 as requiring encryption at rest, and I imagine that I am encrypting that with key material which is not resident within the SDS then I can imagine that the Agent/Wallet (or client? or consumer? or gateway?) is doing the encryption - and the storage provider is simply holding my digital bolus.

If I do that, and then decide to authorize (as in 1.4.2) some parties to receive that information directly from Storage and not requiring the activity of an Agent/Wallet, then I am confused as to how to do that without, at some point, running the data through processing that involves my SDS external key material - which likely means the Agent/Wallet.

In other words - if I want to empower the SDS to allow authorized sharing (using the technologies of 1.4.2) to N parties, which I decide after I originally store the material for sharing with a single individual (e.g. myself), then I think I need to somehow "re-encrypt" that data - which involves, I think, at least one decryption of the data - or one RT through an Agent/Wallet possessing my private key material.

Options would be to

1. re-encrypt the data so that a single copy exists, but it is encrypted such that N+1 (all people + me) can decrypt it. This requires a complete re-encryption for every change to the conditions of 1.4.2, and kinda means that 4.2.4 is of secondary importance (like putting a locked box in a locked box)
2. make a copy of the data encrypted for each recipient, and delete it upon revocation. This requires a complete re-encryption for every change to the conditions of 1.4.2 and kinda means that 4.2.4 is of secondary importance (as above)
3. use some crypto technology that allows me to do transcriptions of the form of 1 in place, without access to private key material. If this is the standard - could someone point me to more of this technology and could we call it out in the spec?

Alternatively - if an agent is always in play, or even sometimes in play, then that significantly changes the relationship between the SDS (Storage) and many of the authorization, encryption, and sharing requirements of other ecosystem elements, like Agents and Agent/Wallets.

dlongley commented 4 years ago

If I interpret 1.4.1 as requiring encryption at rest, and I imagine that I am encrypting that with key material which is not resident within the SDS then I can imagine that the Agent/Wallet (or client? or consumer? or gateway?) is doing the encryption - and the storage provider is simply holding my digital bolus.

Yes, a client always performs encryption/decryption; the server never sees the cleartext data and does not have the private key material required to decrypt.

In other words - if I want to empower the SDS to allow authorized sharing (using the technologies of 1.4.2) to N parties, which I decide after I originally store the material for sharing with a single individual (e.g. myself), then I think I need to somehow "re-encrypt" that data - which involves, I think, at least one decryption of the data - or one RT through an Agent/Wallet possessing my private key material.

First, re-encryption is not strictly necessary (I'll get to that in a moment) and even if "re-encryption" is used, there are several different layers of "encryption" where that could occur (I'll also get to that in a moment). Also note that possession of private key material is also not strictly necessary (I'll get to this first).

"possession of private key material is not strictly necessary to decrypt" - If one is using a system such as WebKMS, then users may be given capabilities (e.g., zcaps) that enable them to execute key operations without them actually having direct access to private key material. This can be tremendously helpful with key management -- and as you'll see below, helps with sharing access to encrypted data.
"there are several different layers of encryption" - Current implementations of the EDV spec perform encryption by first using a "key agreement key" (or KAK) that is combined with an ephemeral key to derive a secret. This secret is one input into a key derivation function (other inputs include key identifiers, etc.). The key derivation function outputs a key encryption key (a KEK) which is used to encrypt/decrypt (aka "wrap"/"unwrap") a content encryption key (CEK). The CEK is what is actually used to encrypt/decrypt the content that is stored in the EDV; the CEK itself is then encrypted/wrapped for each potential recipient, where "recipient" refers to a particular KAK. So, "re-encryption" could be performed at merely the CEK layer -- but, even this is not strictly necessary if a capabilities-based system is being used, as I will get to next.
"re-encryption is not strictly necessary" - If one is using WebKMS, a user that has encrypted some document and stored it in an EDV with a recipient that refers to KAK "X", may delegate a zcap to use X to some other user (person/agent/whatever). If the EDV's authorization mechanism is also zcaps, they may delegate a zcap to access a particular document in the EDV. Using both of these zcaps, this other user may access the encrypted document -- and then derive a KEK to decrypt the CEK and decrypt the content and read it. The original user may do this (delegate zcaps, no "re-encryption") for every additional user they wish to enable to read the content -- all without revealing the private key material used to encrypt. This also enables them to revoke the zcap that authorizes another to use X and to access the document as they see fit. Notably, once someone has read the content they can't "unsee it" -- but each time the content is changed, a new CEK and ephemeral key will be created such that the new contents will be unreadable to those that can no longer use a zcap to use X to derive a secret.

All of this is possible due to a layered approach.
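A minimal sketch of the layering described above may help readers follow the KAK, KEK, and CEK relationship. It is not the EDV implementation: the Diffie-Hellman group, the HMAC-based key derivation, and the XOR stream cipher are toy stand-ins chosen so the example runs with only the Python standard library, and every function name here is hypothetical. Real implementations would use vetted primitives for key agreement, key wrapping, and content encryption.

```python
import hashlib
import hmac
import secrets

# Toy Diffie-Hellman parameters. NOT secure; assumed only for illustration.
P = 2**127 - 1
G = 3

def keypair():
    """Return (private, public) for the toy key agreement scheme."""
    priv = secrets.randbelow(P - 2) + 1
    return priv, pow(G, priv, P)

def derive_kek(shared_secret: int, kak_id: str) -> bytes:
    # Stand-in KDF: the shared secret is one input, the KAK identifier another.
    return hmac.new(kak_id.encode(), shared_secret.to_bytes(16, "big"),
                    hashlib.sha256).digest()

def xor_stream(key: bytes, data: bytes) -> bytes:
    # Toy symmetric cipher (SHA-256 keystream); the same call encrypts and decrypts.
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)

def encrypt(content: bytes, recipients: dict) -> dict:
    """Encrypt once with a fresh CEK, then wrap the CEK for each KAK."""
    cek = secrets.token_bytes(32)
    wrapped = []
    for kak_id, kak_pub in recipients.items():
        eph_priv, eph_pub = keypair()  # only an ephemeral private key is needed here
        kek = derive_kek(pow(kak_pub, eph_priv, P), kak_id)
        wrapped.append({"kid": kak_id, "epk": eph_pub,
                        "encrypted_key": xor_stream(kek, cek)})
    return {"recipients": wrapped, "ciphertext": xor_stream(cek, content)}

def decrypt(doc: dict, kak_id: str, kak_priv: int) -> bytes:
    """Derive the KEK from the KAK and the ephemeral key, unwrap the CEK, decrypt."""
    entry = next(r for r in doc["recipients"] if r["kid"] == kak_id)
    kek = derive_kek(pow(entry["epk"], kak_priv, P), kak_id)
    cek = xor_stream(kek, entry["encrypted_key"])
    return xor_stream(cek, doc["ciphertext"])
```

In this sketch the content is encrypted exactly once, and adding a reader only adds another wrapped copy of the CEK to the recipients list.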

agropper commented 4 years ago

I take issue with the statement "the server never sees the clear text data" in 1.4.1. This is a non-normative requirement that may conflict with many important use-cases including all of the cases when the server is the source of the data. For example, a laboratory doing blood tests or a school issuing a diploma need to store the result somewhere and using SDS for that is, I believe, in-scope.

ewelton commented 4 years ago

@dlongley - this is really great stuff - thanks for the response.

It seems like much of the viability of the current architecture comes from ZCAP, however, it is listed only once - in 1.4.2 - in this context

OAuth2, Web Access Control, and [ZCAP]s (Authorization Capabilities).

Now, I'm a fan of ZCAP, I like them - but if the field is to be open here, then perhaps we will benefit from enumerating the specific requirements of the encryption, at the SDS level, so that the owner Agent/Wallet is not involved - what are the properties of the encryption such that sharing is possible. For example: If we used OAuth2 instead of ZCAP for whatever reason, would that substantially impact the available solutions to 1.4.1?

I like the strategy described in point 2 - but I would counsel against using the term 'layers', as that will conflict with the architectural layers discussion - unless the L1, L2, and L3 in issue #44 is somehow the same layer here? I don't think so - but finding some lexical hint to separate the terms might be helpful.

Setting my niggling aside - let's look at this in detail:

"there are several different layers of encryption" - Current implementations of the EDV spec perform encryption by first using a "key agreement key" (or KAK) that is combined with an ephemeral key to derive a secret. This secret is one input into a key derivation function (other inputs include key identifiers, etc.). The key derivation function outputs a key encryption key (a KEK) which is used to encrypt/decrypt (aka "wrap"/"unwrap") a content encryption key (CEK). The CEK is what is actually used to encrypt/decrypt the content that is stored in the EDV; the CEK itself is then encrypted/wrapped for each potential recipient, where "recipient" refers to a particular KAK. So, "re-encryption" could be performed at merely the CEK layer -- but, even this is not strictly necessary if a capabilities-based system is being used, as I will get to next.

Please do correct me if I'm wrong - but I want to see if I can reflect the idea of the current implementation - and, if I can do that we might be able to move it to the spec. As I understand it, the current implementation does this:

the content is encrypted (client/consumer/agent/wallet side) and stored once using the CEK
the sharing of the data is based on sharing the CEK with other systems - note - this is not "simple" sharing, like "email me the CEK", but ultimately, since every consumer of the data decrypts the data they ultimately have the key - which could be shared on "revenge cryptography" sites like "watch-my-cek.com"
key sharing is handled by something like my original strategy '2' in the OP - e.g. I make one CEK encrypted copy of the data, but I have a different bolus for each authorized user - but, because i expect revocation to be messy, as a practice, when I revoke access I will typically "re-encrypt" the data with a new CEK and then share the new CEK via the outlined strategy

In general I really like this solution - but it is not clear to me how WebKMS solves the problems. KMS is out of scope for the specification, according to the ecosystem diagram and the current spec. Also, I do not see how WebKMS solves the "access to the private key material" - if it is a "call out" from the Storage block (see the ecosystem diagram) to some other block - either the Agent/Wallet or the KMS block then that needs to be clarified. I think that counts as "access to the key material" - and I think this is also an excellent example of why we need to clean up the context and positioning information (e.g. issue #47 )

Lastly, I think it might be useful for the community and the spec to really get this property "and then derive a KEK to decrypt the CEK and decrypt the content and read it" clear - as it is very much at the heart of the proposed solution and implementation. It suggests a lot of interesting questions and features that can be used - for example, how many KEKs can decrypt the CEK and what does this multiplicity provide - or is the fact of the derivation simply to avoid direct exchange of the KEK via plaintext? If the spec is effectively sensitive to this sort of mechanics, then it behooves us to spell out some of these mechanics in an appendix to the spec, as well as what role the property plays in constraining the specification.

All in all, I really like the big picture forming here - with ZCAP and WebKMS - and I think it makes sense to work from the existing EDV implementation towards the spec. The extent to which we can constrain the spec early, and limit it mercilessly - the better - and I think we need to pay a little more attention, early on, to nailing down the Ecosystem and Context component of the spec - prioritizing that ahead of critical details like chunk size negotiation protocols.

dlongley commented 4 years ago

@agropper,

I take issue with the statement "the server never sees the clear text data" in 1.4.1. This is a non-normative requirement that may conflict with many important use-cases including all of the cases when the server is the source of the data. For example, a laboratory doing blood tests or a school issuing a diploma need to store the result somewhere and using SDS for that is, I believe, in-scope.

In that case, the "server" you refer to is playing the role of the client. We need to do a better job of explaining that "client" and "server" are roles in an interaction in the spec so this is clear. We may also want to use different terms if this continues to be a source of confusion. The "server" you mention is not the same as the "server" role from the spec.

ewelton commented 4 years ago

@dlongley yes, as issue #38 indicates as well. And as I mentioned in #47 - we need to do a lot of work to convey the context. Servers are sometimes clients, clients are sometimes servers. Sometimes the data sharing is between SDS and sometimes it is between SDS and Agent, and sometimes a KMS gets involved - possibly with, and possibly w/o an interloping agent. The solution is taking shape - but I think we need to continue to refine the part of the spec that "places this all in context" and defines roles.

@agropper I think that opacity to the data managed by the SDS is essential and should be normative.

I also think your point is a good one - but it strikes me that it would be very valuable to trace through that sort of use case in the context of the ecosystem diagram.

From what I can tell, there is a difference between sharing data "through an agent" (or access server) and directly to the SDS. It is an open question for the group though, and I'm going to open it as a new issue (#49 )

If a Lab hands my Cloud Agent or Access Server a result, and the actor operating with a fiduciary relationship to me (my agent, my access server) places it in my SDS, then the Cloud Agent or Access Server can oversee the encryption (and the key material can be on some local device or in the cloud) while the SDS itself can see nothing. I can then "share" that data with a consumer on a read-only basis directly from the SDS. On the other hand if I want a Lab to write into my SDS directly, then it seems a little more complicated to orchestrate the permissions - so much so that one might question the value of writing directly to the SDS and bypassing the Agent/Access Server.

Perhaps, when we deal with #46, we can include a path through the new diagram for #47, and this may help suggest terms for #38 - as the "client/server" or "producer/consumer" role of any given component will change based on the use-case - for me, at least, a set of diagrams through which we trace significantly detailed use cases will help tremendously.

dlongley commented 4 years ago

@ewelton,

the content is encrypted (client/consumer/agent/wallet side) and stored once using the CEK

Yes.

the sharing of the data is based on sharing the CEK with other systems - note - this is not "simple" sharing, like "email me the CEK", but ultimately, since every consumer of the data decrypts the data they ultimately have the key - which could be shared on "revenge cryptography" sites like "watch-my-cek.com"

Ultimately, if you have the CEK and you have the encrypted data, then you can decrypt it. The sharing process described above actually involves more -- but at the end of the day, yes, once you have the encrypted data and the CEK you can decrypt. Of course, once you have the cleartext you could also just share that! So I'm not sure why the "sharing" is being described this way. I'm not sure if you were trying to boil things down or if the reflection just missed some details (there are plenty here, no worries if so!).

The proper way to share is:

1. Grant authority to access the encrypted data.
2. Grant authority to use a KAK for which a KEK can be derived to unwrap the CEK. Note: If the CEK is already wrapped by a KEK derived from a KAK the targeted recipient can use, this step can be skipped.

Steps 1 and 2 can be implemented using zcaps as described above. If there is no way to grant authority to a KAK expressed as a recipient in the encrypted data, then, yes, an additional recipient will need to be added to the encrypted data by unwrapping the CEK and additionally wrapping it using a KEK derived from a KAK that the recipient already has access to.

key sharing is handled by something like my original strategy '2' in the OP - e.g. I make one CEK encrypted copy of the data, but I have a different bolus for each authorized user - but, because i expect revocation to be messy, as a practice, when I revoke access I will typically "re-encrypt" the data with a new CEK and then share the new CEK via the outlined strategy

See my comments above regarding sharing the CEK. The wrapped CEK travels with the encrypted data; you should not be sharing it directly. The encrypted data includes a "recipients" field that expresses the CEK in wrapped (aka encrypted) form; the wrapping uses a KEK that can be derived from an identified KAK and an ephemeral key expressed alongside it. So each "recipient" has a wrapped CEK, ephemeral key information, and a KAK identifier. All of this is part of the JSON Web Encryption (JWE) standard.
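For readers unfamiliar with JWE, the shape being described can be sketched with the General JSON Serialization from RFC 7516. The identifiers, key values, and byte strings below are placeholders, and the choice of `ECDH-ES+A256KW` with an X25519 ephemeral key is an assumption for illustration, not something this thread confirms about any particular EDV implementation.

```python
import base64
import json

def b64url(data: bytes) -> str:
    """Base64url without padding, as used throughout JOSE."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_recipient(kak_id: str, wrapped_cek: bytes) -> dict:
    # Each recipient carries a KAK identifier, ephemeral key info, and a wrapped CEK.
    return {
        "header": {
            "kid": kak_id,                      # which KAK can derive the KEK
            "alg": "ECDH-ES+A256KW",            # assumed key agreement + key wrap
            "epk": {"kty": "OKP", "crv": "X25519", "x": "<placeholder>"},
        },
        "encrypted_key": b64url(wrapped_cek),   # the wrapped CEK
    }

example_jwe = {
    "protected": b64url(json.dumps({"enc": "A256GCM"}).encode()),
    "recipients": [
        make_recipient("did:example:alice#kak-1", bytes(40)),
        make_recipient("did:example:bob#kak-1", bytes(40)),
    ],
    "iv": b64url(bytes(12)),
    "ciphertext": b64url(b"<opaque encrypted content>"),
    "tag": b64url(bytes(16)),
}

def recipient_kak_ids(jwe: dict) -> list:
    """List the KAK identifiers that the CEK has been wrapped for."""
    return [r["header"]["kid"] for r in jwe["recipients"]]
```

The ciphertext appears once, while the recipients array grows by one entry per KAK that should be able to unwrap the CEK.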

Note that re-encrypting the data may be wholly unnecessary when revoking access -- as if the recipient has already read the data, you can't "get it back". People can't "unsee" what they've seen and they can make copies of the cleartext, etc. Yes, it could be re-encrypted, but people should be aware of how little that may actually accomplish. But, each time the data changes, of course, a new CEK should be used to encrypt it.

dlongley commented 4 years ago

@ewelton,

In general I really like this solution - but it is not clear to me how WebKMS solves the problems. KMS is out of scope for the specification, according to the ecosystem diagram and the current spec.

WebKMS can solve problems despite it being out of scope -- so I don't understand this comment.

Also, I do not see how WebKMS solves the "access to the private key material" ...

There's a fundamental difference between giving someone access to private key material that they will then forever possess and giving them a capability to use a key; the former is not revocable (or expirable). That means that if you add a "recipient" to an encrypted document that expresses a KAK that has been shared directly, whoever has had direct access to it will always be able to decrypt the payload. If, instead, there is a layer of indirection, whereby you only share a capability (or a "function" one can call) to perform secret derivation -- you can also revoke access to this capability. So WebKMS adds an additional feature: "revocation of access to the KAK" -- something that cannot be done if private key material is shared directly.

This feature means that you can encrypt many documents in an EDV with a single "recipient" (a single KAK) and then manage those who are able to decrypt those documents in an entirely decoupled layer. There is no need to re-encrypt all of the data when access is revoked and whenever new data is encrypted; the same KAK can be used because those who have had their access revoked cannot read the newly encrypted data. This makes sharing and key management significantly simpler.
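The indirection described here can be sketched as a service that keeps the private KAK to itself and only exposes a revocable "derive a secret" operation. This is not the WebKMS API and these are not real zcaps (which are signed, delegable documents rather than opaque tokens); `ToyKms`, its methods, and the toy Diffie-Hellman group are all hypothetical and chosen only to make the idea concrete.

```python
import secrets

# Toy Diffie-Hellman parameters. NOT secure; assumed only for illustration.
P = 2**127 - 1
G = 3

class ToyKms:
    """Holds a private KAK and performs secret derivation for capability holders."""

    def __init__(self):
        self._kak_priv = secrets.randbelow(P - 2) + 1   # never leaves the KMS
        self.kak_pub = pow(G, self._kak_priv, P)
        self._capabilities = set()

    def delegate(self) -> str:
        """Issue a capability to use the KAK (a random token stands in for a zcap)."""
        cap = secrets.token_hex(16)
        self._capabilities.add(cap)
        return cap

    def revoke(self, cap: str) -> None:
        self._capabilities.discard(cap)

    def derive_secret(self, cap: str, ephemeral_pub: int) -> int:
        """The only private key operation needed for decryption."""
        if cap not in self._capabilities:
            raise PermissionError("capability revoked or unknown")
        return pow(ephemeral_pub, self._kak_priv, P)

# Usage: a writer encrypts to kak_pub with an ephemeral key; a reader holding
# a capability can derive the same secret without ever seeing the private KAK.
kms = ToyKms()
eph_priv = secrets.randbelow(P - 2) + 1
eph_pub = pow(G, eph_priv, P)
writer_secret = pow(kms.kak_pub, eph_priv, P)

reader_cap = kms.delegate()
reader_secret = kms.derive_secret(reader_cap, eph_pub)
kms.revoke(reader_cap)   # later calls with reader_cap raise PermissionError
```

Because the reader never held the key itself, revoking the capability cuts off future derivations, including for documents encrypted to the same KAK afterwards.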

ewelton commented 4 years ago

@dlongley

In general I really like this solution - but it is not clear to me how WebKMS solves the problems. KMS is out of scope for the specification, according to the ecosystem diagram and the current spec.

WebKMS can solve problems despite it being out of scope -- so I don't understand this comment.

If the properties of WebKMS have architectural and design impacts, those should be spelled out - and if properties of a KMS system are significant, then the scope should be amended. This serves a communicative function - we need to pull the tacit information out into the light and make it explicit.

For this part:

Also, I do not see how WebKMS solves the "access to the private key material" ...

I think you do not understand the problem.

The concern is not whether or not you share a private key - not at all. It is whether or not, in the ecosystem diagram, there is a box with properties that are in scope and this relationship is called out. I understand the two paragraphs and I agree with them - but it is about a different concern than the one that troubles me.

One way that I would look at this is amending the ecosystem diagram - if WebKMS plays the KMS role in the ecosystem diagram, and the SDS is communicating directly with the KMS without any interloping Agent/Wallet code brokering the connection then there should be an arrow. There is not, so 100% of what you said is irrelevant to the spec as it stands - if this were a court-room and I were a judge I would rule it is "inadmissible" and in order to make it "admissible" then the ecosystem diagram needs to be updated.

This is because, right now, the only way for the SDS to access the KMS is to talk to Agent/Wallet code - which keeps that in the loop and puts a requirement on the Agent/Wallet API to be contacted by the SDS. If the Agent/Wallet is in play all the time, then it is fair to include it in more than just this - for example, it can be involved in the authorization and consent negotiation - in fact, I would argue that the "minimal required Agent/Wallet" API for this would include a WebKMS like API.

I hope that clears up the concern - in this case it is not about whether the key is exposed, it is about whether this discussion is congruent with the spec as it stands. I claim that it is not, and that an update to ecosystem context diagram and text is in order. I will be happy to offer the PR once issue #47 is resolved - i mean, I support this strategy as a technical strategy - but we first need to lay some groundwork in the spec before we can pursue it robustly.

ewelton commented 4 years ago

@dlongley

Note that re-encrypting the data may be wholly unnecessary when revoking access -- as if the recipient has already read the data, you can't "get it back". People can't "unsee" what they've seen and they can make copies of the cleartext, etc. Yes, it could be re-encrypted, but people should be aware of how little that may actually accomplish. But, each time the data changes, of course, a new CEK should be used to encrypt it.

Yeah - that answers my concern - a good detail to bring into the spec - an appendix or implementation guidance, etc. Not sure where that belongs, but it should be called out.

And you are right - I was considering a subscription model where one is using notifications to monitor changes. Let's say I have datum X, and I have subscribers A, B, C - and assuming there is some cost to encryption (money, or resource, or time) - in that case

S1: t=0, X_0 is visible to A,B - encrypted by CEK_0
S2: t=1, X_1 is visible to A,B,C - encrypted by CEK_0
S3a: t=2, X_2 is visible to A,C - encrypted by CEK_0
S3b: t=2, X_2 is visible to A,C - encrypted by CEK_1

considering S3a vs. S3b.... in the ideal case, both would be fine - because B would never be able to even get the encrypted payload. However, S3b is "slightly stronger" in the case of a compromise where somehow B gets access to the raw data - because they could not decrypt it using their old key.

This is something to bring up in terms of risk analysis - e.g. imagine the situation where there was an error say accessing an Agent/Wallet or WebKMS to perform the encryption - and assuming that there is some cost to broadcasting the notifications to the subscribers e.g.

S1: t=0, X_0 is visible to A,B - encrypted by CEK_0
S2: t=1, X_1 is visible to A,B,C - encrypted by CEK_0
- effort was made to migrate to CEK_1, but could not access KMS
- absolutely no difference in exposure, can retry opportunistically
- no notifications of updated material need to be sent
S3: t=2, X_2 is visible to A,C - encrypted by CEK_0
- effort was made to migrate to CEK_1, but could not access KMS
- data security rests on preventing access to data by layer 1 authorization & validation services
- must retry until success

I really like this model - i hope we can bring it into full light in the spec.
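The S3a versus S3b comparison can be made concrete with a small model. This is only a sketch of the scenario as I described it, under the assumption that a party "holds" a CEK if they were ever an authorized reader of a state encrypted with it; the `State` type and `residual_exposure` function are hypothetical names, not anything from the spec.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    label: str
    readers: frozenset   # parties authorized to read at this point in time
    cek: str             # identifier of the CEK used for this version

def residual_exposure(history: list) -> set:
    """Parties no longer authorized for the latest state who could still
    decrypt it if the authorization layer in front of the storage failed."""
    latest = history[-1]
    ever_held = set()
    for state in history:
        if state.cek == latest.cek:
            ever_held |= state.readers
    return ever_held - latest.readers

s1 = State("S1", frozenset({"A", "B"}), "CEK_0")
s2 = State("S2", frozenset({"A", "B", "C"}), "CEK_0")
s3a = State("S3a", frozenset({"A", "C"}), "CEK_0")   # CEK not rotated
s3b = State("S3b", frozenset({"A", "C"}), "CEK_1")   # CEK rotated on revocation

exposure_without_rotation = residual_exposure([s1, s2, s3a])
exposure_with_rotation = residual_exposure([s1, s2, s3b])
```

Under these assumptions, B remains a residual risk in S3a and not in S3b, which is the "slightly stronger" property; in both cases the first line of defence is still that B cannot fetch the encrypted payload at all.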

dlongley commented 4 years ago

@ewelton,

One way that I would look at this is amending the ecosystem diagram - if WebKMS plays the KMS role in the ecosystem diagram, and the SDS is communicating directly with the KMS without any interloping Agent/Wallet code brokering the connection then there should be an arrow. There is not, so 100% of what you said is irrelevant to the spec as it stands - if this were a court-room and I were a judge I would rule it is "inadmissible" and in order to make it "admissible" then the ecosystem diagram needs to be updated.

An EDV/SDS does not need access to private key material, therefore, it does not need to communicate with the KMS. I'm not suggesting that WebKMS be added to the spec -- I'm indicating that other pieces of software and specs can make managing the sharing of access to encrypted documents easier. I'm saying that the EDV/SDS spec provides the plumbing necessary to enable other layers and technology. Different people can make different choices at those layers -- and other specs, such as WebKMS, can offer further interoperability at those layers. It would be helpful for the EDV/SDS spec to mention this informatively -- and to highlight that the EDV/SDS spec enables these other technologies at other layers.

Note: I haven't ignored your comments regarding all of this "layering" I'm mentioning. We'll have to figure out a way to be clear about whether we're talking about layers internal to the EDV/SDS spec or external to it. Also, I'm referring to the spec as EDV/SDS because we still need to vote on naming the thing (#35).

I hope that clears up the concern - in this case it is not about whether the key is exposed, it is about whether this discussion is congruent with the spec as it stands. I claim that it is not, and that an update to ecosystem context diagram and text is in order.

I don't think there should be an "additional arrow" connecting the EDV/SDS with the KMS. Sharing access has to do with the private aspect of keys, not the public aspect -- whether that is by directly sharing private key material or sharing it via some indirection. An EDV/SDS "server" can be implemented such that it is only concerned with public key material (and this has to do with authorization, not encryption -- as the "server" does not participate in encryption at all). I agree that the spec would have no need to make any normative statements about the use of a particular KMS -- rather, it could informatively mention that the EDV/SDS design enables other specifications (such as WebKMS) to be used to enable easier key management and, therefore, easier sharing of access to encrypted data.

dlongley commented 4 years ago

I really like this model - i hope we can bring it into full light in the spec.

ewelton commented 4 years ago

@dlongley

I want to push back a little bit because I don't think that my confusion on this issue is unique to me, and I think the community as a whole will benefit from clearing it up.

An EDV/SDS does not need access to private key material, therefore, it does not need to communicate with the KMS.

The above makes the assumption that the only request one makes of the KMS is to "give me the keys", which is not what I was thinking about it - In fact, I do not like the way TEE and KMS + biometrics + keys are separated in that diagram - i find that whole part of the ecosystem diagram confusing.

I would argue that "requesting that system-X encrypt/decrypt a payload" is the feature that needs to be documented, because that system has access to the unencrypted data and keys. Right now, that would look like asking the Agent/Wallet to handle that - and that makes a lot of sense. That is how i had previously imagined it - you have an "active component" in the Agent/Wallet

Most generally the spec could say "the EDV/SDS asks software that operates on behalf of the owner, to perform an encryption related operation" - but that is more complicated than saying "ask the Agent/Wallet to perform the encryption" - since it already is the central piece coordinating the TEE, the KMS, and whatnot.

If there is a role in the system for something other than the Agent/Wallet to be involved, then I believe we benefit from having a box on a diagram. I am curious as to why we would not want that.

I hope that clears up the concern - in this case it is not about whether the key is exposed, it is about whether this discussion is congruent with the spec as it stands. I claim that it is not, and that an update to ecosystem context diagram and text is in order.

I don't think there should be an "additional arrow" connecting the EDV/SDS with the KMS. Sharing access has to do with the private aspect of keys, not the public aspect -- whether that is by directly sharing private key material or sharing it via some indirection. An EDV/SDS "server" can be implemented such that it is only concerned with public key material (and this has to do with authorization, not encryption -- as the "server" does not participate in encryption at all). I agree that the spec would have no need to make any normative statements about the use of a particular KMS -- rather, it could informatively mention that the EDV/SDS design enables other specifications (such as WebKMS) to be used to enable easier key management and, therefore, easier sharing of access to encrypted data.

With the addition of the assumptions about KMS in the ecosystem diagram (perhaps these would be good to spell out when we attend to issue #47) then I agree - it defines the KMS in such a way that the arrow would not be correct - for precisely the reason you mention. The KMS just "hands over keys".

However, I do not agree that having the Storage element involve interaction with boxes absent from the ecosystem diagram is a good strategy. If there is an essential role outside of Storage that is involved in one or more use cases, then we should put the required box on the context diagram and define the relationship. If that is the Agent/Wallet, then we don't need a new box - if it is not the Agent/Wallet, then we do need a new box. In either case, the properties of the arrows need to highlight the KEK/KAK/CEK dance described above.

dlongley commented 4 years ago

@ewelton,

I would argue that "requesting that system-X encrypt/decrypt a payload" is the feature that needs to be documented, because that system has access to the unencrypted data and keys. Right now, that would look like asking the Agent/Wallet to handle that - and that makes a lot of sense. That is how i had previously imagined it - you have an "active component" in the Agent/Wallet

Most generally the spec could say "the EDV/SDS asks software that operates on behalf of the owner, to perform an encryption related operation" - but that is more complicated than saying "ask the Agent/Wallet to perform the encryption" - since it already is the central piece coordinating the TEE, the KMS, and whatnot.

If there is a role in the system for something other than the Agent/Wallet to be involved, then I believe we benefit from having a box on a diagram. I am curious as to why we would not want that.

Now I do believe you are highlighting where the ecosystem diagram could use some work. I think we either need to show an EDV/SDS client inside of the "Agent/Wallet" box or have another architecture diagram that helps to better explain the relationship between the EDV/SDS "server" and an EDV/SDS "client". It seems roles and responsibilities are being conflated too much: there's too much confusion here. There is additional confusion around what a "full" SDS is (or what one configuration may be) vs. an EDV. We have at least one working implementation of an EDV client and a couple of EDV servers, but there is an expectation that there may be some kind of additional layer(s) of features that run on top of one or both of those -- or that perhaps use these via composition. Maybe an SDS "server" is an EDV "server" that includes additional services to handle replication and notifications -- or maybe it's something more. This is where implementation experience is lacking so we don't have as much clarity.

Where we do have clarity from implementation experience so far, we have this:

An EDV "server" doesn't "ask" software to do anything. It is responsible for:

Storing encrypted data for later retrieval.
Verifying that whoever tries to write encrypted data or read encrypted data (or perform an encrypted search) is authorized to do so.

An EDV "client" is responsible for:

Encrypting data -- which involves encrypting to and attaching whatever "recipients" are requested to the encrypted output and using whatever key APIs are made available to it, e.g., calling a function that will hit a KMS to "derive a secret" during a KEK derivation process.
Making requests to an EDV "server" to write/read/search for encrypted data.

An "Agent/Wallet" makes use of an EDV "client" to write to/read from the EDV "server" -- and has other responsibilities such as helping a user manage who can access which encrypted documents and who can access which keys in a KMS.
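The split of responsibilities listed above can be sketched roughly as follows. This is an assumption-laden illustration, not any existing EDV client or server: the class names, the simple access list (standing in for zcap verification), and the XOR placeholder cipher are all hypothetical.

```python
class ToyEdvServer:
    """Stores opaque blobs and checks authorization; never encrypts or decrypts."""

    def __init__(self):
        self._docs = {}
        self._grants = set()   # (party, doc_id, action) tuples; stand-in for zcaps

    def grant(self, party: str, doc_id: str, action: str) -> None:
        self._grants.add((party, doc_id, action))

    def _check(self, party: str, doc_id: str, action: str) -> None:
        if (party, doc_id, action) not in self._grants:
            raise PermissionError(f"{party} may not {action} {doc_id}")

    def write(self, party: str, doc_id: str, blob: bytes) -> None:
        self._check(party, doc_id, "write")
        self._docs[doc_id] = blob

    def read(self, party: str, doc_id: str) -> bytes:
        self._check(party, doc_id, "read")
        return self._docs[doc_id]


class ToyEdvClient:
    """Encrypts before writing and decrypts after reading, using whatever
    key functions the Agent/Wallet makes available to it."""

    def __init__(self, server, party, encrypt, decrypt):
        self._server, self._party = server, party
        self._encrypt, self._decrypt = encrypt, decrypt

    def put(self, doc_id: str, cleartext: bytes) -> None:
        self._server.write(self._party, doc_id, self._encrypt(cleartext))

    def get(self, doc_id: str) -> bytes:
        return self._decrypt(self._server.read(self._party, doc_id))


def placeholder_cipher(data: bytes) -> bytes:
    # NOT encryption; a reversible stand-in so the example runs.
    return bytes(b ^ 0x5A for b in data)
```

An Agent/Wallet in this picture would construct the client, decide which grants exist on the server, and supply the key functions, which might themselves call out to a KMS.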

I don't think the ecosystem diagram fully captures this today -- and it's leading to a lot of confusion around responsibilities.

The KMS just "hands over keys".

I wouldn't say that -- I'd say that it is capable of executing private key operations. For example, the KMS can be asked to "derive a secret" using a key agreement key it has stored internally. However, it does not need to execute public key operations; these can be performed by any system with access to the public key material.

Note that a KMS operation to "derive a secret" is the only private key operation that is needed as part of the decryption process (other KMS operations like performing digital signatures are useful if you're using zcaps, but that's another subject). Once a secret has been derived, the rest of the decryption process only involves key material that will either be computed or come from the JWE representing the encrypted data. Furthermore, deriving a secret when encrypting only involves a private ephemeral key, so it also does not involve a KMS.

However, I do not agree that having the Storage element involve interaction with boxes absent from the ecosystem diagram is a good strategy. If there is an essential role outside of Storage that is involved in one or more use cases, then we should put the required box on the context diagram and define the relationship. If that is the Agent/Wallet, then we don't need a new box - if it is not the Agent/Wallet, then we do need a new box. In either case, the properties of the arrows need to highlight the KEK/KAK/CEK dance described above.

We do need more diagrams explaining where encryption/decryption happens -- and we need to highlight that it isn't in the "storage" box -- if the "storage" box is meant to represent the "server". This is perhaps the crux of the tension here; whether "storage" is meant to represent the "server and the client" or only the "server". This also seems to mix roles with architectural components -- which leads to more confusion around the meaning of "server". Either way, I don't see how "storage" can reasonably represent both roles or components without creating the problems mentioned above.

OR13 commented 4 years ago

This issue almost seems like a dupe of https://github.com/decentralized-identity/secure-data-store/issues/47

I think we are ready to review improved diagrams, if anyone has revised ones to propose.

OR13 commented 4 years ago

I think this is a request for better diagrams, that's why I labeled it ready for PR.

dmitrizagidulin commented 4 years ago

Should be addressed by PR #88.

OR13 commented 3 years ago

Looks like we got this fam.