OR13 commented 4 years ago

Raw / Byte / Block Storage

agropper commented 4 years ago

Is encryption assumed at A even if the object is not encrypted?

msporny commented 4 years ago

> Is encryption assumed at A even if the object is not encrypted?

Yes, absolutely. The bytes are encrypted at a higher layer... and come into this layer.

agropper commented 4 years ago

Does that mean that anonymous access requires a separate place to publish the encryption key? Where would that place be? In the same place as the Index that points to the resource?

This sounds like Datashards, and it might be OK from my privacy perspective to publish the resource link and the encryption key together.

OR13 commented 4 years ago

Related to Layer A: not only should content be encrypted, but metadata about the content MUST be encrypted too, to remove any incentive to sell metadata or observable properties of data controllers. See https://github.com/decentralized-identity/secure-data-store/issues/79

agropper commented 4 years ago

@OR13 Although I agree that we should discourage surveillance capitalism and platform lock-in, I have a HUGE privacy problem with your MUST statement from the perspective of self-sovereign technology, searchable indexes, and standards that drive decentralization. For example, I want the opportunity to self-host my Layer A data while choosing to scatter metadata among three different Layer B indexes and other services. This separation of concerns and easy substitutability keeps both the Layer A and B service providers honest through competition rather than technical means.

What is the meta data at layer A? How is it different from meta data at Layer B? Under what circumstances are access controls different between Layer A and B? Who is allowed to bypass Layer B and access Layer A directly?

OR13 commented 4 years ago

@agropper I think we will cover this on the next call, but I'm concerned that you think a user's ability to control their data is a privacy concern. In my view it is the opposite: any leakage of data or metadata without Alice's direct consent is the true privacy concern.

Let's say Alice has a prescription and chooses to store it in a vault.

That does not mean that Alice does not also disclose that prescription elsewhere, or build indexes on it elsewhere... but it does mean that the vault provider does not know that Alice has a prescription or meta data that might allow for the vault provider to easily guess that... and then sell the approximate number of prescriptions across all users in the vault or other impression / aggregate demographic statistics which can be built from plaintext meta data.

Encryption is fundamental to privacy, disclosure of plaintext is the mechanism whereby Alice exercises control over her data.

Alice's control of her data in a vault does not prevent Alice from choosing to disclose data in other systems or through other services.

Are you suggesting that a vault provider should be capable of disclosing data or meta data about Alice without her consent (in digital form) that is stored exclusively in her vault?

agropper commented 4 years ago

@OR13 We seem to have very different perceptions of the real world. Here are some arguments in no particular order:

Forcing data vaults to be encrypted is useful to the extent you believe in DRM.
Making encryption optional does not prevent Alice from encrypting everything if she chooses.
Optional encryption means we have to pay attention to the authorization designs. That results in a more resilient system overall.
Encryption is a poor substitute for access control. It forces Alice to keep writing keys to the vault as different users seek access and as access rights expire. This is even worse when scoped access is available. Having the storage provider enforce scope may be preferable to having the authorization server enforce scopes through encryption.
Alice's best protection, privacy by default, is to leave the data where it originated. In the case of a prescription, that is with the hospital system or her self-sovereign health record and having it be accessed by reference. (HIE of One demonstrates both storage that respects Alice's choice of authorization server and storage that is self-sovereign to Alice.) Our legal systems in EU and US still allow unlimited copies of data about us to be made and aggregated for all sorts of reasons. I'm hoping for technology that treats data more like private keys and avoids copies by design. ZKP and homomorphic encryption leave the data in place and just control access to the result.
Let's avoid making persistent copies of data. Showing your driver's license to the bouncer builds on signature tech, not on encryption of storage.
It's bad practice to store metadata with the same entity as the data itself. I want to be able to "rotate" the index and the store operators independently as easily as possible to avoid lock-in and keep them honest.

In summary, I'm suggesting that SDS needs to be built on state-of-the-art access controls, starting with TxAuth and Ocap, and needs to be compatible with content-addressable networks. As you all can tell by now, I'm not a cryptographer. To me, EDV is a hammer looking for nails. Encryption is essential for security. It's a blunt instrument for privacy in almost every way.

dlongley commented 4 years ago

@agropper,

> Optional encryption means we have to pay attention to the authorization designs. That results in a more resilient system overall.

I disagree -- we have to pay attention regardless. Encryption has a shelf life.

> Encryption is a poor substitute for access control. It forces Alice to keep writing keys to the vault as different users seek access and as access rights expire.

I agree that encryption is insufficient for proper access control (see above). However, Alice is not forced to do what you've said here. There are a number of ways to control access to cleartext (if given access to the ciphertext) without writing additional keys to the vault including, for example, using an external KMS that can invoke key agreement keys for authorized parties. The JS "edv-client" implementation supports this approach today.
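To make the key agreement idea concrete, here is a minimal sketch using only Node's built-in `crypto` module. It assumes X25519 key agreement, HKDF-SHA-256, and AES-256-GCM; it is not the scheme that edv-client actually uses, and every name in it (`deriveKey`, `encryptFor`, `decryptWith`, the `layer-a-example` info string) is hypothetical. The point it illustrates is that only a party able to use the key agreement private key can re-derive the content key, and nothing extra has to be written to the vault.

```javascript
const crypto = require('node:crypto');

// Key agreement key pair for the authorized party (assumed to live in a KMS).
const recipient = crypto.generateKeyPairSync('x25519');

// Derive a 32-byte AES key from the X25519 shared secret with HKDF-SHA-256.
function deriveKey(privateKey, publicKey) {
  const shared = crypto.diffieHellman({ privateKey, publicKey });
  const okm = crypto.hkdfSync('sha256', shared, Buffer.alloc(0), 'layer-a-example', 32);
  return Buffer.from(okm);
}

// Sender side: an ephemeral key pair plus the recipient's public key.
function encryptFor(recipientPublicKey, plaintext) {
  const ephemeral = crypto.generateKeyPairSync('x25519');
  const key = deriveKey(ephemeral.privateKey, recipientPublicKey);
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { epk: ephemeral.publicKey, iv, ciphertext, tag: cipher.getAuthTag() };
}

// Recipient side: re-derive the same key from the ephemeral public key.
function decryptWith(recipientPrivateKey, { epk, iv, ciphertext, tag }) {
  const key = deriveKey(recipientPrivateKey, epk);
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

const sealed = encryptFor(recipient.publicKey, Buffer.from('prescription'));
const opened = decryptWith(recipient.privateKey, sealed);
console.log(opened.toString()); // prints "prescription"
```

Whoever stores `sealed` sees only the ephemeral public key, the IV, the ciphertext, and the authentication tag. Granting or revoking the ability to decrypt then becomes a question of who may invoke the private key, which is an authorization decision made outside the vault.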

OR13 commented 4 years ago

There appears to be a fundamental misunderstanding with respect to encryption, user control over data, and authorization here.

Encryption happens before authorization, because if you do it the other way around, your solution is fundamentally incapable of protecting data from unauthorized disclosure.

A user leverages keys they control to encrypt messages for one or more recipients. This is how GPG works, it's how DIDComm works, it's how TLS works, and it's how EDVs work today.

Transports and permissions models built around them (such as TxAuth over HTTP, or HTTP Signatures, or OAuth) happen after encryption.

If you don't start with encryption where you control the keys.... you will never have control.

agropper commented 4 years ago

@dlongley,

> There are a number of ways to control access to cleartext (if given access to the ciphertext) without writing additional keys to the vault including, for example, using an external KMS that can invoke key agreement keys for authorized parties. The JS "edv-client" implementation supports this approach today.

Let's say Alice's prescription is one of a thousand leaves of a hundred branches (prescriptions, labs, demographics, encounters, ...) in her health record and that's the scope of access being granted to a pharmacy. How would key agreement keys represent that scope?

dlongley commented 4 years ago

@agropper,

> Let's say Alice's prescription is one of a thousand leaves of a hundred branches (prescriptions, labs, demographics, encounters, ...) in her health record and that's the scope of access being granted to a pharmacy. How would key agreement keys represent that scope?

They don't. The authority to use a key agreement key enables you to decrypt data that was encrypted with a key derived from it. That's it.

What you're talking about is granting access to a particular piece of cleartext information. But this is skipping the layers involving encryption. No system should even be able to see any of that cleartext information without first being given explicit authority to access the ciphertext. Once the ciphertext is obtained, it must be decrypted, which can only be done if one has the authority to invoke a key agreement key. Then, and only then, can a system access the cleartext -- at which point additional access controls can be put in place by that system to grant access to some portion of that data to another system.

So, in terms of systems and authority:

System A has the authority to see the ciphertext.

System B has the authority to decrypt the ciphertext. This system can see all of the cleartext if given the ciphertext. Note that system A and B need not be the same and they can be separated based on a particular risk/trust profile.

System C only has the authority to see a portion of the plaintext. This system cannot be the same system as B. It may or may not be the same system as A.
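The three systems above can be sketched in a few lines of Node.js. This is an illustration under stated assumptions rather than anything from the spec: it uses AES-256-GCM from the built-in `crypto` module, the record contents are made-up placeholders, and `systemA`, `systemBDecrypt`, and `releaseToSystemC` are hypothetical names.

```javascript
const crypto = require('node:crypto');

// Held by System B only; a real deployment would keep this behind a KMS.
const contentKey = crypto.randomBytes(32);

// Alice's record (placeholder values), encrypted before it reaches storage.
const record = { prescriptions: ['drug X, 10 mg'], labs: ['lab result Y'] };
const iv = crypto.randomBytes(12);
const cipher = crypto.createCipheriv('aes-256-gcm', contentKey, iv);
const ciphertext = Buffer.concat([
  cipher.update(JSON.stringify(record), 'utf8'),
  cipher.final(),
]);
const tag = cipher.getAuthTag();

// System A: stores ciphertext only. It has no key and cannot read the record.
const systemA = { stored: { iv, ciphertext, tag } };

// System B: has authority to decrypt, so it sees all of the cleartext.
function systemBDecrypt({ iv, ciphertext, tag }) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', contentKey, iv);
  decipher.setAuthTag(tag);
  const plain = Buffer.concat([decipher.update(ciphertext), decipher.final()]);
  return JSON.parse(plain.toString('utf8'));
}

// System C: receives only the branch that System B chooses to release.
function releaseToSystemC(cleartext, scope) {
  return { [scope]: cleartext[scope] };
}

const pharmacyView = releaseToSystemC(systemBDecrypt(systemA.stored), 'prescriptions');
console.log(pharmacyView); // contains the prescriptions branch, not the labs
```

Scoping happens after decryption here: the key decides who can read the whole record, and System B's own access control decides which portion System C receives.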

agropper commented 4 years ago

It may help this discussion to point to some real-world privacy use cases that try to address the number of entities that store personal data. I've written about proposed legislation designed to address issues with current privacy laws like GDPR, CCPA, and HIPAA.

Although creating a few regulated honeypots for dossiers about each of us is hardly decentralization, here's a proposal to control surveillance capitalism by limiting the number of entities that store personal data.

Add to this list of real-world use cases calls for Data Trusts, MyData Data Commons.

It's also worth considering the assumptions behind Trust Over IP. Note that the diagram does not address the number of aggregator services. Does this represent a real-world solution to surveillance capitalism or is it an example of trying to shoehorn our precious SSI work into broad storage concepts by inventing layers of governance that don't exist today?

bumblefudge commented 4 years ago

hey guys, I think we're getting a little far from the specifics here and I get nervous when people start talking about definitions of privacy or axiomatic pronouncements about always-wrong ways of doing things!

@agropper I like the link you provided, although I'm not sure it's the best place to start-- maybe we can circle back to it after we've found a little more common ground in terms of authorization and layers? I'm still getting a distinct feeling like "authorization" might happen in two different places, depending on how you define it. It feels to me like there's a misfit between the scope on which ACCESS defines portability and competitive interchangeability and the scope on which the A/B/C/D/E model is coming into focus.

@dlongley your breakdown of systems A, B, and C really helped me. Could I maybe ask you to rename them 1, 2, and 3 and then explain where in Orie's A/B/C/D/E each role lives?

My perhaps naïve hope is that if everyone gets pedantically explicit about tagging every single sentence of every github comment with an (A) or a (B or C) before the final period, we might realize we're actually talking about different things some of the time, and make it out of the layering-discussion purgatory a week or two earlier, with less conflict along the way!

agropper commented 4 years ago

@bumblefudge There are many ways to start our adventure:

@jmandel seems to argue for a separation between PDP and PEP layers
the ACCESS act is just a real-world use case that might help explain the risk of some layering choices
or we can apply the DID core spec perspective to the layers

Also, we could decide that storage of (private) keys to serve various "wallet" use-cases is really a separate conversation from storage of playlists to serve "general app" developers. In that case, we would fork this SDS discussion and probably reach consensus faster by working the two in parallel.

jmandel commented 4 years ago

I'm not sure that separation is what I'm arguing for. But I'll try to do my homework and figure out whether I am. There's a lot of ways to map out this conceptual terrain!

dmitrizagidulin commented 4 years ago

@agropper Layer A is below (is not concerned with) the elements that you mention (Policy Decision Points and Policy Enforcement Points). Think of Layer A as specifying a hard drive's blocks for an encrypted file system.

dlongley commented 4 years ago

@bumblefudge,

System A (can only see the ciphertext), now "Role 1", lives at layers A, B, and/or C -- perhaps conceivably D depending on the definition of "replication". System B (can decrypt the ciphertext), now "Role 2", lives at layers C, D, and/or E. System C (can only see some portion of the cleartext), now "Role 3", generally lives at layer E.

dmitrizagidulin commented 4 years ago

Proposal: we name Layer A "Encrypted Chunk Storage". This highlights several things: it's not just byte storage (which is really the layer below this one) but encrypted byte storage. It also highlights the fact that, over a certain threshold, we MUST store files in chunks.
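As a rough sketch of what chunking could look like at this layer, the snippet below splits already-encrypted bytes into fixed-size pieces and reassembles them. The chunk size, the `<id>/<index>` naming scheme, and the helper names `toChunks` and `fromChunks` are assumptions made for illustration; nothing in this thread fixes them.

```javascript
const CHUNK_SIZE = 4; // bytes; deliberately tiny so the example is easy to follow

// Split encrypted bytes into chunks, each addressed as "<id>/<index>".
function toChunks(id, encrypted, chunkSize = CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0, index = 0; offset < encrypted.length; offset += chunkSize, index++) {
    chunks.push({
      id: `${id}/${index}`,
      block: encrypted.subarray(offset, offset + chunkSize),
    });
  }
  return chunks;
}

// Reassemble the original encrypted bytes from chunks that are already in order.
function fromChunks(chunks) {
  return Buffer.concat(chunks.map((chunk) => chunk.block));
}

// Stand-in for ciphertext produced at a higher layer.
const encrypted = Buffer.from('0123456789');
const chunks = toChunks('123', encrypted);
console.log(chunks.map((chunk) => chunk.id)); // ids: 123/0, 123/1, 123/2
```

Because the chunks are cut from ciphertext, the storage provider learns sizes and chunk counts but nothing about the content. A real design would also need per-chunk integrity protection, which this sketch leaves out.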

OR13 commented 4 years ago

Hypothetical Interface for Layer A

"Untrustworthy Byte Storage"

Data Model

{
  id: string,
  block: Buffer
}

Interface

Write

const id = '123';
// The block is opaque to Layer A: it was encrypted at a higher layer.
const block = Buffer.from('encrypted data');
layerA.set(id, block);

Read

const id = '123';
// Returns the stored (still encrypted) block for this id.
const block = layerA.get(id);

Notice that you could implement this interface with IPFS / DataShards / Amazon S3 / Azure Blob Storage / Mongo DB / etc....
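For experimenting with the interface above, a minimal in-memory stand-in might look like the following. It is a sketch, not a proposed implementation: the class name `InMemoryLayerA` and the choice to return `undefined` for unknown ids are assumptions, since the hypothetical interface does not say what a missing block should yield.

```javascript
// Hypothetical in-memory stand-in for Layer A; it never inspects block contents.
class InMemoryLayerA {
  constructor() {
    this.blocks = new Map(); // id -> Buffer of opaque, already encrypted bytes
  }

  // Store a copy so later mutation of the caller's buffer has no effect.
  set(id, block) {
    if (typeof id !== 'string' || !Buffer.isBuffer(block)) {
      throw new TypeError('expected (string id, Buffer block)');
    }
    this.blocks.set(id, Buffer.from(block));
  }

  // Return a copy of the stored block, or undefined if the id is unknown.
  get(id) {
    const block = this.blocks.get(id);
    return block === undefined ? undefined : Buffer.from(block);
  }
}

const layerA = new InMemoryLayerA();
layerA.set('123', Buffer.from('encrypted data'));
const roundTripped = layerA.get('123');
console.log(roundTripped.toString()); // prints "encrypted data"
```

Swapping the `Map` for calls to an object store or a content-addressed network would keep the same two-method surface, which is the substitutability the interface is aiming for.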

bumblefudge commented 4 years ago

> System A (can only see the ciphertext), now "Role 1", lives at layers A, B, and/or C -- perhaps conceivably D depending on the definition of "replication". System B (can decrypt the ciphertext), now "Role 2", lives at layers C, D, and/or E. System C (can only see some portion of the cleartext), now "Role 3", generally lives at layer E.

@dlongley This is very helpful to me, thanks! If you'll allow me to continue my pedantry, does this mean that:

the Role1/Role2 interface could be A/B OR B/C,
the 2/3 interface could be B/C OR C/D, and
the 3/4 interface could be D/E?

I guess what I'm really asking is this: are we imagining layers so distinct, and the interactions between layers so defined by protocols like Orie's hypothetical one above, that no two of the five have to be operated by the same party? Or, setting the bar a little lower, is it at least possible to have any two adjacent layers operated by different parties/systems? If it's too early to answer this question, I'm happy to return to it when I understand all five layers a little more! It just sounds like you might be describing a much stronger commitment to modularity than is expressed in "1.5.1 Layered and modular architecture", which might be worth discussing at some point. I'm less concerned with whether all possible configurations map cleanly to the PDP/PEP division of labor if at least one possible, compliant configuration does.

agropper commented 4 years ago

I’m having a hard time understanding this discussion of storage features in terms of SSI. Replication, integrity, search, and sharing are general storage features mostly unrelated to SSI.

Put another way, a DID could interact with storage features either as a type of authentication (DID Auth) or as a service endpoint in a public DID Document ( https://github.com/w3c/did-core/issues/324 ).

Are we talking about layering of storage features in terms of DID auth or service endpoints? If it’s just auth, then I assume our storage layering discussion applies just as well to user name and password and we’re just having a discussion that’s unrelated to SSI.

On the other hand, if we’re focused on service endpoint types in a public DID document, then auth happens elsewhere and the layers are just related to service endpoint type.

OR13 commented 4 years ago

@OR13 to open a PR that addresses this issue with spec text.

decentralized-identity / confidential-storage

Alphabet Proposal Layer A #80
