decentralized-identity / confidential-storage

Confidential Storage Specification and Implementation
https://identity.foundation/confidential-storage/
Apache License 2.0

Proposal for minimal changes to EDV Data Model to support Hubs #97

Open OR13 opened 4 years ago

OR13 commented 4 years ago
  1. JWE headers may contain synchronization primitives (vector clocks / hash links / counters, CRDT_TYPE)
  2. A new HTTP endpoint /sync should be exposed, and it should leverage the JWE headers to return a result set of EDV Documents.
  3. EDV Documents with JWE Headers including CRDT_TYPE: AutoMerge contain AutoMerge deltas.
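
A minimal sketch of how a /sync handler might use only the JWE protected header to pick a result set. The header names `crdt_type`, `doc_id` and `counter`, the helper names, and the in-memory `vault` list are all assumptions for illustration; the proposal itself names only CRDT_TYPE and does not define a wire format.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """Base64url without padding, as used for JWE fields."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring the padding that JWE omits."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_document(doc_id: str, counter: int, ciphertext: str) -> dict:
    # Hypothetical header fields; only "enc" appears in the thread's JWE.
    header = {"enc": "XC20P", "crdt_type": "AutoMerge",
              "doc_id": doc_id, "counter": counter}
    return {"protected": b64url_encode(json.dumps(header).encode()),
            "ciphertext": ciphertext}


def sync(vault: list, doc_id: str, after_counter: int) -> list:
    """Return documents for doc_id that the caller has not seen yet."""
    selected = []
    for document in vault:
        header = json.loads(b64url_decode(document["protected"]))
        if (header.get("crdt_type") == "AutoMerge"
                and header.get("doc_id") == doc_id
                and header.get("counter", 0) > after_counter):
            selected.append((header["counter"], document))
    # Order by the counter from the header; the ciphertext is never read.
    return [document for _, document in sorted(selected, key=lambda p: p[0])]


vault = [make_document("123", 1, "delta-a"),
         make_document("123", 2, "delta-b"),
         make_document("456", 1, "other")]
pending = sync(vault, "123", after_counter=1)
print([d["ciphertext"] for d in pending])
```

The protected header is base64url-encoded rather than encrypted, so anything placed there is readable by the storage provider.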

Developer user story

  1. Developer creates a new Vault for Music
  2. Developer creates a new AutoMerge Document Music Playlist with ID 123
  3. Developer shares access to Vault with 3 Other Applications (Tablet, Phone, Home Server)
  4. Applications construct and submit AutoMerge Deltas as Documents.
  5. Developer calls /sync on tablet....
  6. Hub Client on Tablet uses EDV client to pull down all deltas for ID 123, applies auto merge, and returns a JSON object containing changes from 3 devices with no conflicts.
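
The last step can be sketched with a stand-in CRDT. This is not the AutoMerge API: it uses a grow-only set, where merging is a plain set union, only to show why deltas from three devices can be applied in any order with the same result. The device variables and track names are hypothetical.

```python
def merge(state: frozenset, delta: frozenset) -> frozenset:
    """G-Set merge: set union is commutative, associative and idempotent."""
    return state | delta


def apply_deltas(deltas: list) -> list:
    """Fold already-decrypted deltas into one playlist state."""
    state = frozenset()
    for delta in deltas:
        state = merge(state, delta)
    return sorted(state)


# Hypothetical decrypted deltas for playlist ID 123, one per device.
tablet = frozenset({"track-a"})
phone = frozenset({"track-b", "track-a"})
home_server = frozenset({"track-c"})

playlist = apply_deltas([tablet, phone, home_server])
# A different arrival order, with one delta delivered twice.
reordered = apply_deltas([home_server, tablet, phone, tablet])
print(playlist)
```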

OR13 commented 4 years ago

@csuwildcat @dmitrizagidulin @msporny @dlongley @tplooker

I'm not sure how familiar you all are with automerge, but the above structure works. I have tested it using JWE/JWS and in a few CWS/CWE experiments. With these proposed changes accepted, we could construct REST interfaces for Hubs, and a Hub Client, on top of the existing EDV infrastructure.

csuwildcat commented 4 years ago

What if the user in this scenario wants to grant multi-recipient encrypted access to CRUD some subset within that Music vault to External Party A and another subset to External Party B, and those subsets of objects in that Music vault overlap? I am trying to understand how this structure deals with a sea of Venn unions of data encryption and access within a given target set of Music. Hopefully encryption and permissions are not set vault-wide? I would love to avoid data duplication and other ugly deoptimizations, if possible. (Not saying those exist in this scheme, just trying to understand how you are thinking of handling this stuff.)

OR13 commented 4 years ago

@csuwildcat Here is a JWE:

{
    "protected": "eyJlbmMiOiJYQzIwUCJ9",
    "recipients": [
      {
        "header": {
          "kid": "did:key:z6Mkf8unjmyqsnDtZAjZkdNhw3LZWm5x9u3bbHCEdenD1Agq#z6LShX3PmBwYHGh8JL82zm3x8uT3bWEbLmfos66McREoEfvo",
          "alg": "ECDH-ES+A256KW",
          "epk": {
            "kty": "OKP",
            "crv": "X25519",
            "x": "pK5QE4-dwpPdjejlB3VERU9XCy1t4xfa-JNUDVa9iVs"
          },
          "apu": "pK5QE4-dwpPdjejlB3VERU9XCy1t4xfa-JNUDVa9iVs",
          "apv": "ZGlkOmtleTp6Nk1rZjh1bmpteXFzbkR0WkFqWmtkTmh3M0xaV201eDl1M2JiSENFZGVuRDFBZ3EjejZMU2hYM1BtQndZSEdoOEpMODJ6bTN4OHVUM2JXRWJMbWZvczY2TWNSRW9FZnZv"
        },
        "encrypted_key": "DwOEbW0OvtnQaqL4gc6_9Za1vzHrrLptI_UsPsGWFoBlUcASWP5qWQ"
      }
    ],
    "iv": "Et_yCe5BAWtSiAm2H3GEh192zNQiNA4d",
    "ciphertext": "WC9zeH_Q90Z34VvX7Vsb2nK42qjZch2n-x2RweSjmVyVOxu__yAY870u5sRtaOjTPSNtxKoxHFNTbsVW2M5vlXTPStNtxdcGK8s2qPI_diR8E3E3pzqKr8iShZ2c3wuywILcgZWrdYlmzW9tcdBjLAnBdxbWdhqxwZNKLIu-11edpXA0KOra8qhK55mI8k_WUDTudV1w7aYVPFtngCwNy1hN4JsAGm1_NtB_WpXtua10oQ-PpP6d18i7c3jYCMZ56oaGCn5I1hf3yCO2OKgVJhxCsA2LzAu9gKxSm9ZPjhqrK5iRXUaE4lLWZNahgf_MRiNn5MDp7sN0GJ4IJFTs2On0_W6llwWgttkNiqtcsx48PiwlKgO2oimB0L7Y-bVpcinCpfDCK-UG6FGKaw7f1HsjWo4uthHdnCOm_Hw8dsSc7IPh0cORg4qbtAS4l_HDbPQroMlJIuLeOZqwMT55Ux32f3IfeVP5_1qitnigamOgHsfjuAV6ttKEgsEiDoAqa7kOQy_pB5jXkkJ57FURfKSG__hkbzm2L88djfaDAFFAz-7W0LvaEM4Dwew_-kAnoDJCBPa5MPG4W7MpXZhiafIuZsaD_Xk9OprHxFV_nXU8ztl0NKoc_H3Qg3l1D00wJQI_TWPqRfSqc5qHyRrh_TLRuTXpK2I2Hh3v-N_HrNotWN8p-McFnaV3cRtOMvLq44kF_X4_NPH8s7wYQ4yFkd2ffiFD",
    "tag": "TSiAPqHT6t0wT1rWJppicQ"
}

If we are smart about JWE encoding, we can convert this to a content addressed URI, like so:

https://example.com/content/(CID of ciphertext)?jwe_meta=<encoded meta data>

Where encoded meta data... contains recipients:

 [
      {
        "header": {
          "kid": "did:key:z6Mkf8unjmyqsnDtZAjZkdNhw3LZWm5x9u3bbHCEdenD1Agq#z6LShX3PmBwYHGh8JL82zm3x8uT3bWEbLmfos66McREoEfvo",
          "alg": "ECDH-ES+A256KW",
          "epk": {
            "kty": "OKP",
            "crv": "X25519",
            "x": "pK5QE4-dwpPdjejlB3VERU9XCy1t4xfa-JNUDVa9iVs"
          },
          "apu": "pK5QE4-dwpPdjejlB3VERU9XCy1t4xfa-JNUDVa9iVs",
          "apv": "ZGlkOmtleTp6Nk1rZjh1bmpteXFzbkR0WkFqWmtkTmh3M0xaV201eDl1M2JiSENFZGVuRDFBZ3EjejZMU2hYM1BtQndZSEdoOEpMODJ6bTN4OHVUM2JXRWJMbWZvczY2TWNSRW9FZnZv"
        },
        "encrypted_key": "DwOEbW0OvtnQaqL4gc6_9Za1vzHrrLptI_UsPsGWFoBlUcASWP5qWQ"
      }
    ]

Recipients can then be added or removed without the content ID of the ciphertext changing.
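
A minimal sketch of that idea, under stated assumptions: a hex SHA-256 digest of the ciphertext stands in for a real CID, the recipients list is base64url-encoded JSON in the `jwe_meta` query parameter, and the `content_uri` helper, the placeholder ciphertext and the `did:example` identifiers are hypothetical.

```python
import base64
import hashlib
import json


def b64url_encode(data: bytes) -> str:
    """Base64url without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def content_uri(jwe: dict, base: str = "https://example.com/content") -> str:
    # Stand-in content ID: SHA-256 over the ciphertext field only.
    content_id = hashlib.sha256(jwe["ciphertext"].encode("ascii")).hexdigest()
    # Recipients travel in the query string, outside the hashed content.
    meta = b64url_encode(json.dumps(jwe["recipients"], sort_keys=True).encode())
    return f"{base}/{content_id}?jwe_meta={meta}"


jwe = {
    "ciphertext": "example-ciphertext-placeholder",
    "recipients": [{"header": {"kid": "did:example:alice#key-1"}}],
}
before = content_uri(jwe)

# Adding a recipient changes only the query string, not the content ID.
jwe["recipients"].append({"header": {"kid": "did:example:bob#key-1"}})
after = content_uri(jwe)

print(before.split("?")[0] == after.split("?")[0])
```

This only holds if the new recipient receives a wrapped copy of the same content encryption key; re-encrypting with a fresh key would change the ciphertext and therefore the content ID.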

Now to the question of "who gets to access the ciphertext": if your CID system is IPFS and you are on the public internet, everyone!

If you are on private IPFS and you modulate your peer set to logically correspond to the JWE recipients, that's how EDVs work today.

So if the Documents are JWEs with special headers, and they contain AutoMerge deltas, and the peer set / authorization set is controlled by the storage provider, and the storage provider is honest (modulates peers according to the preferences described in the JWE headers), then I believe that's everything you are asking for.

Can you refine your question further now?

dlongley commented 4 years ago

Important side note on vector clocks: http://pl.atyp.us/wordpress/index.php/2010/03/conflict-resolution/ ... and note that automated conflict resolution in open systems is an even more challenging problem than closed ones. Our work here, of course, adds the additional complexity that we want to minimize the information the server knows about the data.

We need to analyze the privacy difference between exposing a simple sequence number to address inconsistencies that can arise just due to the partition between the client and the server vs. exposing "automerge deltas" in some way that is intended to address more complex synchronization concerns across servers.
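
To make the comparison concrete, here is a sketch of the two kinds of metadata a server could be shown. The function names and the replica IDs are assumptions for illustration, not part of the EDV data model or of AutoMerge.

```python
def accept_update(stored_sequence: int, incoming_sequence: int) -> bool:
    """Sequence check: accept only the direct successor of the stored version."""
    return incoming_sequence == stored_sequence + 1


def compare_clocks(a: dict, b: dict) -> str:
    """Compare two vector clocks keyed by replica ID."""
    replicas = set(a) | set(b)
    a_ahead = any(a.get(r, 0) > b.get(r, 0) for r in replicas)
    b_ahead = any(b.get(r, 0) > a.get(r, 0) for r in replicas)
    if a_ahead and b_ahead:
        return "concurrent"  # neither write saw the other: a real conflict
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


# A stale writer is rejected by the sequence check alone.
print(accept_update(stored_sequence=4, incoming_sequence=5))
print(accept_update(stored_sequence=4, incoming_sequence=4))

# Vector clocks can also tell concurrent writes apart from ordered ones.
print(compare_clocks({"tablet": 2, "phone": 1}, {"tablet": 1, "phone": 2}))
```

The sequence number shows the server only how many versions a document has had. A vector clock additionally shows how many replicas are writing and how often each one writes, which is the kind of extra exposure that would need to be weighed.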

OR13 commented 4 years ago

Agree, we should discuss how sequence numbers are used and their relationship to indexes. My assumption right now is that Hub resources are built on top of EDV documents, and that the default strategy of "no additional data is needed" is accurate. I am waiting for a counterproposal from @csuwildcat.

tahpot commented 3 years ago

> What if the user in this scenario wants to grant multi-recipient encrypted access to CRUD some subset within that Music vault to External Party A and another subset to External Party B, and those subsets of objects in that Music vault overlap? I am trying to understand how this structure deals with a sea of Venn unions of data encryption and access within a given target set of Music. Hopefully encryption and permissions are not set vault-wide? I would love to avoid data duplication and other ugly deoptimizations, if possible. (Not saying those exist in this scheme, just trying to understand how you are thinking of handling this stuff.)

@csuwildcat This is a really important point and a question I have with the current architecture.

When designing the Verida Datastore, I redesigned the whole system after a few false starts to ensure subsets of encrypted data could be appropriately permissioned across multiple applications and then synchronized in both directions.

agropper commented 3 years ago

It might help for us to understand what we're hoping for as compared to what Dropbox does now. Here's a screenshot.

Note Connect apps... in the lower right where I would authorize access by other apps that I, as the owner, might or might not control.

There are other features illustrated.

csuwildcat commented 3 years ago

@agropper they're expressing the same concern that I have: I don't get a sense that the architecture of the current EDV stuff is really attuned to saying: "I have encrypted 1000 objects spanning hundreds of different types located on a remote instance, and I want to give 100 different entities access to different subsets of those objects, without duplicating the objects, creating folders, or any other form of segmentation. I want to encrypt the data such that I can create a 'sea of access Venn diagram overlaps' over the 1000 object set by simply issuing them a permission secret that contains the access capability + a decryption key that is only usable for decrypting the subset of the objects they are allowed to access" <-- this is what you need for a real-deal multiparty decentralized app datastore, and we need to make sure that's possible. If the foundations don't make that easy, we need to change the foundations, not change our requirements to fit whatever the foundations can't do as of today.
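
One way to picture the overlapping-subset requirement is per-object recipient lists, as in the JWE shown earlier in the thread. The sketch below models only the bookkeeping: `StoredObject`, `grant`, `readable_by`, the `did:example` identifiers and the placeholder wrapped-key strings are hypothetical, and no real encryption is performed.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    object_id: str
    ciphertext: str  # stored once, never duplicated per recipient
    recipients: dict = field(default_factory=dict)  # kid -> wrapped key


def grant(obj: StoredObject, kid: str) -> None:
    # Placeholder for wrapping this object's content key to the recipient.
    obj.recipients[kid] = f"wrapped-key-for-{kid}"


def readable_by(objects: list, kid: str) -> list:
    """List the IDs of objects that carry a wrapped key for this recipient."""
    return [o.object_id for o in objects if kid in o.recipients]


objects = [StoredObject(f"song-{i}", f"ciphertext-{i}") for i in range(1, 5)]
for obj in objects[:3]:
    grant(obj, "did:example:party-a")
for obj in objects[1:]:
    grant(obj, "did:example:party-b")

print(readable_by(objects, "did:example:party-a"))
print(readable_by(objects, "did:example:party-b"))
```

The two subsets overlap on song-2 and song-3 without any ciphertext being copied, but the cost is one wrapped key per object and recipient. This does not provide the single "permission secret" per entity described above; that would need a different key-derivation scheme.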

agropper commented 3 years ago

@csuwildcat What you want is very reasonable as a use-case, I just don't know how to think of it in technical terms.