decentralized-identity / confidential-storage

Confidential Storage Specification and Implementation
https://identity.foundation/confidential-storage/
Apache License 2.0

Debate: replication at Layer 1 is implicitly required #52

Closed csuwildcat closed 4 years ago

csuwildcat commented 4 years ago

My assertion: You must fundamentally bind things like vector clock values and other coordination bits into an object (or subsequent mutation thereof) to converge changes in a secure, masterless, eventually consistent way. If you do not, those critical convergence values are not securely and inextricably bound to the payload content, which means replication and convergence itself cannot be trusted.

dlongley commented 4 years ago

Well, I think it's one thing to ensure that L1 includes data model elements that enable replication (I agree we ought to do this) and another to say that replication must be done at L1.

OR13 commented 4 years ago

I think in order to really discuss this, we need to see what a layer 1 object looks like... This is one representation of an EDV vault (the bucket which contains documents) today:

https://example.com/api/v1/edvs/ee679cf0-31cf-41a8-9c39-5c5330cca022

{
  "config": {
    "controller": "did:elem:ropsten:EiD7eyIuOizDCcyoo0dsndfjz83N35-ooF5cxpcjBYAm_g",
    "hmac": {
      "id": "urn:mockhmac:1",
      "type": "Sha256HmacKey2019"
    },
    "id": "https://example.com/api/v1/edvs/ee679cf0-31cf-41a8-9c39-5c5330cca022",
    "keyAgreementKey": {
      "id": "did:elem:ropsten:EiD7eyIuOizDCcyoo0dsndfjz83N35-ooF5cxpcjBYAm_g#XIGsdfiFe971SVNaCLHevv5hjwfFAADQmmDDVN3lU_g#zC1Rnuvw9rVa6E5TKF4uQVRuQuaCpVgB81Um2u17Fu7UK4",
      "type": "X25519KeyAgreementKey2019"
    },
    "referenceId": "primary",
    "sequence": 0
  },
  "documents": {
    "z19jGn3p9Ag6ASp9fmaNZvmC5": {
      "id": "z19jGn3p9Ag6ASp9YmaNZvmC5",
      "indexed": [
        {
          "attributes": [
            {
              "name": "t2O3VgcSqWzpI6PDQybxiAmXUrepFB48PCoyyUafrZk",
              "value": "8_TLG_hSsWYN2dvTJVwtAwRvpf9z8Vo-zMrAPK7YAZo"
            }
          ],
          "hmac": {
            "id": "urn:mockhmac:1",
            "type": "Sha256HmacKey2019"
          },
          "sequence": 0
        }
      ],
      "jwe": {
        "ciphertext": "3SHQQJajNH6q0fyAHmwpn5bzetcC8tbwvTIWnQOUOFpHKupSsR3QxNfASahcpRrusAziHonHwMkOXUiFn3XiN3KZVMFOslRziJBTO42ip-2Ozb3lHGkZvB-F-A08zRe9FY1zVkuPwAUay1u5-Gm_tpnYt1zwD7eYTyPX3qHFTm62_JWC9edAfDw7zQx6U9-yoxRHOxh5rrOLDMkaOUI-5-icidE15ls1W-hCV4KCFxdF3p1EZ8nDsGoPZ-POfm8FiwPzZaGMmCV8HlvqwPLlpxOw2snXcfYLe6HJejgelHL_JnMeoXltl7xTiOJ6_hXss_8GZ7rKI1FRymzSqlYNiF5QsYCe4fNCANGVlePQlvWCqa1Fdgw4UkfpUO-sHfOtf8alVhtuBrUzvjKnrgJLhl2BXwA8i2Kh-nOxJ9w6iCOr6N5QC2Oet4fWCIqSpNhOl4-2Rb-Kc8z0tKqKfapmee6nIiAxFUiKy8-cr1EbXksFgymUf7knlqg0j_TNv_uIwSegL0hmRMfsVNq1OwKofCgKFpyvf-t3qSRTmHnisXkZCpOy7t6gNs9pul0fp9JojM--pVxUUouDd1ubgZvdHbLUBQsdEjnQznR9BgosiJycRHy1xJGVdq1lYp_k2aV7SsThXwvAs6-jVkIad5F3KwuAjTD4Yz4HzDWZqIRWbK4ctUJp_OORDVlsdrZPJ17Pt_I1Pg36aG2X4mHMn-ZLu6_vzlt62lzftTZ7eTDNdsPUmq19s3bncPqYaC6_E6Fq8ZIryN-tOBbxqj6K29KsTcWr4Mfgdqu_u3akBJedEFslMaBPn2yTxS7sbAalIODnn4VnhzGqxaB9ksu6ZQpt4_Z0Qgj1FgoR0I767ZINHC3D6X0V6PfZa6-qTcczKH7f_bGMh8ZGb6c650oYar2XaFDl5aAJT5_tHlGozmhzusD2hvSJkP5mllkeaz_Y_YJV0w_VkjaHRFl-HT3S5pLY3YQr93hUTfmpIWRoTnE3D_cebFYCMrsslJPfDAKG-BGGGTlQffa41uqYCfS2OyFMwxwcSaIblak7FyKvaebR6vi38nw-wMi8O8OjDYv4iwrcIflRFbJhoZ3J48e91pKPiGMHSRNNrhea5WNFGzsdYQQkqFAI7gwOlIYbH7hqgMZaNfHb87oKLeVokhMbZXZNXO5Fc0OnuinvDY0TZEz0Dk06AkLYYx1ncIWUxu6KiI4m5wIfbTl-m7zij3qIzCMqmEhGnE-hsv0o5VfUTJ7_-WpnmMnO5YkzyB1InbYgoP7vkqcy84qf-mvc1bKTg1tod4HFPEc9eDlCMnjS8y1pWfG_dcwDEB9-9f8MGyR4jYrWPoIFPV1u3R2yHCWmd2YZMCN5cQWFfgFkASX4Ag2F9Myk_BJobQf0nXgDhgaInE7Kw_KISmeQYxLfSUaKblOgKGZngzvHJRAIpFEv7BG1_0xc50d_M8XR7F7RD814AL1HDCKIqzKHZhx3lrzO_CxQaxD87iXHfy88ne1tf0cTggttSvM7tP_fWKJhoeoilvGNpQ53Sfq5-GYGhv2VvI90ixsmGjbS56UCjL2QG6VHV0eYN603m2t5WbwhTtSkAft3jaq2a6zqOF9_g77cqv4OySQFHlek-R1cy1Uf85DlwauWvL3DJ4tIrBI8BdWxZxZaHY3-2cWaZDIMUrzaJQKuA_ckus8Qt711F1lzzyes33KIcCY9hmnWea7yUFmrKeGOazjR7jWqinB0wRdSmjMGBEC-FBmtZMhxqRyL37iJyWWtONXexeHt2r89rCDZkZ1yWZlbTo1Ij-FDGIV8uXpLg3j8LHeEmabKMYnN83uWP1CHWQ5V7-0axcGbGfadVovQBXfQZaFYlensTKIqjo1TN0s9ABHZxlAMIZVEoRvsjMPl4guF1W1nxXWr3ehZNnIugGYiDqNa0FBSEOTg6YW3TPOHxUtf1lva_2O0I1Gfp1chtLRpPHFWAIBeW
MDEt14_qV9Qonv-QlS6ynFBl2x83jiZLN4TbJUpU9KJMP8zC0vBtnhkH9fVyUU5qgrlD9hiuFN2Yjldw5wO33FG1re2QFfwV2ifE2U-0Cy1M6mU7MfwFmSYmCuccPk3KnLy-5OpZ8Vfg89BM7h7RUYgR3Cb1swKZmX7XuVy1RKSgFACo1JpGf59X8in8I3HunAQAl9wfuNKyQXe1NsoJoBGLsG4dfwHg-R4mZwmG_YYfwobVc0N3Ow9IEEoULFDeEVd62yI7XxC2kyXXg-DUX2DtQryiPJFz5lRnW13XpKOcHqJnrpscAj6RGgRq1u4vjr5Rzeg78Y9vQ1-DP1GatjJGQE_xyHH_WZSaVmbcOMxnoEnnzDQ9XpYf_AUZOuLLMBDiiC5GJOppG3qQMWDVuJ4uf0jpvyG3hvGpgJMWp_zDTFjJqr1drkzwvHK2J70t5JSle7lPsGmVpbaohzyiUr4SK4zSsjlvkdLmtHDZzgQuQsR287KRAWkI-Mkuor-RCke3jjHG5iYFL4S_LJ6hlrkgrGYMRMkYRXbxk0ry0m2Al6WBIilpJMeSBXsbtDrQ0QDqKHSA7RwmMXGKZ7W-UqN4Q7YTUiVuagQlH6PKqLhYZ7J4EKRXPbTj9tH_eqUePLBhi7NPrhTuYMCfujzCqgtySBuUJT3FlR-KDxSZ140bf7gCwpfQRvE_7c8zN7CS7DYOyqxRm1QI1WasHVwaIaT6dNajjxqdJuzsvBSn8PrImbT8Kan8f-l3F4fv95lopRcpk-8ediYjvyXUQ6GGlDExuNU1Odh92S1TwjkMCdoOveUv8oCDLTdEs2Kli1ESHmHQ0mNOrGtCweuw2G7OML7KBdumvLssTBYPwmx2MJgYMfBa5qBORzL2ZMqIFq6IPUGXpfJZgImPbVDHvc0x3_A55_bF-23MwCM0SQ_7MktIhEt6P88bprUewqrESMTyZHR0rMLecvpKT80yfN0qioESxDnwQPMKWebF7eCKt-Ey1oCso1s5raqX1wrFTL3TlxeFHJ-c2UesET2_Z89KMOK9bwzgde7Nr_J-hv1sYwEhGchzLMIrrocfCAZljW-V8aJERtmMaOeANVjkBLSh-uDIlEaKOrGZGPKpZviiT4Rlunl41Ecc731kVLvTsZW1ehzApD3DppM7kNVpqERdtNyPf2LYZWKJB29O-pawlE9I1ZXrWt00vgqDLqDGm6AW9-0WWT8JvFwTTB57nlTUVbSFb_ht1AjfHwXXe32Jj_Ilit94YU9tmMf8vFHHqXRyFzl33F1bKoOGIQNynzWqt4Lmh8bIMeArCnpf665gDTR1FL38TomzBEo_JZjtet9j4Jz88yDAb0KnYNhlzDvKNbtZMnxo0JjwniiBa8HT8_PjOpNTlQ8h2X199wSOeX4gBaY2Fv5-ta2kllYkxjm0H87Y8DNo21tgmtj0sBXtyC9YNYb6cQZXqQXroUUm3yQ5JyKX7oK9T5vb9KV8QGyxeaqp-c8fR47bknZ377Lz5dzbqvk6sMAxqSl41StaRtP5CUcdR2kP_a1xDp57bCNL4Xy0tVGwf0Xt35v4mLvNgdjfdTjhhWgjQABr3PyxONBpcCE7PuyIjiBk_MIzg2aXmFPQ3dwMKcjFafzal89Z4UbYKRc5NRbUTiENbBlvAfvaETmycpqdYza4ibRd2Zsn7ab9VrFI8rgHNt5JsEY0I4IoREaG_rLntZACB5I2gHg4ONvn3Lso6cHaVGFwGk0idryFvTNHBnrHp0w5adEuRyW1JoODvDf9q7oJvAx0NmNGQksrYd1vs2XbTatxLluFc0G1Zou3HZkcRhz0FkaSs0O-LFHbuC6Nhm6mQ0Hcevc1iviQIoFAC91mXg4l_CfHKV_3nE8cTdPrXyAaKf1InqUMIW4YMQ_WUG6B_OMRbv08fBwjQ4AYzKNmYz9IwNKblkkROnj_T-qUmuNuRPKWahzAwA13A3nKTDEHvHUP0icc8IVajKBsu33UUYGF
LbOiGBFPic-xNEjCZBtEIqqKUR29cwBlSDLGBBMtjee-AA3E18ZJ3UWig4srf4wnYboabKyPNjczhuTKFtxjR3MHoS5BRYzNIlee_N7izi24-mlo5DewqhXK4D46LppZFuNZPWSTJkAz25jltIoHnsynv8JRdbs2URWfOddlh8q_NBMoFfXLWg-U2dncWQM7GXmooIyzHrVlY_I7E2UOSBGTGVjTIR9kaOKKM3Isk6fT7BVWfqSbtq981FXDnxfZAoJPikEg9e1tLq20eHpOYntDKDQAitDVhzWdjzCEDxhY7PXv960QSMlFp5p-4C-l5aF1nUr2bAkgJWA3t7Qp8vIBplbl6Di2gZBmczzFdVOmxMcU_E5OVREwebFJ6m9OxL1ZFelz_84-mCmObcF2IY7jx1xWMzIzhLotcu45qLS_R0i8hxa45olJTmFP41wWz5wzv8_Xe3eWygd6tAkS_Alg317-1z5O8AyCUu2ZlfvkW9JrV6TlZtENImA8eBd03jMn_vxeMgAcp50IJrOGo09iQiVqvNhbGXMD49JhsgO0KzP0RIGWI5BugDhLnhViPz2RkpJkPCNw_XpNPc28EeSE_msTYAelhgKn7sw302f5hw1Y-zUKYlcqyHzvmBOmgWj13mbAYjMrWf6aKW-yX_X8XozAZbNsVcHG1QprJzVyAZouwOHYxK3ri6BuKcKC0AVhPtzwhL3pOZEl2Hq-46psus3H5wFWCCMFmHIaTHvSHuWFzTYQjki1LKk_6Bg-JMSI_vbWXKFftdULmYTdGMJZBG7lQ8PJFFEQnUlg3Km_JmSSeQJ4b8yWYHyu5U-LTHd5q9SBQ0sj7rSwlpooVt4IyXFz3eJ5XrgXrzAv3RlMz-I8ponwMVbmBebWybOwjtQVlKmm42mEv7H3nd8G-ugZVIMMHtD7Bz_Eu7ZNGrBoYpyLy5LnGKk6Nv5jwZ47pPEZl3OxlG3dW7kXn_Qgh1EgyoW7MGGx_ynbkk4Ok8W3mwUlUGM_fjEvLKwWSkNSo6YNJNX-uopNSonFLPkain1Ia-l3DxU-kOqT2oioxK9LJawX8ZZC8Gm4l4cSv-uNPtGiwtOcB6bXiqrcTMbPqYjyy9mXU_L6RBa-2bK61xWVa_Vn1lh7D4hjFAMoMisoXGoFbBE3Zqc0UlPxKjmWYrb-u-8ySaTW2z1DELVz8mQKVbmokxW9tXJ8LzPWnYjv8rVtYe4GiAb7xqBAp0TQeUjSZKkL8N9PtlTW0bKAfbYSeqXknZZUz5IuqoCjv6MS_RELVCL6JE8sKNWGGaHshikl-Pgv3SFrgJYHI9_bmdIF9n6VFmuj53ZWTWY6-z9qsEgBj9NDYLz6oFpq-bJXha7K8ZTg9xfW2V6boyoWrrUMoCaa71b8FojdY53L2DvLQglWCx2Mghac3mPBEUDuSR1eEATKvhPh8464Er7VlXq6BAujBqoT3DCvyX2UWeGkszGGUCh1i_DjSb8TQsBa5VNtTeEB_kO8uy9QGRHh_nrFg6ICW1rotvSWBsoAQxHOVHFJqEIGBHkoCZS1k1Ig3mY7V6uZViJ577wMbu5YXEqjcNnWdpbZFg7JYfTeerjImc2A4fUc4lOAU-b0QBaN3Rn5WJxitY55mEh4Gi-nB1yTWAGvz0g2Yqn1iHjxJdCsTouuQbektZTeLtgKyvK8zmWRjWc_itIarVDHgBVIfwGttSjNfQMXdvAOud1REbUSwTRdVLbSLpcoyAyg9znEDFodM2Y4239cHeeX_OSTQOYGU-bnF7jkdhW2GOqiJc9yFyQBjcRQ2mveBDfYRdn9kf7o2PCpYNPj_XxkzwGXmvjexsNMi1t7us0j_Hdoa5VbRj1u0ZlO1y-HgtADblaLTxrgx4u_3fVYXaU2DC0cPJXWGLNgR4JpdoGCJISVIg77WR7fXBT3uLmxA13kdTbSuBdoQ9407W8-Oo6Yj6OZV42UOEPGe544EVhNbgOvNIPpXTHZMMC55nMcSs0j
55k46e_SNewUmpHzBk5088t8-HIH1acWiyJybS2PlkLMBA3lGDDAlqC6hK1RNkkajwq1cjU6-bqBu0ZFzuSIcHQ_qF7yHzEknhr_N1gcnkvI9ac3VYoJqL9A6X69-cwilmj-00Q2ADJHw5Z6FcQ2t3V0HxUQoZXrYutqMEqXMEX_7831-YS4IWhFr6iVxQoKRsgwcQoJT-ww7nGk5IXtlrU_0MGLGNwDiRGFNI8h2ZSkkcdyh0ezKAqGGCdxzVwjmQw-KGtAR9MxQ6DgRO5JUC5FJc5kWGwRt0c9Se734v-OwdG0Ylpbt05r5R2Pbs7bf0swJ-PBrT5WEp9-OjS37lTcy0YgnUy1XMw2rMUM7WlWhA37GP08_M2ny5ryXdGE9VMk51ItBZNfn8LoDPEpj97VdFSG4wvX80IoB1csSHIPGJqmh2oNOytp2W-Z435pBn3Gjzjh-5KLA7n2AjoCEQq1GNIDdBrmetbzup9G90f1RikwWWBqmxiGwgH-7DbFbWVyNn5lZnVrGCEduneqL2h418ONz3USCeGhif18ihM5hl6SRPAa3EGwVeXr00bazKCjIPONgtIzvtPJMPSmNL8C4rv02YJmotZCm6RtBeLlBeeNSF0TPgdWUw5KmXUno_TXh_oTACWKC6D63ijjbsWFxOWh7HOBGh-BI2W4ONV8gV-uKeL8_d4V8Ztk_bTOSRsHr4uPabnuD2fWpJeXlBxIbVlwPRXPgMxEyNdIbWwutubX7EqjpuDCjdGDQ5BAbx_XwX1I5_nyFupWXEjWIgOV0ySbiQd7l0NMM5a8SD-DSRXqFAvx1uHM04ytCJ38qQKI7Jd2NEoRPgPhURiTALh-3SwJqi-FjM7g9K37j0l5PfpLdNstqaqENuV4aa_xik8A-fLvqbIkSLRlmtavJIPDT2YsYin9jS3wRu37O3Gjd_aNsUqxIrLg_pduTHQcZOqosyrNH1XhneXimfWutAtV3b2XX8lEj0k4XvfKxvG3RKYqsBCqn_M0Xh0qApnz9o-tCaFni_l6iXGKzuevck67VggfaGG6DAWya_IBMFObwrK9fmsCViGOw3GH--19oK1w61ExZ",
        "iv": "QldSPLVnFf2-VXcNLza6mbylYwphW57Q",
        "protected": "eyJlbmMiOiJYQzIwUCJ9",
        "recipients": [
          {
            "encrypted_key": "BMJ19zK12YHftJ4sr6Pz1rX1HtYni_L9DZvO1cEZfRWDN2vXeOYlwA",
            "header": {
              "alg": "ECDH-ES+A256KW",
              "apu": "Tx9qG69ZfodhRos-8qfhTPc6ZFnNUcgNDVdHqX1UR3s",
              "apv": "ZGlkOmVsZW06cm9wc3RlbjpFaUQ3ZXlJdU9pekRDY3lvbzBkc25kZmp6ODNOMzUtb29GNWN4cGNqQllBbV9nI1hJR3NkZmlGZTk3MVNWTmFDTEhldnY1aGp3ZkZBQURRbW1ERFZOM2xVX2cjekMxUm51dnc5clZhNkU1VEtGNHVRVlJ1UXVhQ3WZ0I4MVVtMnUxN0Z1N1VLNA",
              "epk": {
                "crv": "X25519",
                "kty": "OKP",
                "x": "Tx9qG69ZfodhRos-8qfhTPc6ZFnNUcgNDVdHqX1UR3s"
              },
              "kid": "did:elem:ropsten:EiD7eyIuOizDCcyoo0dsndfjz83N35-ooF5cxpcjBYAm_g#XIGsdfiFe971SVNaCLHevv5hjwfFAADQmmDDVN3lU_g#zC1Rnuvw9rVa6E5TKF4uQVRuQuaCpVgB81Um2u17Fu7UK"
            }
          }
        ],
        "tag": "xbfwwDkzOAJfSVem0jr1bA"
      },
      "sequence": 0
    }
  }
}

Things to note...

OR13 commented 4 years ago

Here is another comment about layer concerns: https://github.com/decentralized-identity/secure-data-store/issues/41#issuecomment-635511903

I see replication as a layer 1 data structure concern that can be described using encrypted indexes; no additional features are required to support replication.

msporny commented 4 years ago

There should be requirements to enable replication at Layer 1. There should NOT be a need to implement replication at Layer 1. Replication (strategies) should go at a higher layer.

So, what this means is that the Layer 1 data model needs to be able to support the act of replication, and replication strategies, at a higher layer.

In other words:

Layer 1 data model MUST support replication strategies at higher layers. Layer 1 features MUST NOT include replication strategies.

csuwildcat commented 4 years ago

@msporny we simply must have at least one primary, recommended Strategy for replication integrated into the spec, lest it will be a sea of SDS datastores that continue not to interop, and a bunch of providers with highly disparate offerings that create unintentional service silos.

csuwildcat commented 4 years ago

My belief about this, and a few other Strategy-based components that could be extensible with optionality, is that for each we should have at least one primary that the spec defines, with the ability to extend. If we don't do this, we're not going to end up with broad support for interoperable, masterless datastores that can reliably be used as the basis for universal decentralized apps.

OR13 commented 4 years ago

@dlongley are there concerns about replication of very large documents / streams, where the data model would look different?

I can imagine replicating a 2 TB encrypted data set efficiently might need more primitives than simple encrypted metadata.

I would imagine that from a software perspective, we might want something that felt a bit more like BitTorrent for that.

csuwildcat commented 4 years ago

Here's a pseudo-representation of what I believe the minimum viable base message should include and define to satisfy the requirements:

newObjectID/existingObjectModificationID = Hash({
  encryption_metadata: ... ,
  permissioning_capabilities: ... ,
  replication_values: ... ,
  actual_payload: ... ,
  signature: ...
});
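
As a rough illustration of the pseudo-representation above, the following Python sketch derives an object ID by hashing a canonical serialization of the message. The field values, the choice of SHA-256, and sorted-key JSON as the canonical form are all assumptions made for this example; the thread does not specify any of them.

```python
import hashlib
import json


def object_id(message: dict) -> str:
    """Derive an ID from the message content (hypothetical scheme)."""
    # Sorted keys and fixed separators give a deterministic byte string,
    # so the same content always hashes to the same ID.
    canonical = json.dumps(message, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


message = {
    "encryption_metadata": {"enc": "XC20P"},  # placeholder values throughout
    "permissioning_capabilities": [],
    "replication_values": {"vector_clock": {"node-a": 1}},
    "actual_payload": "base64url-ciphertext",
    "signature": "base64url-signature",
}

new_id = object_id(message)

# Changing any bound value, including the replication values, changes the ID.
tampered = dict(message, replication_values={"vector_clock": {"node-a": 2}})
assert object_id(tampered) != new_id
```

Because the replication values are inside the hashed structure, a node cannot alter them without producing a different ID, which is the binding the pseudo-representation is asking for.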

OR13 commented 4 years ago

Hmm, a tension I see is replication at the raw storage level vs. replication of logical storage.

For example, you can use IPFS for raw storage, and get replication of the logical storage layer as a function of replicating the raw storage layer...

Consider a request to replicate 3 documents from server A to server B.

If those documents can be mapped to a set of CIDs, any system operating on CIDs could hand them over...

In today's EDV model, we can model this by imagining that the EDV interface for documents used CIDs for document identifiers... a client downloads 3 documents and then uploads 3 documents (this is how it works today)... the fact that the documents have CIDs is pretty much irrelevant... but the client could also create a capability for server B to pull those documents over directly from server A...

This is similar to how CouchDB handles replication: https://docs.couchdb.org/en/stable/replication/intro.html

At scale... asking the client to download and re-upload will cause us problems... we will need direct replication... with something like what CouchDB has...

The question is how to get direct replication, and what layer it goes at... if we modeled our replication after CouchDB:

The client would upload a replication document to server B, and it would contain access capabilities for server A. The document would spawn a job, where server B kept asking server A for documents / content until the job completed or the capability authorization expired.

Note that this can be done without content identifiers... or replication at the content layer... in fact, that's exactly how CouchDB works.
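
The pull-style job described above can be sketched roughly as follows. This is an in-memory toy, not the CouchDB protocol or any EDV API: `Capability`, `Server`, and `run_replication_job` are hypothetical names, and the capability is reduced to an expiry time.

```python
from dataclasses import dataclass, field


@dataclass
class Capability:
    """Hypothetical stand-in for a delegated authorization (e.g. a zcap)."""
    expires_at: float

    def is_valid(self, now: float) -> bool:
        return now < self.expires_at


@dataclass
class Server:
    documents: dict = field(default_factory=dict)  # id -> encrypted blob

    def list_ids(self, capability: Capability, now: float) -> list:
        if not capability.is_valid(now):
            raise PermissionError("capability expired")
        return sorted(self.documents)

    def fetch(self, doc_id: str, capability: Capability, now: float) -> bytes:
        if not capability.is_valid(now):
            raise PermissionError("capability expired")
        return self.documents[doc_id]


def run_replication_job(source, target, capability, clock) -> str:
    """Pull documents from source into target until done or expired."""
    try:
        for doc_id in source.list_ids(capability, clock()):
            if doc_id not in target.documents:
                target.documents[doc_id] = source.fetch(doc_id, capability, clock())
    except PermissionError:
        return "expired"
    return "completed"


server_a = Server({"doc-1": b"\x01", "doc-2": b"\x02", "doc-3": b"\x03"})
server_b = Server()
status = run_replication_job(
    server_a, server_b, Capability(expires_at=100.0), lambda: 0.0
)
```

Server B only ever handles opaque blobs here; nothing in the job requires content identifiers or access to plaintext.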

dlongley commented 4 years ago

@OR13,

are there concerns about replication of very large documents / streams, where the data model would look different?

In the current implementation we've done, every document may optionally be associated with a stream of chunks (arbitrarily many chunks). Any data that doesn't make sense to store as a document or that is too large (16MB+) should be chunked and stored as a stream, where the document associated with it can describe the stream however the user desires. We need to update the spec to reflect this.

I would think that replication would therefore always replicate both the document and any associated chunks of its stream -- and the number of chunks (and therefore the total file size) shouldn't matter. The client can choose any chunk size they want up to some maximum that we should say every server must support to be compliant with the spec. The chunk size and number of chunks is stored (encrypted) along with the document that is associated with the stream -- so it is up to clients to manage that. The server, however, is aware that a given chunk is associated with a particular document and sequence number (or vector clock should we go that way) to aid in replication.
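
A minimal sketch of the chunking idea, assuming fixed-size chunks identified by an index. Encryption of each chunk is omitted, and `chunk_stream`, `reassemble`, and the `stream_info` fields are illustrative names rather than anything defined by the spec or the current implementation.

```python
def chunk_stream(data: bytes, chunk_size: int) -> list:
    """Split data into fixed-size chunks, each tagged with its index."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        {"index": i // chunk_size, "data": data[i:i + chunk_size]}
        for i in range(0, len(data), chunk_size)
    ]


def reassemble(chunks: list) -> bytes:
    """Rebuild the stream by concatenating chunks in index order."""
    ordered = sorted(chunks, key=lambda c: c["index"])
    return b"".join(c["data"] for c in ordered)


payload = bytes(range(10))
chunks = chunk_stream(payload, chunk_size=4)

# Metadata a client might keep (encrypted) in the associated document.
stream_info = {"chunk_size": 4, "chunk_count": len(chunks)}
```

Replicating the stream then amounts to copying the document plus every chunk associated with it, regardless of how many chunks there are.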

OR13 commented 4 years ago

We can define direct replication using todays edvs as:

a delegation of access to documents returned from an encrypted index query.

In other words, if the EDV has a DID, you grant the edv access to documents in another edv... you use the encrypted indexes to query for the documents, and the receiving edv downloads them once... The edvs never have access to the plaintext, they only leverage their signing keys to authorize the transfer of documents which they cannot decrypt.

For those who want to have plaintext documents served publicly from the same server... you simply authorize the edv DID to disclose the document contents to anyone, and encrypt whatever content you want to be public for the edv...

Everything I just said is currently layer 1 & 2... the only new bit is the concept of an edv server being an edv-client as well, which I am sure will make the client-server bikeshedders happier.

So regarding the original title of the issue, and assuming my proposal is sound... replication would stay a layer 2 concern, because it is assumed to make use of indexes / search.... the reason is that replication must be built on top of authorization and logical storage... so it must be at a higher level than them... and since logical storage is built on raw storage... I assert that the existing layer architecture is sufficient:

Layer A - Raw Storage (CID / Binary)
Layer B - Logical Storage (Vault / Document / Chunks)
Layer C - Authorization for Logical Storage (ZCaps / OAuth)
Layer D - Search / Indexes (Give me all documents related to wallet 123)
Layer E - Replication (grant server B (client) ability to download encrypted documents from server A)

OR13 commented 4 years ago

Why must replication be built on top of logical storage?

Because as the data controller, I don't want to sync everything to my phone... I want to sync the things I need on my phone....

I also don't want the server operator to know what I am syncing to my phone... so the query must be opaque to the server.

csuwildcat commented 4 years ago

Layer A must include secured replication directives, else the state can be corrupted. Also, objects really need to be inferentially ID'd by hash, else nefarious inbound callers and nodes of the masterless mesh can cause collisions and confusion: imagine you use subjective IDs, and Actor A creates an object with an ID of 123, and the object hashes to 456, while Actor B creates another object with the ID of 123 and the object hashes to 789. The two objects are being declared subjectively to be the same one, and nodes MUST resolve this conflict. Why on earth would you want to deal with this nonsense when we can simply, securely lock in object-atomic uniqueness, security, and empirical deduplication without relying on subjective value judgements? This is what I mean when I say replication MUST be arguably the lowest level driver of most decisions we make.
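
The ID 123 scenario above can be illustrated with a small sketch. Both store classes are hypothetical, and SHA-256 over the raw bytes is an assumed hashing scheme; the point is only to contrast writer-chosen IDs with content-derived IDs.

```python
import hashlib


def content_id(obj: bytes) -> str:
    return hashlib.sha256(obj).hexdigest()


class SubjectiveStore:
    """IDs are chosen by the writer; conflicting claims must be resolved."""

    def __init__(self):
        self.objects = {}
        self.conflicts = []

    def put(self, claimed_id: str, obj: bytes) -> None:
        existing = self.objects.get(claimed_id)
        if existing is not None and existing != obj:
            self.conflicts.append(claimed_id)  # someone has to pick a winner
            return
        self.objects[claimed_id] = obj


class ContentAddressedStore:
    """IDs are derived from content; identical content deduplicates."""

    def __init__(self):
        self.objects = {}

    def put(self, obj: bytes) -> str:
        oid = content_id(obj)
        self.objects[oid] = obj  # same bytes always map to the same key
        return oid


subjective = SubjectiveStore()
subjective.put("123", b"object from actor A")
subjective.put("123", b"object from actor B")

addressed = ContentAddressedStore()
id_a = addressed.put(b"object from actor A")
id_b = addressed.put(b"object from actor B")
id_a_again = addressed.put(b"object from actor A")
```

With writer-chosen IDs the second write surfaces as a conflict that some node must resolve; with content-derived IDs the two objects simply get different keys and a repeated write is a no-op.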

dlongley commented 4 years ago

@OR13,

each document being encrypted using key agreement vs each document encrypted with a symmetric key which is encrypted with key agreement.

I may be misreading your comment, but I think in your example the latter is happening (not the former) -- it just may be difficult to tell from the way it is densely expressed via JWE.
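
One way to check this against the example JWE is to decode its `protected` header and look at the recipient's `alg`. The sketch below uses only the Python standard library; the two string literals are copied from the example vault document earlier in the thread.

```python
import base64
import json


def b64url_decode(value: str) -> bytes:
    # JWE uses unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(value + "=" * (-len(value) % 4))


protected = json.loads(b64url_decode("eyJlbmMiOiJYQzIwUCJ9"))
recipient_alg = "ECDH-ES+A256KW"  # copied from recipients[0].header.alg

# "+A256KW" means the key-agreement output wraps a separate content key,
# which is carried in the recipient's "encrypted_key" field.
uses_wrapped_content_key = recipient_alg.endswith("+A256KW")
```

The header decodes to `{"enc": "XC20P"}`, so the content is encrypted with a symmetric content key, and the `ECDH-ES+A256KW` algorithm indicates that this content key is itself wrapped with a key derived via key agreement.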

OR13 commented 4 years ago

@dlongley the example data I provided is the current strategy (what edv-client is doing)... which is different than what I shall call the 1password strategy, which encrypts a single symmetric key using key agreement, and uses that symmetric key to protect documents... that is a proposed option by @tplooker ... but it's not been described anywhere concretely AFAIK.

OR13 commented 4 years ago

@csuwildcat I don't agree with your assertion that replication at the CID / Binary Layer is 100% necessary for the end user to get what they want.

It's certainly possible to optimize if you have replication capabilities at that layer... but it does not help the user get their wifi credentials from their cell phone to their desktop.

I don't see us needing anything new to handle the case of replicating logical storage objects, aside from allowing an edv-server to act as a client... but perhaps the lack of a reference implementation and clear demonstration of this capability, as it exists today, is causing confusion here... if you want replication at the raw storage layer, simply use IPFS as your backend... that's not going to help you sync wifi-credentials, it's going to help you sync 100% of the content... which is not always what the user is trying to do...

dlongley commented 4 years ago

@OR13,

the example data I provided is the current strategy (what edv-client is doing)... which is different than what I shall call the 1password strategy, which encrypts a single symmetric key using key agreement, and uses that symmetric key to protect documents... that is a proposed option by @tplooker ... but it's not been described anywhere concretely AFAIK.

Ok. I think the edv-client is only different from what you've described there as the "1password strategy" in that edv-client generates a new symmetric key each time a document is encrypted (vs. a single symmetric key across all documents). The symmetric key is still encrypted using a key encryption key derived from a pair of key agreement keys (one ephemeral, one static).

dlongley commented 4 years ago

I think we need to distinguish between replicating within storage provider trust boundaries (internal to the provider) and across storage provider trust boundaries (needs to be agent controlled in some way). I believe this to be what @OR13 is hinting at as well.

OR13 commented 4 years ago

Yes, the 1password strategy lets you share a set of documents all encrypted with the same symmetric key, by just encrypting that symmetric key for the new recipient... I think there is some performance gain for that strategy, but it comes with increased complexity... however the encryption difference between them is not relevant to the replication discussion.

Replicating raw binary inside a trust boundary is a solved problem... if the trust boundary is IPFS, use IPFS, if it's a linux network cluster, use rsync... if you are using couchdb... use couchdb replicate :)

dlongley commented 4 years ago

replicating raw binary inside a trust boundary is a solved problem... if the trust boundary is IPFS, use IPFS, if it's a linux network cluster, use rsync... if you are using couchdb... use couchdb replicate :)

Yes -- I agree, and I want to make sure that's not what we're talking about when we talk about "replication" here.

csuwildcat commented 4 years ago

@OR13 IPFS is not a secure replication system - IPFS has no methods or opinion on relating which objects were parents or child modifications, or in what order they might be applied. If the directives for this are not incorporated at the lowest levels, the replication system is simply not secure.

OR13 commented 4 years ago

Instead of "replication should be layer 1/0/A", we should be asking what is the user story that reflects replication, and how does the spec / reference implementation / proposal support that user story.

I think there are 2 key user stories for replication.

  1. As a data controller, I want to move all my data from vendor 1 to vendor 2 (full data replication).
  2. As a data controller, I want to copy my documents matching some query from my cloud vault to my phone vault (limited subset / smart sync).

Both of these are supported on edvs today, but they require some client to do the migration... this means that user story 1 might choke on really large data sets.

dlongley commented 4 years ago

@OR13,

I think there are 2 key user stories for replication.

  1. As a data controller, I want to move all my data from vendor 1 to vendor 2 (full data replication).
  2. As a data controller, I want to copy my documents matching some query from my cloud vault to my phone vault (limited subset / smart sync).

+1

Both of these are supported on edvs today, but they require some client to do the migration... this means that user story 1 might choke on really large data sets.

"Some client" could of course run on a service that you trust to do the replication for you.

csuwildcat commented 4 years ago

My replication user story/requirements:

  1. As a user, I must be able to have multiple instances of my datastore that sync without requiring a trusted master, so I can have maximum, seamless portability of my data.
  2. As a user, I never want to deal with conflicts and strange states that could appear in apps, so I need the spec to ensure that all objects created in my datastore are atomically, deterministically unique, so that two instances of my datastore or outside writers cannot, by error or intention, create two objects that claim to be the same ID.
  3. Because the ability to manipulate object convergence is a blatant security vector that can corrupt, destroy, or manipulate my data in malicious ways, I want the technology I use to ensure that all instances follow a deterministic set of rules, based on values bound inextricably to objects that can't be mutated without invalidating those rules, so that I can replay and know my object history is being correctly assembled.

^ Folks, if we don't do this, it's fundamentally insecure, regardless of whether an object is encrypted or not.

csuwildcat commented 4 years ago

Perhaps I should be clear about my highest level intention for users: As a user, I want the SDS spec to standardize something a bit like Firebase, but more self-sovereign and trust-minimized, so that the majority of all apps ever created simply request access to a semantically correct area of objects in my datastore and store data there. I want to see decentralized, truly serverless apps. Hopefully that's what's being built here, not some happens-to-be-DID-based version of Dropbox.

OR13 commented 4 years ago

@csuwildcat https://github.com/orbitdb/ipfs-log

Replication of raw binary content in a trust boundary is a solved problem... if you want to build your solution on top of orbit db, you can totally do that... it's serverless, content addressed, and supports documents, subscriptions, etc... if you do app level encryption with DIDs... I think it's exactly what you are asking for, but the problem with CRDTs on encrypted content... is that the clients with the keys are always responsible for handling the merging... so while the encrypted content representations get merged transparently (as they do with edvs and orbitdb today), when you go to apply the decrypted event, your app might have trouble... let's look at a concrete example:

Family Grocery List

Alice, Bob (Read + Write), Charlie (Read)

We create a vault for the list, granting Alice and Bob write access to everything in the vault, and Charlie read access.

The list starts empty, Alice creates the first entry...

"Milkk" => encrypted

Bob adds the second document => "Cheese", => Encrypted

Bob notices Alice's spelling mistake, and corrects it.

etc...

How does the DID EDV Family Grocery List app work?

Each client can validate the document with the schema before attempting to CRUD it.

EDV Server handles collision based on sequence number.
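
A minimal sketch of sequence-based collision handling, assuming the rule is that a write must carry exactly the stored sequence plus one (and 0 for a new document). That rule, and the `Vault` and `SequenceConflict` names, are assumptions for illustration; the thread only says the server handles collisions based on sequence number.

```python
class SequenceConflict(Exception):
    pass


class Vault:
    """Stores opaque (encrypted) documents keyed by ID."""

    def __init__(self):
        self.docs = {}

    def write(self, doc: dict) -> None:
        current = self.docs.get(doc["id"])
        expected = 0 if current is None else current["sequence"] + 1
        if doc["sequence"] != expected:
            raise SequenceConflict(f"expected sequence {expected}")
        self.docs[doc["id"]] = doc


vault = Vault()
vault.write({"id": "item-1", "sequence": 0, "jwe": "<Milkk, encrypted>"})
# Bob's correction is based on sequence 0, so it is accepted as sequence 1.
vault.write({"id": "item-1", "sequence": 1, "jwe": "<Milk, encrypted>"})

try:
    # A second writer who also started from sequence 0 is rejected.
    vault.write({"id": "item-1", "sequence": 1, "jwe": "<Milch, encrypted>"})
    stale_write_rejected = False
except SequenceConflict:
    stale_write_rejected = True
```

The server never inspects the `jwe` value; it only compares sequence numbers, so the rejected writer has to re-read, decrypt, and decide how to merge on the client side.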

Does the server need to know anything about what a grocery list is?

No. In fact, the server has no idea that it is managing a grocery list.

Can the server sync the grocery list in realtime to all parties?

Yes, subscribe for document updates in the vault... decrypt them and render in app.

Can this be modeled other ways?

Sure, but each document that is encrypted must be used to mitigate update conflicts... for example... A naive approach might have been to create a single document which represented the grocery list, and try to update an array property of that object... this would not work... 2 edits to the list at the same time would not work... I think this is the layer you are concerned with, Daniel... you want to ensure that plaintext objects, like you see in a data store like firebase, gun or orbit db, can be updated / edited without collision... since the collision space that you are worried about is plaintext... I would consider this to be at a layer above encrypted content... since you have the same issues with orbitdb / firebase / gun... they can only give you a nice JSON CRDT because they can see everything...

You can build that on top of an edv, or you can build that on top of a SQL database... it exists at a different layer.

Now if we want to ensure that we support that layer, we need to have language that maps all the way through to it.

Use https://www.npmjs.com/package/flat to convert each vault document into a JSON Path and a JSON Value.

Download each document, decrypt them, and unflatten them, and you get a hierarchical object.
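
The flatten / unflatten round trip can be sketched in Python as below. This is a hand-written analogue of what the `flat` package does for nested objects, not its API: it handles only nested dicts, uses `.` as the delimiter, and would break on keys that themselves contain a dot. The `wallet` data is made up, and encryption is omitted.

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Map a nested dict to {dotted.path: leaf value} pairs."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat: dict) -> dict:
    """Rebuild the nested dict from dotted paths."""
    root = {}
    for path, value in flat.items():
        node = root
        *parents, leaf = path.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return root


wallet = {"wifi": {"home": {"ssid": "example", "psk": "secret"}}, "owner": "alice"}

# Each (path, value) pair would become one encrypted vault document.
vault_documents = [{"path": p, "value": v} for p, v in flatten(wallet).items()]
restored = unflatten({d["path"]: d["value"] for d in vault_documents})
```

Since each leaf lives in its own document, two clients editing different leaves never write to the same vault document.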

You might also just make each of the documents an ietf-json-patch... with some unique ordering property... a client downloads each of the patches, applies them in order, and you get a document store... this is how orbit db works... it's built on an append only log of patch operations... it's also how sidetree works.
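
A sketch of the patch-log approach using the grocery list from earlier. `apply_patch` implements only a small subset of JSON Patch (add, replace, remove, without `~0`/`~1` escaping), and the `order` field is a hypothetical ordering property; how patches are actually ordered is left open in the thread.

```python
def apply_patch(doc: dict, op: dict) -> None:
    """Apply one add/replace/remove operation in place (RFC 6902 subset)."""
    *parents, last = op["path"].lstrip("/").split("/")
    node = doc
    for part in parents:
        node = node[int(part)] if isinstance(node, list) else node[part]
    if isinstance(node, list):
        if op["op"] == "add":
            index = len(node) if last == "-" else int(last)
            node.insert(index, op["value"])
        elif op["op"] == "replace":
            node[int(last)] = op["value"]
        elif op["op"] == "remove":
            del node[int(last)]
    elif op["op"] in ("add", "replace"):
        node[last] = op["value"]
    elif op["op"] == "remove":
        del node[last]


# Decrypted vault documents; "order" is the hypothetical ordering property.
log = [
    {"order": 2, "patch": {"op": "add", "path": "/items/-", "value": "Cheese"}},
    {"order": 1, "patch": {"op": "add", "path": "/items/-", "value": "Milkk"}},
    {"order": 3, "patch": {"op": "replace", "path": "/items/0", "value": "Milk"}},
]

grocery_list = {"items": []}
for entry in sorted(log, key=lambda e: e["order"]):
    apply_patch(grocery_list, entry["patch"])
```

The patches can arrive in any order; the client sorts them after decryption, so the server needs no knowledge of the list or of the merge strategy.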

When the patches are encrypted, the server operator does not know if they are storing sidetree operations, did peer deltas, or grocery lists... and indeed the server allows the client to decide which merging strategy for content they want to use... now if we want to make a recommendation for how to store JSON patches as documents in a vault, and how to reassemble them... we can totally do that... but not if we can't agree that the documents are encrypted, and the server operator cannot read them, or metadata about them.

I'm not sure how did:peer handles its patch operations, but potentially EDVs could be used to store the deltas... since it's just a data store.

cc @swcurran any thoughts on this?

OR13 commented 4 years ago

@csuwildcat to comment on the approach suggested.

OR13 commented 4 years ago

Closing until @csuwildcat can raise a specific use case which is not supported.