decentralized-identity / edv-spec

Encrypted Data Vault Spec
https://identity.foundation/edv-spec
Apache License 2.0

Should multihash Document IDs be allowed? #53

Open OR13 opened 4 years ago

OR13 commented 4 years ago
const testId = 'QmRAQB6YaCyidP37UdDnjFY5vQuiBrcqdyoW1CuDgwxkD4';
const doc = { id: testId, content: { someKey: 'someValue' } };
const inserted = await client.update({
  keyResolver: mock.keyResolver,
  invocationSigner: mock.invocationSigner,
  doc,
});

yields:

Document ID "QmRAQB6YaCyidP37UdDnjFY5vQuiBrcqdyoW1CuDgwxkD4" must be a multibase, base58-encoded array of 16 random bytes.

Today... it's not possible to use multihashes for Document IDs.
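For readers who want to see what the error message is asking for, here is a minimal sketch of generating and checking an ID that is a multibase, base58-encoded array of 16 random bytes. It assumes the multibase prefix `z` (base58btc) and uses a hand-rolled base58 codec; the helper names `generateDocumentId` and `isValidDocumentId` are hypothetical and are not taken from the client library, whose actual check may differ.

```javascript
const { randomBytes } = require('node:crypto');

// Bitcoin-style base58 alphabet (the one multibase calls base58btc).
const BASE58_ALPHABET =
  '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';

// Encode bytes as base58; each leading zero byte becomes a leading '1'.
function base58Encode(bytes) {
  let num = BigInt('0x' + (Buffer.from(bytes).toString('hex') || '0'));
  let out = '';
  while (num > 0n) {
    out = BASE58_ALPHABET[Number(num % 58n)] + out;
    num /= 58n;
  }
  for (const b of bytes) {
    if (b !== 0) break;
    out = '1' + out;
  }
  return out;
}

// Decode base58 back to bytes; throws on characters outside the alphabet.
function base58Decode(str) {
  let num = 0n;
  for (const ch of str) {
    const idx = BASE58_ALPHABET.indexOf(ch);
    if (idx === -1) throw new Error(`Invalid base58 character "${ch}"`);
    num = num * 58n + BigInt(idx);
  }
  let hex = num.toString(16);
  if (hex.length % 2) hex = '0' + hex;
  const body = num === 0n ? Buffer.alloc(0) : Buffer.from(hex, 'hex');
  let zeros = 0;
  for (const ch of str) {
    if (ch !== '1') break;
    zeros++;
  }
  return Buffer.concat([Buffer.alloc(zeros), body]);
}

// Hypothetical helper: 16 random bytes, base58-encoded, with the
// multibase prefix 'z' (assumed here to be the expected prefix).
function generateDocumentId() {
  return 'z' + base58Encode(randomBytes(16));
}

// Hypothetical helper: accept only 'z' + base58 that decodes to 16 bytes.
function isValidDocumentId(id) {
  if (typeof id !== 'string' || id[0] !== 'z') return false;
  try {
    return base58Decode(id.slice(1)).length === 16;
  } catch (e) {
    return false;
  }
}

console.log(isValidDocumentId(generateDocumentId()));
console.log(isValidDocumentId('QmRAQB6YaCyidP37UdDnjFY5vQuiBrcqdyoW1CuDgwxkD4'));
```

Under these assumptions the multihash from the snippet above is rejected twice over: it lacks the `z` prefix, and a sha2-256 multihash is 34 bytes rather than 16.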

EvanTedesco commented 4 years ago

Should we extend this to include any IPLD Content Identifier as well, or is that out of scope?

OR13 commented 4 years ago

@EvanTedesco great question... as it stands today, I think that IPNS might be a better identifier to map to Documents, since Documents are meant to be mutable, and since they are encrypted... there is very little value in using a raw CID... because that will change the second you update a Document or its indexes.

One obvious concern with IPNS is that people might choose to use DNSLink... and thereby leak metadata... https://docs.ipfs.io/concepts/dnslink/#publish-using-a-subdomain

This is one of the main reasons that the identifiers for documents are so strict today... the idea is that allowing a client to pick a scheme other than something that looks random will invite metadata correlation attacks, etc... once the ciphertext is blown... that identifier might be used to round up everyone who was tricked into holding that object...

https://law.stackexchange.com/questions/16136/legality-of-data-chunking-concerning-child-pornography

I think this is a really helpful concept to keep in mind regarding encrypted chunks...
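To illustrate the earlier point that a raw content address changes the second an encrypted Document is updated, here is a small sketch. It uses AES-256-GCM with a random IV and a plain SHA-256 hex digest as stand-ins. Real EDV documents are encrypted as JWEs and real CIDs carry multihash/multibase framing, so treat `encrypt` and `contentAddress` as hypothetical helpers, not the spec's algorithm.

```javascript
const { createCipheriv, createHash, randomBytes } = require('node:crypto');

// Stand-in encryption: AES-256-GCM with a fresh random IV per call.
// Returns iv || ciphertext || authTag as one buffer.
function encrypt(key, plaintextObject) {
  const iv = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([
    cipher.update(JSON.stringify(plaintextObject), 'utf8'),
    cipher.final(),
  ]);
  return Buffer.concat([iv, ciphertext, cipher.getAuthTag()]);
}

// Stand-in content address: SHA-256 of the stored bytes, hex-encoded.
function contentAddress(bytes) {
  return createHash('sha256').update(bytes).digest('hex');
}

const key = randomBytes(32);

// The Document's own ID is chosen once and never depends on its content.
const documentId = randomBytes(16).toString('hex');

const v1 = encrypt(key, { someKey: 'someValue' });
const v1Again = encrypt(key, { someKey: 'someValue' }); // same plaintext
const v2 = encrypt(key, { someKey: 'updatedValue' });   // an update

console.log('document id  :', documentId);
console.log('address v1   :', contentAddress(v1));
console.log('address v1 #2:', contentAddress(v1Again));
console.log('address v2   :', contentAddress(v2));
```

Because the IV is random, even re-encrypting identical plaintext yields a different address, while the randomly chosen Document ID stays fixed across every version.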

OR13 commented 4 years ago

However, we might also consider using CID / IPFS / IPNS identifiers inside Messages (Hub concept)... which appear to be at a different layer from the regular Vault, Document and Index data models described by EDVs...

EvanTedesco commented 4 years ago

Sorry I misread the issue and was thinking the question was asking about multihash support in general as opposed to as the Document ID.

I like the idea of multihash being supported wherever possible, but I am not intimately familiar with the associated costs in this context so I will go back to lurking on this one :)

dmitrizagidulin commented 4 years ago

I propose we also consider using Hashlinks (which could, if needed, be combined with IPNS links) for Document IDs. (And chunk IDs as well, actually).

dlongley commented 4 years ago

Encrypted documents in EDVs are mutable. I don't understand the proposal to make Document IDs hashlinks/content addressable. If you give someone a URL to an EDV Document (or a zcap that references the URL so they can read it/update it), the expectation is that the content it references may change. This doesn't mean that such a URL could not be augmented with a hash to express what was at an endpoint at some point in time. Perhaps that could be useful for a number of use cases -- though a hash of the unencrypted document contents may have greater general utility.

However, using a hash/content address for the document ID itself -- I'm struggling to figure out how that wouldn't run afoul of mutability/sharing requirements or introduce (needless?) complexity with different "classes" of document IDs. It's not clear to me what we'd be trying to achieve by doing so.

dmitrizagidulin commented 4 years ago

I take back my previous comment:

I propose we also consider using Hashlinks (which could, if needed, be combined with IPNS links) for Document IDs. (And chunk IDs as well, actually).

^ I meant Hashlinks for the overall Document URL. Not for the 'documentId' part of the url; that wouldn't make sense.

tplooker commented 3 years ago

Discussed on the WG call on the 24th of June. The question was raised whether this issue is really asking if document IDs should be base58 or multibase encoded.

tplooker commented 3 years ago

The suggestion was to use multibase; no objections were recorded on the call.