Open OR13 opened 4 years ago
Should we extend this to include any IPLD Content Identifier as well, or is that out of scope?
@EvanTedesco great question... as it stands today, I think that IPNS might be a better identifier to map to Documents, since Documents are meant to be mutable, and since they are encrypted... there is very little value in using a raw CID... because that will change the second you update a Document or its indexes.
One obvious concern with IPNS is that people might choose to use DNSLink... and thereby leak metadata... https://docs.ipfs.io/concepts/dnslink/#publish-using-a-subdomain
Which is one of the main reasons that the identifiers for documents are so strict today... the idea is that allowing a client to pick a scheme other than something that looks random will invite metadata correlation attacks etc... once the ciphertext is blown... that identifier might be used to round up everyone who was tricked into holding that object...
https://law.stackexchange.com/questions/16136/legality-of-data-chunking-concerning-child-pornography
I think this is a really helpful concept to keep in mind regarding encrypted chunks...
However, we might also consider using CID / IPFS / IPNS identifiers inside Messages (Hub concept)... which appear to be at a different layer from the regular Vault, Document and Index data models described by EDVs...
Sorry, I misread the issue and thought the question was about multihash support in general, as opposed to multihash as the Document ID.
I like the idea of multihash being supported wherever possible, but I am not intimately familiar with the associated costs in this context so I will go back to lurking on this one :)
I propose we also consider using Hashlinks (which could, if needed, be combined with IPNS links) for Document IDs. (And chunk IDs as well, actually).
Encrypted documents in EDVs are mutable. I don't understand the proposal to make Document IDs hashlinks/content addressable. If you give someone a URL to an EDV Document (or a zcap that references the URL so they can read it/update it), the expectation is that the content it references may change. This doesn't mean that such a URL could not be augmented with a hash to express what was at an endpoint at some point in time. Perhaps that could be useful for a number of use cases -- though a hash of the unencrypted document contents may have greater general utility.
However, using a hash/content address for the document ID itself -- I'm struggling to figure out how that wouldn't run afoul of mutability/sharing requirements or introduce (needless?) complexity with different "classes" of document IDs. It's not clear to me what we'd be trying to achieve by doing so.
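A rough sketch of the "URL augmented with a hash" idea mentioned above, assuming the Hashlink URL-parameter form (an `hl` query parameter carrying a multibase base58btc-encoded sha2-256 multihash). The URL, the function names, and the choice to hash whatever bytes are passed in (ciphertext or plaintext) are illustrative assumptions, not something the EDV spec defines:

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Plain base58btc encoding (Bitcoin alphabet), no multibase prefix."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # Each leading zero byte is represented by a leading '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def hashlink_value(content: bytes) -> str:
    """Multibase ('z' = base58btc) encoding of a sha2-256 multihash."""
    digest = hashlib.sha256(content).digest()
    # Multihash header: 0x12 = sha2-256, 0x20 = 32-byte digest length.
    multihash = bytes([0x12, 0x20]) + digest
    return "z" + b58encode(multihash)


def with_hashlink(url: str, content: bytes) -> str:
    """Append an `hl` parameter; the document URL itself stays mutable."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("hl", hashlink_value(content)))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical document URL, for illustration only.
    url = "https://edv.example/edvs/abc/documents/zExampleDocId"
    print(with_hashlink(url, b"encrypted document bytes at time T"))
```

The document ID in the path is untouched here; the hash only records what was at the endpoint when the link was made, so an update to the document leaves the URL resolvable but makes the `hl` value stale.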
I take back my previous comment:
I propose we also consider using Hashlinks (which could, if needed, be combined with IPNS links) for Document IDs. (And chunk IDs as well, actually).
^ I meant Hashlinks for the overall Document URL. Not for the 'documentId' part of the url; that wouldn't make sense.
Discussed on the WG call on the 24th of June. The question was asked whether this issue is really asking whether document IDs should be base58 or multibase encoded.
The suggestion was to use multibase; no objections were recorded on the call.
yields:
Document ID "QmRAQB6YaCyidP37UdDnjFY5vQuiBrcqdyoW1CuDgwxkD4" must be a multibase, base58-encoded array of 16 random bytes.
Today... it's not possible to use multihashes for Document IDs
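For reference, the constraint in the error message above can be sketched roughly as follows. This is a minimal illustration, assuming "multibase, base58-encoded" means the `z` (base58btc) multibase prefix followed by the base58 encoding of exactly 16 bytes; the function names are hypothetical and this is not the actual validation code from any EDV implementation:

```python
import secrets

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Plain base58btc encoding (Bitcoin alphabet), no multibase prefix."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # Each leading zero byte is represented by a leading '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def b58decode(text: str) -> bytes:
    """Inverse of b58encode; raises ValueError on characters outside the alphabet."""
    n = 0
    for ch in text:
        n = n * 58 + B58_ALPHABET.index(ch)
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")


def generate_document_id() -> str:
    """16 random bytes, base58btc-encoded, with the multibase 'z' prefix."""
    return "z" + b58encode(secrets.token_bytes(16))


def is_valid_document_id(doc_id: str) -> bool:
    """Format check only: 'z' prefix and a 16-byte payload."""
    if not doc_id.startswith("z"):
        return False
    try:
        raw = b58decode(doc_id[1:])
    except ValueError:
        return False
    return len(raw) == 16


if __name__ == "__main__":
    print(is_valid_document_id(generate_document_id()))
    print(is_valid_document_id("QmRAQB6YaCyidP37UdDnjFY5vQuiBrcqdyoW1CuDgwxkD4"))
```

Under these assumptions the CIDv0 string from the error message is rejected twice over: it has no `z` multibase prefix, and even with one its payload is a sha2-256 multihash that is longer than 16 bytes. A check like this can only confirm format and length, not that the bytes were actually random.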