How to generate the SCID?

swcurran commented 4 months ago

~~The SCID is the hash of specific element(s) of the initial DIDDoc (such as public keys) so it is verifiable when one has the initial DIDDoc.~~
- Looking at the example DIDDocs on the universal resolver, I realize that the DIDDocs do not have a consistent/concise way of representing keys. E.g. a JWK is quite a different representation from a multibase key.
- We could state that it is the JSON items that contain public keys, but that starts to get ugly/hacky IMHO.
Generate a selfhash over the entire initial DIDDoc, with placeholders for wherever the SCID is to be placed.
- Construct the entire DIDDoc with placeholders wherever they are needed, and pass it into a method to generate the SCID and fill in the placeholders.
- To verify the SCID, a verification method is given the SCID and the initial DIDDoc, does a text replacement of the SCID with the placeholder, removes any items not included in the hashing, and verifies that the hash of the canonicalized result is the SCID.
  - Challenges are defining the placeholder, the hash algorithm, the canonicalization, and the items to leave out of the hashing. Presumably, self-hash defines most of that.
Other alternatives?

brianorwhatever commented 4 months ago

I will have to look at how this is accomplished with JCS but for the RDF things I've implemented the general pattern is:

verifyData = sha256(proofOptions canonicalized) + sha256(document canonicalized)

It might be nice to stick to this approach where verifyData is what becomes the SCID.
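A minimal sketch of that verifyData pattern, for illustration only. The `canonicalize` helper is a hypothetical stand-in (sorted-key compact JSON), not a real RDF or JCS canonicalizer, and `verify_data` is a name made up here:

```python
import hashlib
import json


def canonicalize(obj) -> bytes:
    # Stand-in only: a real implementation would use RDF canonicalization or JCS.
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def verify_data(proof_options: dict, document: dict) -> bytes:
    # sha256(proofOptions canonicalized) + sha256(document canonicalized)
    options_hash = hashlib.sha256(canonicalize(proof_options)).digest()
    document_hash = hashlib.sha256(canonicalize(document)).digest()
    return options_hash + document_hash


data = verify_data({"type": "DataIntegrityProof"}, {"id": "did:example:123"})
```

The result is the two 32-byte digests concatenated, so 64 bytes in total.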

I think I'm also ok with the selfhash approach, I would just need to grok it a bit better.

swcurran commented 4 months ago

I’d say we go with the scheme that has the most mature libraries. I suspect that is RDF.

@brianorwhatever, please do look at selfhash to understand what it does and how to use it. It would be nice to not create something new. :-)

swcurran commented 3 months ago

Resolved. We're going to do the following:

1. The DID Controller constructs the initial DIDDoc with {{SCID}} as a placeholder wherever the SCID appears in the document.
2. Calculate the SCID as: scid = sha256(JCS(DIDDoc with Placeholders)), where JCS is the JSON Canonicalization Scheme.
3. Update the DIDDoc by replacing {{SCID}} with the calculated scid.
4. To verify, reverse the replacement (put {{SCID}} back in place of the scid), calculate the hash, and check that it matches the scid in the document.
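The generate and verify steps above could be sketched roughly as follows. This is not a reference implementation: `jcs_approx` only approximates JCS (RFC 8785 has extra rules for number formatting and key ordering), the hex encoding of the digest is an assumption, and the function names and the `did:example` document are made up for illustration.

```python
import hashlib
import json

PLACEHOLDER = "{{SCID}}"


def jcs_approx(obj) -> bytes:
    # Approximation of JCS: sorted keys, no whitespace, UTF-8 output.
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def generate(doc_with_placeholders: dict) -> tuple[str, dict]:
    # Hash the document while it still contains the placeholders.
    scid = hashlib.sha256(jcs_approx(doc_with_placeholders)).hexdigest()
    # Text-replace every placeholder with the calculated scid.
    text = json.dumps(doc_with_placeholders).replace(PLACEHOLDER, scid)
    return scid, json.loads(text)


def verify(scid: str, doc: dict) -> bool:
    # Reverse the replacement, then recompute and compare the hash.
    text = json.dumps(doc).replace(scid, PLACEHOLDER)
    restored = json.loads(text)
    return hashlib.sha256(jcs_approx(restored)).hexdigest() == scid


draft = {"id": "did:example:{{SCID}}", "controller": "did:example:{{SCID}}"}
scid, diddoc = generate(draft)
ok = verify(scid, diddoc)
```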

@brianorwhatever -- please confirm we are taking the full sha256 result as the scid (or just a part of it?) and that we are indeed using sha256.

brianorwhatever commented 3 months ago

I am using the last 24 characters from multibase(multihash(digest, "sha2-256"), "base58btc") of the genesis document as described above.

swcurran commented 3 months ago

So, the full set of hashing algorithms we are using throughout is:

digest = sha256(jcs(JSON))
hash = multibase(multihash(digest, "sha2-256"), "base58btc")
scid = right(hash, 24)
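As a rough sketch of those three steps using only the standard library: the multihash header for sha2-256 is the two bytes 0x12 0x20, and the multibase prefix for base58btc is `z`. The base58 encoder is hand-rolled, `jcs_approx` only approximates JCS, and all function names here are illustrative assumptions.

```python
import hashlib
import json

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def jcs_approx(obj) -> bytes:
    # Approximation of JCS: sorted keys, no whitespace, UTF-8 output.
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def base58btc(data: bytes) -> str:
    # Treat the bytes as one big integer and repeatedly divide by 58.
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # Each leading zero byte is encoded as a leading "1".
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + out


def full_hash(doc) -> str:
    digest = hashlib.sha256(jcs_approx(doc)).digest()
    multihash = bytes([0x12, len(digest)]) + digest  # sha2-256 code, digest length
    return "z" + base58btc(multihash)  # "z" is the multibase prefix for base58btc


hash_value = full_hash({"id": "did:example:{{SCID}}"})
scid = hash_value[-24:]  # right(hash, 24)
```

With sha2-256 the encoded hash always begins with `zQm`, which is the encoding information at the start of the string that taking the last 24 characters avoids.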

brianorwhatever commented 3 months ago

The first and second steps are really one step. The sha256 digest of jcs(json) is multihash encoded, so it describes the hash algo internally. I would also be happy to dictate sha256 and remove that layer.

I think @andrewwhitehead might be doing something slightly different for scid as well

andrewwhitehead commented 3 months ago

I'm using lowercase base32 for the SCID, as case-sensitive base58 doesn't work quite as well in URLs. I'm also taking the first 24 characters instead of the last.

brianorwhatever commented 3 months ago

Can we use base32 instead of base58btc throughout then? I prefer the last 24 for two reasons. It includes more bytes of actual hash digest vs encoding information. It also makes scids more unique as it doesn't include the encoding information at the beginning.
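For a rough sense of scale, the arithmetic below estimates how many bits of digest 24 characters can carry in each encoding. It assumes the characters come from the hash itself and not from a fixed prefix; the helper name is made up.

```python
import math


def bits_in_chars(n_chars: int, alphabet_size: int) -> float:
    # Each character carries log2(alphabet_size) bits.
    return n_chars * math.log2(alphabet_size)


base32_bits = bits_in_chars(24, 32)  # 24 * 5 = 120 bits
base58_bits = bits_in_chars(24, 58)  # a little over 140 bits
```

Either way, a 24-character SCID carries well under the full 256 bits of a sha2-256 digest.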

andrewwhitehead commented 3 months ago

For the purposes of the SCID there's no prefix added to the hash, so it's just as random.

I think that we should add a hash parameter, generally sha2-256 (we could have a default if it's not provided). This would allow upgrading to another hash like blake3 or shake-256, which is something Git had to figure out how to do years after it was released.

If the hash is explicit and we use (lowercase) base32 everywhere then we can just use the unprefixed form for hashes and it's easy to check the SCID derivation against the first previous hash.

brianorwhatever commented 3 months ago

> I think that we should add a hash parameter, generally sha2-256 (we could have a default if it's not provided). This would allow upgrading to another hash like blake3 or shake-256, which is something Git had to figure out how to do years after it was released.

hmm sounds like we just want multihash in that case 🤔.. but then we would need to include the first half in the scid so that we can determine what hash alg was used.. which reopens my similar looking scid for everyone concern..

> If the hash is explicit and we use (lowercase) base32 everywhere then we can just use the unprefixed form for hashes and it's easy to check the SCID derivation against the first previous hash.

I like specifying the base encoding as this isn't something that would ever need to change

One idea would be to include the genesis doc's previous hash in the first line's params instead of the scid. That way we could derive the scid on the fly for replacement use while still having the full hash to decode the multihash from.. since I would push for the last 24 characters 😄

swcurran commented 3 months ago

So, the full set of hashing algorithms we are using throughout is:

digest = hashFunction(jcs(JSON)) # hashFunction set in parameters
hash = base32lower(digest) # No padding
scid = left(hash, 24)
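A minimal sketch of this final scheme, assuming the hash function is selected by a `hashlib` algorithm name. The parameter name `hash_name`, the helper names, and the `jcs_approx` stand-in for JCS are all illustrative and not taken from any implementation.

```python
import base64
import hashlib
import json


def jcs_approx(obj) -> bytes:
    # Approximation of JCS: sorted keys, no whitespace, UTF-8 output.
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def base32lower(data: bytes) -> str:
    # Standard base32 alphabet, lowercased, with the "=" padding stripped.
    return base64.b32encode(data).decode("ascii").lower().rstrip("=")


def derive_scid(doc, hash_name: str = "sha256") -> str:
    digest = hashlib.new(hash_name, jcs_approx(doc)).digest()  # hashFunction(jcs(JSON))
    return base32lower(digest)[:24]  # left(hash, 24)


draft = {"id": "did:example:{{SCID}}"}
scid = derive_scid(draft)
scid_sha3 = derive_scid(draft, hash_name="sha3_256")
```

Because the encoded hash has no prefix, the first 24 characters come straight from the digest and are lowercase letters and the digits 2 to 7, which is friendlier in URLs.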

bcgov / trustdidweb

How to generate the SCID? #6