bcgov / trustdidweb

Trust DID Web (did:tdw)
https://bcgov.github.io/trustdidweb/
Apache License 2.0

How to generate the SCID? #6

Closed swcurran closed 3 months ago

swcurran commented 4 months ago
brianorwhatever commented 4 months ago

I will have to look at how this is accomplished with JCS but for the RDF things I've implemented the general pattern is:

verifyData = sha256(proofOptions canonicalized) + sha256(document canonicalized)

It might be nice to stick to this approach where verifyData is what becomes the SCID.
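As a rough illustration of that pattern, here is a minimal Python sketch. It reads the `+` as byte concatenation of the two digests, and it uses sorted-key JSON as a stand-in for canonicalization; both are assumptions, as are the `canonicalize` and `verify_data` names and the sample inputs. A real implementation would use RDF canonicalization or JCS.

```python
import hashlib
import json


def canonicalize(value: dict) -> bytes:
    # Stand-in canonicalization (an assumption): sorted keys, no whitespace.
    # A real implementation would use RDF canonicalization or JCS (RFC 8785).
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def verify_data(proof_options: dict, document: dict) -> bytes:
    # Hash each canonicalized input separately, then concatenate the digests.
    options_hash = hashlib.sha256(canonicalize(proof_options)).digest()
    document_hash = hashlib.sha256(canonicalize(document)).digest()
    return options_hash + document_hash


# Hypothetical inputs, for illustration only.
proof_options = {"type": "DataIntegrityProof", "proofPurpose": "authentication"}
document = {"id": "did:tdw:example.com:{SCID}"}
data = verify_data(proof_options, document)
print(len(data), data.hex())
```

The result is 64 bytes: 32 for the proof options and 32 for the document. Turning that into a SCID would still need an encoding and truncation step, which the rest of the thread discusses.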

I think I'm also OK with the selfhash approach; I would just need to grok it a bit better.

swcurran commented 4 months ago

I’d say we go with the scheme that has the most mature libraries. I suspect that is RDF.

@brianorwhatever, please do look at selfhash to understand what it does and how to use it. It would be nice to not create something new. :-)

swcurran commented 3 months ago

Resolved. We're going to do the following:

@brianorwhatever -- please confirm whether we are taking the full sha256 result as the scid or just a part of it, and that we are indeed using sha256.

brianorwhatever commented 3 months ago

I am using the last 24 characters from multibase(multihash(digest, "sha2-256"), "base58btc") of the genesis document as described above.
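A minimal Python sketch of that derivation might look like the following. The base58btc encoder is hand-rolled to keep the example self-contained, sorted-key JSON stands in for JCS, and the function names and the sample genesis document are hypothetical.

```python
import hashlib
import json

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc(data: bytes) -> str:
    # Treat the bytes as one big integer and repeatedly divide by 58.
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = B58_ALPHABET[remainder] + encoded
    # Each leading zero byte is represented by a leading '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def multihash_sha2_256(digest: bytes) -> bytes:
    # Multihash layout: hash code (0x12 for sha2-256), digest length, digest.
    return bytes([0x12, len(digest)]) + digest


def scid_last24(genesis_document: dict) -> str:
    # Sorted-key JSON is a stand-in for JCS here (an assumption).
    canonical = json.dumps(genesis_document, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    # The multibase prefix 'z' marks base58btc.
    encoded = "z" + base58btc(multihash_sha2_256(digest))
    return encoded[-24:]


genesis = {"id": "did:tdw:example.com:{SCID}"}  # hypothetical genesis document
print(scid_last24(genesis))
```

Taking the tail of the string skips the multibase and multihash prefix characters, which are identical for every sha2-256 hash.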

swcurran commented 3 months ago

So, the full set of hashing algorithms we are using throughout is:

brianorwhatever commented 3 months ago

The first and second are really just one. The sha256 digest of jcs(json) is multihash-encoded, so it describes the hash algorithm internally. I would also be happy to dictate sha256 and remove that layer.

I think @andrewwhitehead might be doing something slightly different for scid as well

andrewwhitehead commented 3 months ago

I'm using lowercase base32 for the SCID, as case-sensitive base58 doesn't work quite as well in URLs. I'm also taking the first 24 characters instead of the last.
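For comparison, the base32 variant can be sketched in a few lines of Python using only the standard library. The function name and the sample input are hypothetical, and the input is assumed to be already canonicalized.

```python
import base64
import hashlib


def scid_base32(canonical_genesis: bytes, length: int = 24) -> str:
    digest = hashlib.sha256(canonical_genesis).digest()
    # RFC 4648 base32, lowercased; no multibase or multihash prefix is added.
    encoded = base64.b32encode(digest).decode("ascii").lower()
    return encoded[:length]


scid = scid_base32(b'{"id":"did:tdw:example.com:{SCID}"}')  # hypothetical input
print(scid)
```

The output uses only the characters `a-z` and `2-7`, so it survives case-insensitive handling in URLs.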

brianorwhatever commented 3 months ago

Can we use base32 instead of base58btc throughout then? I prefer the last 24 for two reasons. It includes more bytes of actual hash digest vs encoding information. It also makes scids more unique as it doesn't include the encoding information at the beginning.
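The concern about encoding information can be seen in a small Python sketch. It assumes the hash is wrapped as a sha2-256 multihash and then multibase-encoded as lowercase base32; the function name and inputs are hypothetical.

```python
import base64
import hashlib


def multibase_base32_multihash(data: bytes) -> str:
    digest = hashlib.sha256(data).digest()
    multihash = bytes([0x12, 0x20]) + digest  # sha2-256 code, 32-byte length
    body = base64.b32encode(multihash).decode("ascii").lower().rstrip("=")
    return "b" + body  # the multibase prefix 'b' marks lowercase base32


first = multibase_base32_multihash(b"genesis document A")
second = multibase_base32_multihash(b"genesis document B")
print(first[:24], second[:24])    # both begin with the same fixed prefix
print(first[-24:], second[-24:])  # the tails depend only on the digest
```

Under these assumptions every encoded value begins with `bciq`, so a SCID taken from the front spends its first characters on encoding information, while one taken from the back does not.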

andrewwhitehead commented 3 months ago

For the purposes of the SCID there's no prefix added to the hash, so it's just as random.

I think that we should add a hash parameter, generally sha2-256 (we could have a default if it's not provided). This would allow upgrading to another hash like blake3 or shake-256, which is something Git had to figure out how to do years after it was released.

If the hash is explicit and we use (lowercase) base32 everywhere then we can just use the unprefixed form for hashes and it's easy to check the SCID derivation against the first previous hash.
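A rough Python sketch of this proposal follows. The `HASHERS` registry, the function names, and the 32-byte output length for shake-256 are all assumptions made for illustration; blake3 is left out because it is not in the Python standard library.

```python
import base64
import hashlib

# Hypothetical registry of hash names; sha2-256 is the suggested default.
HASHERS = {
    "sha2-256": lambda data: hashlib.sha256(data).digest(),
    # shake-256 has variable-length output; 32 bytes is an assumption.
    "shake-256": lambda data: hashlib.shake_256(data).digest(32),
}


def entry_hash(data: bytes, hash_name: str = "sha2-256") -> str:
    # Unprefixed, lowercase base32 of the digest from the chosen hash.
    digest = HASHERS[hash_name](data)
    return base64.b32encode(digest).decode("ascii").lower().rstrip("=")


def scid_matches(scid: str, first_previous_hash: str) -> bool:
    # With unprefixed base32 everywhere, the SCID is a prefix of the first hash.
    return len(scid) == 24 and first_previous_hash.startswith(scid)


genesis_hash = entry_hash(b"canonicalized genesis entry")  # hypothetical input
scid = genesis_hash[:24]
print(scid_matches(scid, genesis_hash))
```

Because the hash name travels as an explicit parameter, the encoded hash itself carries no algorithm prefix, and the check reduces to a string prefix comparison.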

brianorwhatever commented 3 months ago

> I think that we should add a hash parameter, generally sha2-256 (we could have a default if it's not provided). This would allow upgrading to another hash like blake3 or shake-256, which is something Git had to figure out how to do years after it was released.

Hmm, sounds like we just want multihash in that case 🤔. But then we would need to include the first half in the scid so that we can determine which hash algorithm was used, which reopens my concern about everyone's scid looking similar.

> If the hash is explicit and we use (lowercase) base32 everywhere then we can just use the unprefixed form for hashes and it's easy to check the SCID derivation against the first previous hash.

I like specifying the base encoding as this isn't something that would ever need to change

One idea would be to include the genesis doc's previous hash in the first line's params instead of the scid. That way we could derive the scid on the fly for replacement use while still having the full hash to decode the multihash from, since I would push for the last 24 characters 😄
