Convert version entry to be a JSON object instead of an array, and related changes

swcurran commented 2 months ago

At the 20240912 did:tdw work item meeting (notes and recording), it was decided that a PR would be created (coming soon!) that will make the following changes to the spec. Those interested should review this list and weigh in if you have comments as I begin to create the PR. I think this is a good summary:

Remove the use of JSON Patch from the spec.
Change the JSON array [] for each version entry to be an object {}.
Make each object in the version entry a named item. I propose:
- { "vid": <versionId>, "vt": <versionTime>, "p": <parameters>, "dd": <DIDDoc> }
The data integrity proof is then applied on the version entry (the object above), and per the DI spec, added to the object. The challenge is no longer needed.
Continue to use JSON Lines for storing the version entry object into the log file (remove extra whitespace, add <CR> at the end.
Update the calculations as needed to reflect the changed structure -- but the logic remains the same.
Update the naming, definitions and references (defs and refs) in the spec to reflect the changes.

I think the spec. changes (and presumably, the implementations) are pretty straight forward, but the PR will be pretty large. It very much feels like a simplification, so should be a fun one!

Feedback, welcome!

brianorwhatever commented 1 month ago

I haven't really heard a good argument for moving from an array to an object. If the only gain is having the data integrity proof embedded into the document I don't think this is worth the drawbacks:

need to create a context to map these things ie vid, vt, p, dd (something we've been attempting to avoid)
larger data size
easier to add new things in the future

One of the things I liked about using the array was having the 5 properties "set in stone" so any future changes to the spec won't update these things and instead can only change the parameters object.

I am not against dropping JSON Patch support but would like to see the actual data size comparisons of patch vs value.. Maybe we could do that before dropping it

andrewwhitehead commented 1 month ago

I don't think that we need to define a context, unless we start using JSON-LD proofs instead of JCS ones.

For consistency it might be nice to choose only 1, 2, or 3-character keys, although "proof" does break that. :) I would also prefer if dd was just the 'state' (or s or st) for genericity.

The semantics of the params might not be appropriate for all additions in future, given that those values always carry forward. It is nice having those extensions all in one place, though. And I do think that a resolver should fail if it finds keys that aren't defined for the current version.

I think a nice benefit of using the object syntax comes in when endorsers/witnesses get involved, as you can send the latest line to be signed (potentially even with a context attached) and it's a little more self-describing:

{
    "v": "19-hashhashhashhash",
    "t": "2026-12-26T11:59:00",
    "p": {},
    "s": {
        "id": "did:tdw:..."
    }
}

brianorwhatever commented 1 month ago

I don't see why it being self describing would help a witness. They should know what you are asking of them and if they are compliant with the spec they will know where the data values are.. and these short keys aren't even self describing. They either need to know what "v" is or they need to know the version is the first item in the list 🤷‍♂️

I'm still only seeing downside to this change..

swcurran commented 1 month ago

Caveat — non-developer $0.02CDN:

I think self-describing helps everyone — especially with the minimal cost (just a few characters per version — 16 or less by my count). I go back to the merging of the entry hash and version number that resulted in one item in the array. Such a change would be impossible once the spec becomes version 1 — or would at least require sniffing the data. Agree that had a JSON-LD context been needed, that would have been difficult, but agree that we don’t need that.

I think that developers will be more comfortable with this. I think every new Dev will always ask — "why are we using an array?”.

brianorwhatever commented 1 month ago

Such a change would be impossible once the spec becomes version 1

this is a good thing 😄

self-describing

I don't believe "v", "t", "p" and "s" are self-describing. If we go this route and want "self-describing" I would like to see

{
    "version": "19-hashhashhashhash",
    "timestamp": "2026-12-26T11:59:00",
    "parameters": {},
    "state": {
        "id": "did:tdw:..."
    }
}

Even though it adds a ton of overhead

swcurran commented 1 month ago

Only 28 bytes more per version. Negligible in processing, and would likely compress nicely in transit. :-).

I do agree that I hate the one letter variable names. A benefit of what we have so far is the number of items is not likely going to get a lot longer, so with short names, there isn’t a lot of mental gymnastics to remember which they are.

PatStLouis commented 1 month ago

+1 for Brian comment, or at least something in the middle:

{
    "ver": "19-hashhashhashhash",
    "ts": "2026-12-26T11:59:00Z",
    "params": {},
    "state": {
        "id": "did:tdw:..."
    }
}

Question: if the whole object can be signed now instead of only the did doc, what is the purpose of the hash? The hash was used as a value to be included in the signature as a proof challenge so you could 'sign' components outside of the did doc but this is no longer required.

PatStLouis commented 1 month ago

I like the idea of having an object for the simplicity that I can just use an existing data integrity implementation instead of having to create additional steps and separate the proof from the document it signs. Data integrity is designed to be embeded and this pattern feels odd, might as well be using a jws in this scenario.

For staying with an array, I think using jws instead of a data integrity proof makes more sense as its lighter and you can just place the signature string in the corresponding index.

If moving towards an object, data integrity feels more appropriate and implementers can use their existing data integrity software without additional proof manipulation implied by this specification.

swcurran commented 1 month ago

Question: if the whole object can be signed now instead of only the did doc, what is the purpose of the hash? The hash was used as a value to be included in the signature as a proof challenge so you could 'sign' components outside of the did doc but this is no longer required.

The purpose of the hash is to chain each version to the previous. Recall that the entryHash is calculated over the (currently) array, where the first item is the versionId from the previous version (and the SCID in the first version). That makes the hash a chain of versions — like blocks. Make sense?

PatStLouis commented 1 month ago

I see it yeah. Using the challenge for this is interesting, it could also be interesting to have a field called previousVersionId instead of challenge, used in the proof or parameters. The definition of a challenge reads as:

The value is used once for a particular domain and window of time. This value is used to mitigate replay attacks.

Which isn't entirely the purpose here. Having a previousVersionId with a definition of:

The version of the previous did log entry. Verifiers SHOULD verify the associated proof and validate the integrity of that previous log entry before processing the current version of the did document.

I see the use case for the versionId.

andrewwhitehead commented 1 month ago

I don't believe it would be necessary to set the challenge if we switch to using the line as the payload for DI proofs.

SylvainMartel commented 1 month ago

I'm no kingpin, but changing to object looks to me like it will make it easier to work with programmatically. Dropping json-patch does simplify implementation(I almost regret not waiting 2 more weeks before coding :D ) , and while it may require a bit more storage, it saves on CPU cycles :)

brianorwhatever commented 1 month ago

As it looks like I'll be out voted on this issue I would like to again link to this tweet that tries to explain my perspective https://x.com/_felipe/status/1810695731570937961?s=46

Since we are likely going down the JSON route.. I would like to ensure we do have a JSON-LD context and potentially we could remove the versionId param with an id so that the graph structure is intact. Something like did:tdw:{scid}:example.com#1-hashhashhash

swcurran commented 1 month ago

While I agree that for “lat” and “lon” you don’t need self-describing, I do think that there is more complexity in the DID Log entries. It’s a chicken and egg issue — it will eventually lock in such that it need not be self-describing, but until that happens…

I’m very nervous about an LD context because…. :-) I’m going to take a shot at the spec PR without for now. I was thinking about id for versionId, but didn’t propose it because of … JSON-LD. I didn’t want to use that overloaded term.

Fun stuff. Looking forward to creating the spec update. It feels right...

swcurran commented 1 month ago

Here are the names in the PR and that we now agree to use.

{
    "versionId": "19-hashhashhashhash",
    "versionTime": "2026-12-26T11:59:00",
    "parameters": {},
    "state": {
        "id": "did:tdw:..."
    },
   "proof": <di-proof> // Align with the DI Spec.
}

brianorwhatever commented 1 month ago

This was completed in #108

decentralized-identity / trustdidweb

Convert version entry to be a JSON object instead of an array, and related changes #106