Closed by csuwildcat 3 years ago
@thehenrytsai @Therecanbeonlyone1969 any idea how this growth rate stacks up to Bitcoin's/Ethereum's growth rates? Obviously those ledgers do things other than DIDs as well, but it would be interesting to put the "requirements" in the context of other real-world production systems.
should we consider eliminating the base64url encoding at the same time to stretch the storage gain to the limit?
Suggest renaming "transient": the eventual meaning is that the file could be pruned after checkpoints, rather than being transient at the current time.
Suggested alternative syntax for the anchor file:
{
  "map_file": CAS_URI,
  "writer_lock_id": OPTIONAL_LOCKING_VALUE,
  "operations": {
    "create": [
      {
        "file_ref": CAS_URI,
        "suffix_data": { // Base64URL encoded
          "delta_hash": DELTA_HASH,
          "recovery_commitment": COMMITMENT_HASH
        }
      },
      {...}
    ],
    "recover": [
      {
        "file_ref": CAS_URI,
        "did_suffix": SUFFIX_STRING,
        "reveal_value": MULTIHASH_OF_JWK
      },
      {...}
    ],
    "deactivate": [
      {
        "file_ref": CAS_URI,
        "did_suffix": SUFFIX_STRING,
        "reveal_value": MULTIHASH_OF_JWK
      },
      {...}
    ]
  }
}
`file_ref` could actually be a JSON pointer CAS_URI.
Feedback:
I think enabling checkpoints and pruning is important, so a structure that enables that aspect is useful.
Just want to note that the current file structures already implicitly support the addition of a checkpoint/pruning mechanism. This is about reducing the minimum dataset required to run a light node by roughly 75% or more.
I'm generally in favor of this proposal, but I'm a bit worried about how we go about implementing it.
Here is my proposal:
1. We inventory the set of features we believe we are shipping support for in spec v1.
2. We determine what level of testing is required to believe that a feature is supported in spec v1.
3. We create issues to ensure those tests exist in the reference implementation.
4. We close those issues when the tests exist.
5. We publish spec v1 and the reference implementation, and we bump to v1.1.
6. We open issues for the core set of features in v1.1 (probably the same as v1).
7. We close those issues when we have tests that prove they work.
8. We publish spec v1.1 and the reference implementation.
Vendors that don't have production customers can choose to skip spec v1 and jump to v1.1; vendors who can't "wipe their production database" can use spec v1 until spec v1.1 is ready to migrate to.
We target SIP-1 to spec v1.1.
We need to be careful to have a stable, rigorous, confidence-building release process and versioning system. I think it's dangerously confidence-destroying to rewrite versions and refuse to publish, versus publishing regular versions with clear changes, tests, and documentation to support each release. (Our reference implementation does a good job of this; we need to ensure the spec does as well.)
@OR13 how about we cut an official version of the spec as it stands now, 0.1.0, and use this change as an opportunity to do a proper minor version bump in accordance with the version descriptions in the spec.
I'm fine with that as long as we cut a version before we attempt to implement a SIP. Ideally we make it as clean a version as we can by closing out any low-hanging fruit before the cut.
It can be v0.1.0 and SIP-1 can target v0.2.0 or whatever... features should be planned to target versions.
Aside: are folks here OK if I do a PR to add this general SIP template as a start for that sort of thing? I was thinking of creating a SIP directory with MD files in it that would render just like our specs do.
@tplooker I don't think the pointer URI to a place inside the linked file is worth it if we can do the same thing via a 0-byte alternative, given it degrades the primary goal of SIP-1. However, if we changed our minds about it, we could always add it later in a way that Sidetree-based implementations could push out via a rather straightforward upgrade.
@troyronda and others: if we don't want to go with Transient, what are some names for the files that will be cyclically eliminated after checkpoint pruning occurs?
To further optimize the above proposal, we could remove an additional base64 encoding of `suffix_data` by instead relying on JCS to canonicalize the structure.
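A minimal sketch of the difference being suggested, assuming the `canonicalize` npm package (an RFC 8785 JSON Canonicalization Scheme implementation), Node's built-in crypto, and plain SHA-256 as a stand-in for the spec's multihash; the function names are illustrative, not the reference implementation's:

import canonicalize from 'canonicalize'; // RFC 8785 JCS implementation
import { createHash } from 'crypto';

interface SuffixData {
  delta_hash: string;
  recovery_commitment: string;
}

// Roughly the current approach: hash a base64url-encoded JSON payload
// (the 'base64url' encoding requires Node >= 15.7).
function hashEncodedSuffixData(suffixData: SuffixData): Buffer {
  const encoded = Buffer.from(JSON.stringify(suffixData)).toString('base64url');
  return createHash('sha256').update(encoded).digest();
}

// The suggested approach: canonicalize with JCS and hash the canonical bytes,
// so no base64url-inflated copy of the JSON ever needs to be carried.
function hashCanonicalSuffixData(suffixData: SuffixData): Buffer {
  const canonical = canonicalize(suffixData); // deterministic key order and number formatting
  if (canonical === undefined) throw new Error('input cannot be canonicalized');
  return createHash('sha256').update(canonical).digest();
}

The win would be twofold: the stored structure stays plain JSON (avoiding the roughly 33% inflation of base64 encoding), and JCS guarantees every writer hashes byte-identical input.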
Let's take the encoding performance debate to https://github.com/decentralized-identity/sidetree/issues/781
any tests / proof for the "75%" reduction claim being made here?
@OR13 here's the test: the entries with proving data were 275 bytes, and the new size of the entries without proving data is 65 bytes, which is a reduction of (275 - 65) / 275 ≈ 76.4% in the minimum dataset required for a node to boot up and have a global index of all op entries.
^ nice test, you must code a lot ; )
I noticed the new file fields end with `_file`, but chunk ends with `_file_uri`, e.g., `retained_proving_file` vs `chunk_file_uri`. Should the chunk field be `chunk_file`?
@csuwildcat Should `reveal_value` in recover, deactivate, and update operations be just the hash of the JWK instead of the multihash of the hash of the JWK? That way our check for the operation commitment would stay the same: multihash(reveal_value) == operation commitment
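To make the check being discussed concrete, here is a minimal sketch using plain SHA-256 as a stand-in for the spec's multihash; the names are illustrative, not the reference implementation's:

import { createHash } from 'crypto';

const sha256 = (data: string | Buffer): Buffer =>
  createHash('sha256').update(data).digest();

// Commitment time: commit to H(H(jwk)), the double hash of the key material.
function computeCommitment(canonicalJwk: string): Buffer {
  return sha256(sha256(canonicalJwk));
}

// Reveal time, under the suggestion above: publish reveal_value = H(jwk),
// so the verifier's check stays a single hash application:
//   multihash(reveal_value) == operation commitment
function verifyReveal(revealValue: Buffer, commitment: Buffer): boolean {
  return sha256(revealValue).equals(commitment);
}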
Fully implemented.
Summary
By segregating the proving data contained in the operation entries currently housed in the Anchor File and Map File (for Recovery, Deactivate, and Update operations), it is possible to realize a rather dramatic ~75% reduction in the minimum dataset required to trustlessly resolve DIDs.
The effect of moving this data to segregated Proving Files is that the Anchor and Map Files become lightweight, spam-protected operation indexes, allowing nodes of various configurations to defer acquisition of Proving Data in a JIT fashion.
Motivation
These changes would make initialization of many node types faster, more efficient, and, most importantly, operationally feasible for the average user-operator. Sustainable operation of nodes on consumer hardware is a key requirement for any decentralized network of this class, so keeping network storage growth comfortably 'under the line' of the commodity storage cost curve and the bandwidth growth curve is essential. While such curves lack precision, examining the trajectory of storage and bandwidth against the waning cadence of the Kryder's Law and Edholm's Law doubling conjectures suggests that 2-3 terabytes per annum of growth in a network's minimum required dataset is the top end of sustainability for a system that features peer-based replication of data and deferral of CPU-intensive tasks until a JIT compilation/resolution phase.
Requirements
Technical Proposal
The primary technical changes center on moving proving data out of the Anchor File and Map File, leaving those files to act as bare-minimum indexes that give a node global awareness of possible operations for any DID in the system. The proposed changes include the addition of two new intermediary files between the Anchor and Chunk Files. All changes to the existing Anchor and Map Files, as well as the new Proving Files, are as follows:
Anchor File
The Anchor File would be modified in the following ways:
- Addition of a `retained_proving_file` property: the CAS URI of the new Retained Proving File, which contains the proving data for `recover` and `deactivate` operation entries.
- Addition of a `transient_proving_file` property: the CAS URI of the new Transient Proving File, which contains the proving data for `update` operation entries of the Map File.
- Modification of the `create` operation across the spec to reflect the fact that the `reveal_value` is the hash of the hash of the JWK value that is being committed to.
- Modification of `recover` and `deactivate` operation entries to only include the `did_suffix` and `reveal_value` properties. The `reveal_value` is the hash of the hash of the JWK in the `signed_data` object that was relocated to the Retained Proving File.
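As a non-normative illustration, the slimmed-down Anchor File implied by these modifications could look roughly as follows; the proving-file property names are taken from the discussion above, and the exact shape is subject to the final spec text:
{
  "map_file": CAS_URI,
  "retained_proving_file": CAS_URI,
  "transient_proving_file": CAS_URI,
  "writer_lock_id": OPTIONAL_LOCKING_VALUE,
  "operations": {
    "create": [
      {
        "suffix_data": {
          "delta_hash": DELTA_HASH,
          "recovery_commitment": COMMITMENT_HASH
        }
      },
      {...}
    ],
    "recover": [
      {
        "did_suffix": SUFFIX_STRING,
        "reveal_value": HASH_OF_HASH_OF_JWK
      },
      {...}
    ],
    "deactivate": [
      {
        "did_suffix": SUFFIX_STRING,
        "reveal_value": HASH_OF_HASH_OF_JWK
      },
      {...}
    ]
  }
}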
Map File
The Map File would be modified in the following ways:
- Modification of `update` operation entries to only include the `did_suffix` and `reveal_value` properties. The `reveal_value` is the hash of the hash of the JWK in the `signed_data` object that was relocated to the Transient Proving File.
Retained Proving File
The Retained Proving File will contain the following:
- The `signed_data` portions of the `recover` and `deactivate` operation entries that used to live in the Anchor File are now present in the `operations` object under their respective properties, and MUST be ordered in the same index order their corresponding entries are present in the Anchor File.
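As a non-normative illustration, the Retained Proving File could look roughly like the following (the Transient Proving File, described next, would have the same shape, with a single update array ordered against the Map File):
{
  "operations": {
    "recover": [
      { "signed_data": JWS_OBJECT },
      {...}
    ],
    "deactivate": [
      { "signed_data": JWS_OBJECT },
      {...}
    ]
  }
}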
Transient Proving File
The Transient Proving File will contain the following:
- The `signed_data` portions of the `update` operation entries that used to live in the Map File are now present in the `operations` object under their respective properties, and MUST be ordered in the same index order their corresponding entries are present in the Map File.
Operation Data Changes