decentralized-identity / ion

The Identity Overlay Network (ION) is a DID Method implementation using the Sidetree protocol atop Bitcoin
https://identity.foundation/ion
Apache License 2.0

Add a mechanism for self-declaring a DID's entity type #77

Open csuwildcat opened 4 years ago

csuwildcat commented 4 years ago

Given we will have DIDs for humans, organizations, machines, and other things, it would be helpful to have an optional mechanism by which DID creators can self-declare what type of thing a DID represents. One way to do this could be to add a type field to the create operation Anchor File section with a couple bytes of space for a value that mapped to entity types. That's just one idea on how to do this, but this would really help in being able to have a general grouping of different DID types in the system.

OR13 commented 4 years ago

Can't be trusted... anyone can write dids of type "make you waste time"....there is no write side protection in sidetree.

You can build an index based on your own surveillance / visibility and bucket dids / pre cache them... we do this for dids when we know we might lose connectivity... but this is 100% not a sidetree specific thing...

For example:

Let's say I know did:example:123 is actually Azure AD Tenant 456, because I can see that... I build a table for Azure AD Tenant 456 that maps their AD Entity types to DIDs.... I don't publish that mapping on a ledger... because that's customer information...

That tenant can ask for did lists by type:

POST /tenant/456/dids { type: organization } => customer specific data that nobody should be able to learn from crawling public resources like bitcoin or ipfs....

Then they might locally resolve all those dids before starting their run through an internet denied environment... When they encounter an entity with a DID, they can authenticate it from cached data, even without internet....
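
The private index and pre-cache described above could look roughly like the following. This is a minimal sketch, not anyone's actual agent code: the `TenantDidIndex` class, its method names, and the `fake_resolve` stand-in are all hypothetical, and the entity types come from the operator's own knowledge rather than from anything on a ledger.

```python
from collections import defaultdict


class TenantDidIndex:
    """Private, off-ledger mapping from entity type to DIDs for one tenant."""

    def __init__(self):
        self._by_type = defaultdict(set)
        self._cache = {}  # DID -> locally cached DID document

    def tag(self, did, entity_type):
        # The type comes from the operator's own knowledge, not from the ledger.
        self._by_type[entity_type].add(did)

    def dids_of_type(self, entity_type):
        return sorted(self._by_type.get(entity_type, set()))

    def precache(self, entity_type, resolve):
        # Resolve every DID of this type while connectivity is still available.
        for did in self.dids_of_type(entity_type):
            self._cache[did] = resolve(did)

    def cached_document(self, did):
        # Returns None if the DID was never pre-cached.
        return self._cache.get(did)


def fake_resolve(did):
    # Stand-in resolver for illustration; a real one would query a DID resolver.
    return {"id": did}


index = TenantDidIndex()
index.tag("did:example:123", "organization")
index.tag("did:example:456", "machine")
index.precache("organization", fake_resolve)
print(index.dids_of_type("organization"))
print(index.cached_document("did:example:123"))
```

Nothing here touches Sidetree or the anchoring layer, which is the point being made: the mapping stays with whoever is entitled to know it.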

TL;DR; this would be a terrible feature to implement in sidetree, but it's a great feature to implement in an sdk, or other services... we do this already in our enterprise agent.

OR13 commented 4 years ago

categories of "type" are metadata...

"we kill people based on metadata."

we won't be able to scrub timing information from sidetree ever... that's metadata as well...

If you are really worried, you should pre-allocate a really large number of DIDs and keep them all alive, and try and disconnect things like updates from when you are required to make them...

For example, if you add healthcare.edv.example.com to your did document the same day you get onboarded to a new clinic... it's now really easy to see when that clinic gets new patients...

metadata is poison, we should be very careful to minimize it.

csuwildcat commented 4 years ago

@OR13 let's talk - I can provide some info on what the issues/details are that make this a significant need at this level.

csuwildcat commented 4 years ago

Another way to do this, as Orie noted on a call, would be to reserve the first two characters of a DID suffix string for this type of category indicator that a DID controller could optionally use, but then it means that a classification may be applied without explicit action, which I don't like. Separating it could cost a couple extra bytes, but might be worth it.

OR13 commented 4 years ago

@kdenhartog @troyronda add your comments to this ticket.

OR13 commented 4 years ago

@csuwildcat to add language to the spec about "implementers may add other properties that mutate what the didUniqueSuffix will be".

kdenhartog commented 4 years ago

My opinion on this is that I'm not opposed to it, but I don't think this level of classification belongs at this layer. We likely wouldn't want to support its usage because of the metadata concerns described above, but I don't see glaring concerns with this. I do think we're going to want the didUniqueSuffix to be clearly identifiable though if it's going to support this. Also might want to check with @SmithSamuelM to make sure we aren't breaking the self certifying identifier property in any way.

csuwildcat commented 4 years ago

It's definitely not breaking anything about self-certification, given any additional values in the Suffix Data are implicitly part of the ID itself. (aside: Sidetree was the first self-certifying type of ID before that concept existed in the DID ecosystem, besides arguably did:key's simple linkage, so I'm confident on this point). Nothing about this specific additional property is going in the spec (it just says implementations may define additional properties), so I don't think further discussion about a particular additional property an implementation may add is valuable for Sidetree.

SmithSamuelM commented 4 years ago

Not sure what you mean by suffix. After the did method there is the unique name string and then there may be any number of colon separated strings. But forcing an entity type semantic on one of those colon separated strings is not likely an acceptable mechanism. Also forcing an entity type in the unique name string is bad for privacy. Some optional did metadata is likely better.

@csuwildcat FYI. My original open reputation white paper in early 2015 advocated the use of self-certifying IDs for decentralized identity. I am pretty sure that predates Sidetree by some time ;) Likewise the Evernym white paper I wrote in early 2016 advocates for self-certifying identifiers. ;) Indeed my original intent in those white papers was that all decentralized identifiers should be self-certifying. Unfortunately the community around the nascent DID standard in late 2016 dropped self-certifying as a requirement in order to be more open to any type of did method. Those advocating DIDs at that time may not have fully appreciated what self-certifying meant. This decision to drop self-certifying as a requirement carried with it security risks that were also likely not appreciated at the time. I have spent a lot of effort lately re-evangelizing self-certifying identifiers, but in no way is it a recent idea in the did ecosystem. Indeed it was the incepting idea for the DID ecosystem, albeit one that was largely ignored or misunderstood until recently.

csuwildcat commented 4 years ago

The proposal for the type string, which would not even happen in Sidetree itself, was a 4 byte value to be included in the Suffix Data object that makes up the root of trust values for the IDs, not some additional segment in the DID URI. Suffix data is the immutable, self-certifying component of the URI, thus whatever goes into it is inherently self-certifying. All of this is detailed in the DID URI composition section of the spec: https://identity.foundation/sidetree/spec/#did-uri-composition

Sidetree was first developed in 2017, so I guess the concept of a 'self-certifying' ID existed before that, but was unknown to me, as I never heard about it or read anything about it while I was working on Sidetree. I made Sidetree 'self-certifying' without even knowing it because it just seemed obvious you would want the feature in an ID system (I am honestly stunned to hear this was not plainly obvious to anyone who worked on decentralized identity).

csuwildcat commented 4 years ago

I remember thinking in 2017: "I should probably just make this sucker the hash of the initial state, because then you can tie a lineage of deltas to it deterministically" <-- how is that not the point of all DID systems? Are there actually Methods that don't do this? If so, I feel like we should tell anyone who made a DID Method like that to burn it with fire.

csuwildcat commented 4 years ago

I just helped this whole situation out for everyone by moving it here, where it should have been to begin with. Now we can actually talk about this, if we want, because it would be an implementation-specific choice, not some Sidetree-required value.

I should provide this preface, in case it was not clear: the 4 byte string is not for humans, the string is for non-sentient objects and entities, like companies and farm equipment, for example.

If you think it is bad for a DID to include a codified 4 byte string in its inception payload that asserts "I am a company", please let me know what your concern is, in precise detail. Also, if we assume there will be maybe 50 million DIDs created with an "I am a company" type assertion out of say, 50 billion, please let me know how you plan to overcome the massive cost difference between having to compose, resolve, and do billions of HTTP requests to service endpoints for 50 billion IDs, vs some subset in the tens of millions, many orders of magnitude less. I would assert the cost difference is astronomical, prohibitive, and would implicitly result in some secondary, more centralized registration system that does exactly the same thing. Please show your calculations for overcoming this hurdle, and describe how you will do so while keeping the registry system decentralized.
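
For scale, the two counts quoted above can be compared directly. This is only a back-of-the-envelope sketch: the 50 million and 50 billion figures are the hypothetical ones from the comment, and treating cost as proportional to the number of DIDs that must be resolved is an added assumption.

```python
import math

total_dids = 50_000_000_000   # hypothetical total from the comment
typed_dids = 50_000_000       # hypothetical "I am a company" subset

# How many times fewer DIDs a crawler touches if it can filter by type first.
ratio = total_dids // typed_dids
orders_of_magnitude = math.log10(total_dids / typed_dids)

print(ratio)                # 1000
print(orders_of_magnitude)  # about 3
```

With these inputs the typed subset is 0.1% of the total, so a crawler that can filter before resolving does roughly a thousandth of the work.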

(aside: the supreme irony of this is that detractors are implicitly arguing you must register your IDs with some secondary, rather centralized entities who keep directories for things like the business yellow pages, code package managers, etc. This effectively implies that Active Directory, NPM, etc., should own these as businesses, instead of it being decentralized and feasible for anyone to compose. Please just ask yourselves: "Do I really want to hand the entire ecosystem of type registries to be dominated by an incumbent who literally does this today?" I find myself in the strange position of being employed by that company and arguing "No", while others are arguing that we should - strange times)

csuwildcat commented 4 years ago

Basically, people on this thread be like: "Here, Corporate Galadriel, take this ring - I am but a mere Decentralized Hobbit, so I want you, a mighty force of centralized power, to take it off my hands". Needless to say, this is not what I was expecting ¯\_(ツ)_/¯ https://www.youtube.com/watch?time_continue=210&v=WeQDTj1UllA

cboscolo commented 4 years ago

@csuwildcat I am in favor of encoding type in the DID. One of our early experiments was to use a single character in the DID name did:foo:o:uniquevalue for an org. We had three types, Organization, Person, Thing.

OR13 commented 4 years ago

Since create payloads are not signed, can't this tagging system be used to label all DIDs being created by a certain vendor? Isn't that the exact use case for this?

As a service provider I mutate my customers create payloads to tag and track them :)

I guess you end up with a different DID though, so maybe this doesn't matter.

SmithSamuelM commented 4 years ago

@csuwildcat

"(I am honestly stunned to hear this was not plainly obvious to anyone who worked on decentralized identity)"

I am likewise stunned whenever I see a proposal for a DID method that is not self-certifying. =)

I agree that one good place to put this sort of controller type string is in the incepting data for the identifier. In KERI parlance the inception event (statement) for the identifier includes a data element that is an array of key value pairs just for these sorts of things. Because any entry in the data element is optional, there are no privacy concerns vis a vis a reserved mandatory controller type element in the name string. With a self-content-addressing self-certifying identifier, this data element along with everything else in the inception event becomes strongly bound to the identifier. This removes any security exploits and makes it every bit as strong as putting it in the identifier itself. A KERI compatible DID method would make the inception statement available as part of the metadata that provides proof of control authority. So an optional controller type would come for free as part of a KERI compatible DID. It's bundled in the proof of control authority metadata. This includes configuration of infrastructure in order to prevent attacks that substitute infrastructure. IMHO the lack of infrastructure configuration as a first order property of inception is the main security flaw in all other identity systems. The generalization (which sidetree shares in many ways) is that self-certifying identifiers may be verifiably bound to their incepting configuration data. This means that any other attributes including controller type that are declared in the incepting configuration statement are securely verifiable in a self-contained manner along with proof of control authority. This makes the identifier plus configuration truly portable and hence truly self-sovereign.

A caveat. The inception data must include the public key(s) of the controlling key-pairs to be self-certifying when the identifier is derived from a hash of the inception data. A hash of the inception data is not itself self-certifying. It's the unique binding to a signing key-pair(s) that makes it self-certifying.

csuwildcat commented 4 years ago

"A caveat. The inception data must include the public key(s) of the controlling key-pairs to be self-certifying when the identifier is derived from a hash of the inception data. A hash of the inception data is not itself self-certifying. It's the unique binding to a signing key-pair(s) that makes it self-certifying."

I would ask that people read the values included in the initial state of the Suffix Data (which includes all keys and literally every other piece of initial state) that makes up the hashed DID Suffix, if anyone on the thread is not aware of the self-certifying nature of Sidetree DIDs: https://identity.foundation/sidetree/spec/#did-uri-composition

OR13 commented 4 years ago

Now that this issue is in ION, I'm in favor of letting the user define some arbitrary bytes that get prefixed or postfixed to the didUniqueSuffix.... why? because AFAIK, no one else has done this so far, and it's just like mining for a vanity didUniqueSuffix... the only difference is that mining has a computational cost, and letting the user specify it has 0 computational cost, so it's much easier for an attacker to spam the network with "repositories" or "iot" devices... which are really just a guy in his basement, messing with indexes...

Let's see what happens, if it's useful... great! If it's just a target for data poisoning that gets exploited... ION DIDs are just a couple bytes longer... no real harm done.

OR13 commented 4 years ago

AFAIK, Sidetree Create Payloads lead to self certifying identifiers... the payload data (which includes the authoritative keys) is hashed to produce the identifier, the inception event is witnessed by a ledger.

OR13 commented 4 years ago

Also, since we are talking about privacy / tracking concerns...

There are a couple of different levels of surveillance / data mining attacks on DIDs.

Level 0 - A Database of all DIDs in a specific registry (not possible to do with did:peer, but possible to observe from every other did system).

Level 1 - A Database of all DID Documents in a specific registry

Level 2 - A Database of DID categories (like Type)....

Level 3 - A Database of DID relationships (like sameAs github user, linked in user, mac address, etc...).

This proposal is to allow "Level 2" to be built very quickly from the DIDs without the DID Documents or other network requests... and it comes with the caveat that the index could be poisoned, since anyone can claim to be an "iot" device, or a "person" or a "repository".... I'm pretty sure that people will generally not trust this tag in isolation in much the same way that I never trust an NPM module without reviewing contributors, source, release history, etc...

SmithSamuelM commented 4 years ago

@OR13

"AFAIK, Sidetree Create Payloads lead to self certifying identifiers... the payload data (which includes the authoritative keys) is hashed to produce the identifier, the inception event is witnessed by a ledger."

That sounds right.

csuwildcat commented 4 years ago

Just want to clarify, Orie: "Now that this issue is in ION, I'm in favor of letting the user define some arbitrary bytes that get prefixed or postfixed to the didUniqueSuffix.... why?" - this is actually within the DID Unique Suffix hash, because it's inside the Suffix Data (initial state data). For example:

{
  "type": "E8c0",
  "delta_hash": DELTA_HASH,
  "recovery_key": JWK_OBJECT,
  "recovery_commitment": COMMITMENT_HASH
}

SmithSamuelM commented 4 years ago

My concern is that it be mandatory. If it's mandatory everyone carries the cost of those bytes in their identifiers. Whereas optional in configuration data does not carry that cost. I understand the desire to not have network requests for configuration data. Optional values in a : separated string after the name string are also fine as long as they are optional. The problem is that every time a proposal has come up to make something mandatory in the : strings it conflicts with someone else's use case. As it stands now you are free to add an entity type in a : string there already. It's only if you want mandatory interoperability that it becomes an issue. So either there is no need for this suggestion (i.e. already optionally allowed) or you want mandatory interoperability, which is not likely to occur. For example:

did:meth:namestring:type/path?query#fragment

One can already do this above.

SmithSamuelM commented 4 years ago

In order to avoid confusion I suggest we use the latest syntax from the DID 1.0 spec

did                = "did:" method-name ":" method-specific-id
method-name        = 1*method-char
method-char        = %x61-7A / DIGIT
method-specific-id = *( ":" *idchar ) 1*idchar
idchar             = ALPHA / DIGIT / "." / "-" / "_"

The method-specific-id may include any number of colon separated strings. Any method implementer is free to define what the semantics of those strings are. These are part of the DID itself, not the DID URL. So in the language of the DID spec, are you proposing a custom colon separated controller/entity type as one of the colon separated elements of the method-specific-id? If so it's already allowed for your method. If you are proposing that the colon separated string be universal across all methods then that will not fly. The question is: are you looking for cross method interoperability or just something within a method? It's not clear to me what the ask is. @csuwildcat not sure what you mean by suffix in the context of a DID as it's not a term used in the spec language.
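
To make the colon separated reading concrete, here is a small sketch that splits a DID into its method name and the segments of its method-specific-id. It is not a full validator for the grammar quoted above: `split_did` is a hypothetical helper, and the only rules it checks are the method-char and idchar character sets as quoted and that the final segment is non-empty.

```python
import re

# Character sets taken from the grammar as quoted in this thread.
IDCHAR_RUN = re.compile(r"[A-Za-z0-9._-]*")
METHOD_NAME = re.compile(r"[a-z0-9]+")


def split_did(did):
    """Split a DID into its method name and colon separated id segments."""
    scheme, sep, rest = did.partition(":")
    if scheme != "did" or not sep:
        raise ValueError("not a DID: " + did)
    method, sep, specific_id = rest.partition(":")
    if not sep or not METHOD_NAME.fullmatch(method):
        raise ValueError("bad method name in: " + did)
    segments = specific_id.split(":")
    # Every segment may only use idchar characters; the last must be non-empty.
    if not all(IDCHAR_RUN.fullmatch(s) for s in segments) or not segments[-1]:
        raise ValueError("bad method-specific-id in: " + did)
    return method, segments


method, segments = split_did("did:meth:namestring:type")
print(method, segments)
```

What a method does with the second segment (here `type`) is entirely up to that method, which is the distinction being drawn between per-method semantics and cross-method interoperability.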

OR13 commented 4 years ago

Sidetree uses the label didUniqueSuffix to refer to what is normatively defined in did core as idchar...

I misinterpreted the proposal (because it lacked a concrete example), it's essentially to have the DID look like:

did:ion:EiCVJETTBWS4W6fbbjfXTdKSX-6v0u6vUZiCbqBZvopG_9...

where idchar is EiCVJETTBWS4W6fbbjfXTdKSX-6v0u6vUZiCbqBZvopG_9, and where it is constructed approximately as:

SHA-256 of JSON.stringify of createPayload... where createPayload is:

{
  "type": "E8c0",
  "delta_hash": DELTA_HASH,
  "recovery_key": JWK_OBJECT,
  "recovery_commitment": COMMITMENT_HASH
}

You can see how changing type would change idchar.
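
A rough way to see this for yourself is sketched below. It follows the "approximately" description above (SHA-256 over a JSON serialization of the create payload) and does not reproduce the spec's actual hashing and encoding rules, so its output will not match real ION identifiers. The `sketch_suffix` helper, the placeholder field values, and the second type value `A1b2` are all made up for illustration.

```python
import base64
import hashlib
import json


def sketch_suffix(create_payload):
    """Hash a create payload into a URL-safe string (simplified illustration)."""
    # sort_keys and compact separators make the serialization deterministic.
    serialized = json.dumps(create_payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(serialized.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# Placeholder values; the strings stand in for DELTA_HASH, JWK_OBJECT, and so on.
payload = {
    "type": "E8c0",
    "delta_hash": "DELTA_HASH",
    "recovery_key": {"kty": "EC", "crv": "secp256k1"},
    "recovery_commitment": "COMMITMENT_HASH",
}
retyped = dict(payload, type="A1b2")  # same payload, different (made-up) type
untyped = {k: v for k, v in payload.items() if k != "type"}  # type omitted

print("did:ion:" + sketch_suffix(payload))
print("did:ion:" + sketch_suffix(retyped))
print("did:ion:" + sketch_suffix(untyped))
```

All three printed identifiers differ, because the type value is an input to the hash rather than a separate segment appended to the DID.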

csuwildcat commented 4 years ago

Yes, this is basically 4 bytes for an OPTIONAL flag that immutably designates a NON-HUMAN ONLY classification to a DID. An example of this would be: you want to create a decentralized version of NPM, so you generate a DID with the inception-bound immutable type value of E8c0 (I made that string up, so let's just assume it is the type string for a code module). Once registered on the substrate, anyone can run a node and index all the anchor files, meaning a light node would have the feasible ability to at least group IDs together for indexing and crawling. For this NPM example, you would resolve all the ones that said "I am a code module!" and find their repos to digest whatever code module-related data you wanted to render in an NPM-like registry UI. If you don't have the ability to do this as a light node, I would argue the implied reality is that you would have to implement a far more complex, costly mechanism that does the same thing, and is probably going to be more centralized and less censorship resistant.
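
The grouping step described above might be sketched as follows. The record shape, the `did_suffix` field name, and the `group_by_type` helper are assumptions made for illustration, not ION's actual anchor file format; only the `E8c0` value comes from the example in the comment.

```python
from collections import defaultdict


def group_by_type(anchored_entries):
    """Bucket DID suffixes by their self-declared type, without resolving them."""
    buckets = defaultdict(list)
    for entry in anchored_entries:
        # Entries without a type land in the None bucket, since the flag is optional.
        buckets[entry.get("type")].append(entry["did_suffix"])
    return dict(buckets)


# Hypothetical records a node might extract from anchored create operations.
entries = [
    {"did_suffix": "suffix-1", "type": "E8c0"},
    {"did_suffix": "suffix-2"},
    {"did_suffix": "suffix-3", "type": "E8c0"},
]

buckets = group_by_type(entries)
# Only the DIDs that declared the code module type would be resolved and crawled.
to_resolve = buckets.get("E8c0", [])
print(to_resolve)
```

As noted elsewhere in the thread, the type is only a claim made by whoever wrote the create operation, so a bucket is a list of candidates to crawl rather than a verified registry.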

peacekeeper commented 4 years ago

If anyone is interested in some history, when @talltree and I and others worked on XRI and then XDI, this was expressed by the first character of the identifier itself, e.g.:

=markus -- = means person
+microsoft -- + means organization or group
#car -- # means concept
*mysensor123 -- * means thing

This was semi-enforced by standards and legal agreements, but ultimately couldn't be trusted of course. I think we also used ^ for unknown/undeclared types.

Those identifiers were not URIs. They were not decentralized either. If you want to register one, contact sales@danubetech.com (special discount today - buy two for the price of one!).

See here and here if you want to get a headache. :)

kdenhartog commented 4 years ago

Following this issue still. I'm becoming a bit less skeptical now that I'm starting to understand how it works. I can see the value you're after here, especially if it helps in some way to categorize dids. The concern I see with this proposed solution is that these types are self asserted, therefore the data is going to be noisy. By that I mean the data can be manipulated and may not be very valuable to those wishing to take action based on it, such as a crawler indexing dids. Rather, I would think we'd want a way to address this which gives the data more reliability and integrity, and design a portion of the stack explicitly for this. What that looks like I'm not sure yet. This may actually be the best way to do it even. Given how cheap it is to handle it at this layer, I'm not opposed to trying it here, and if it doesn't fit or doesn't make sense we can move this functionality into a layer of its own and make this a dead feature at the ledger layer.

OR13 commented 4 years ago

They are not even self asserted... create payloads are not signed.

OR13 commented 3 years ago

discussed on the DID WG Topic call today.

csuwildcat commented 3 years ago

Updating with first round of type values this month.