Pubkey rotation mechanisms, threats, and use cases

ethanjli commented 3 years ago

This is such a cool project, and I'm really excited to see how this develops! The planned design for identity and re-homing is especially interesting to me.

As I understand it, the design document's mechanism for pubkey rotation is that a new database (I'll call it V2) appears and declares itself to be the successor of the previous database (I'll call it V1), and the DNS-ID authority corroborates this declaration, presumably at the request of the entity which had write access to V1. After the declaration and the corroboration, followers can trust V2.

Use case: this sounds like it could be useful for revoking V1 if its private key is accidentally destroyed or compromised, because followers can just ignore new records in V1 once V2 appears and the DNS records are updated.
Threat: would it be possible for someone with control of the DNS-ID authority to create V2 and, by manipulating the DNS records, unilaterally mislead followers into thinking that V1 is transitioned/revoked?
Possible countermeasure: perhaps followers could require that V1 update itself with a declaration that V2 is its successor; so V1 must corroborate the declarations of V2 and the DNS-ID authority. However, this would make the pubkey rotation mechanism ineffective against a stolen private key. Maybe it could still be useful for accidental destruction of V1's private key: V2 can be created around the same time as V1, and V1 can declare ahead of time that V2 is its successor, but V2 doesn't yet declare that it is V1's successor. Then, after V1's private key is lost (but assuming V2's private key hasn't been lost), V2 can declare itself as the successor, and the DNS-ID authority can corroborate this too. If instead V1 was not lost but V2 became compromised, then V1 could simply declare some other database as its successor.
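
A minimal sketch of the follower-side check this countermeasure implies, assuming a follower can read three things: the successor V1 declared ahead of time, the predecessor V2 claims, and the pubkey the DNS-ID currently resolves to. The function and parameter names are hypothetical and not part of ctzn; signature verification of each record is assumed to have happened already.

```python
def follower_trusts_successor(v1, v2, v1_declared_successor,
                              v2_claimed_predecessor, dns_resolves_to):
    """Accept V2 only if V1, V2, and the DNS-ID authority all agree.

    All arguments are pubkeys (as strings) or None when no declaration exists.
    """
    return (
        v1_declared_successor == v2       # V1 named V2 ahead of time
        and v2_claimed_predecessor == v1  # V2 has now claimed to succeed V1
        and dns_resolves_to == v2         # the DNS-ID authority corroborates
    )
```

Under this check, someone who controls only the DNS records cannot move followers to a database that V1 never named, because the first condition fails.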

We could also generalize this mechanism: instead of (or in addition to) using DNS records as corroboration, followers could require corroboration from other hypercores (which would be writable by other users or by other entities which can act as custodians). V1 could declare its trust in these other pubkeys ahead of time, and followers can trust V2 once some threshold number of V1's trusted custodians declares that V1 is succeeded by V2. Perhaps that threshold could also be specified by V1 ahead of time.
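
A minimal sketch of the threshold idea, assuming V1 publishes a list of custodian pubkeys and a threshold ahead of time, and each custodian later publishes at most one declaration naming V1's successor. The record shape and names here are hypothetical, and signature checks are assumed to be done elsewhere.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessionPolicy:
    """Published by V1 ahead of time (hypothetical record shape)."""
    custodians: frozenset[str]  # pubkeys V1 trusts to vouch for a successor
    threshold: int              # how many custodians must agree


def accepted_successor(policy: SuccessionPolicy,
                       declarations: dict[str, str]) -> str | None:
    """Return the successor that enough custodians vouch for, else None.

    declarations maps a declarer's pubkey to the successor it names for V1.
    Declarations from pubkeys outside the custodian list are ignored.
    """
    votes: dict[str, int] = {}
    for declarer, successor in declarations.items():
        if declarer in policy.custodians:
            votes[successor] = votes.get(successor, 0) + 1
    winners = [s for s, n in votes.items() if n >= policy.threshold]
    # If two successors both reach the threshold, treat it as unresolved.
    return winners[0] if len(winners) == 1 else None
```

With a low threshold, two competing successors can both qualify; this sketch refuses to pick one, which leaves the conflict to be resolved socially.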

I'm really new to all of this so I've probably missed some important information, and I'd love to hear thoughts about any of this!

pfrazee commented 3 years ago

Thanks for the thoughtful writeup!

The weakness you're identifying is that the DNS-ID provider is a unilateral identity authority. The generalization is right; you need to apply some kind of "added authorities" model which confirms a change. This could be other cores that the user makes, or the cores of other users.

Users rarely take precautions so this is mostly a UX issue. Questions to explore are:

Is there a way to make this process seem worthwhile, simple, and interesting, so that users don't perceive it as a chore?
Is it possible that the precautions are only really needed for accounts with a higher security need?
Are there other, better mitigations than adding authorities? For instance, could we make domain-names a viable form of identifier (pfrazee.com rather than pfrazee@server.com)? Would such a thing be more effective? etc

ethanjli commented 3 years ago

This framing makes a lot of sense to me!

Maybe another design parameter is how followers react to a DNS-ID change, and if adjusting that might reduce the need for advance planning and precautions. As a thought experiment, what if the policy of followers was to interpret the DNS-ID provider's announcement not as "V2 is now the successor, so I will update my records", but rather as "I now have a signal that V1's identity is in question, so I need to decide whether to perform additional verification, and how to update my records"?

For example, if a petnames system is used, immediate contacts of the person who moved to a new pubkey could confirm this out-of-band and update and publish their petnames for the old pubkey and the new pubkey. The DNS-ID and the self-proposed names for both pubkeys could also be shown as petnames under this scheme. I imagine this would make pubkey switchover more gradual and partial, and the identity authority would be more diffuse and multilateral, and subject to individual and collective judgement. However, I haven't understood ctzn's design deeply enough to identify other implications of a petnames system, e.g. how it affects the relationships among databases, or whether it's even feasible. And probably there are other ways to provide signals for people deciding how to update their records.

ethanjli commented 3 years ago

tl;dr of incoming wall of text and rubber-ducking:

If URLs are constructed like https://ctznry.com/ethanjli@ctzn.one and shown like ethanjli@ctzn.one on the user profile, how important are impersonation risks of someone else registering as ethanjli@ctzn.one after a rehoming, and how important is it to design mitigations? Or registering as something like ethanjIi@ctzn.one (with capital-i instead of lowercase-L in the name) or ethanjli@otzn.one?
Having community-curated name indices and allowing users to select which indices they will use is an interesting possibility which could address the unilateral identity authority problem and, to some extent, mitigate the above risks. It adds some complexity but maybe it could be fun and interesting enough for users? I'm not sure, and I think we can adjust simplicity vs. flexibility while still solving the unilateral authority problem, which is why I have such a lengthy analysis below.
If the DNS-ID provider is independent from the server with write access to the database, and the URLs of both servers are stored in the database, would that reduce the number of situations where someone would want to change the DNS-ID? With responsibilities separated, the DNS-ID provider would exist only to provide names for URLs and for display to users, while everything else would go to the database-writing server listed in the database resolved by the DNS-ID provider.

--

I thought a bit more about the properties of different schemes for specifying identity, for the use case where private keys live on third-party servers - which is probably the only really usable approach for the near future - and I think there's a centralized-to-distributed spectrum of approaches between the extremes explored in the comments above:

The Extremes

1: strong authority, one authority per database: this is the DNS-ID system currently described in design.md. The only way to respond to identity theft via a compromised server is to fork the identity and resolve the competing claims of identity socially (i.e. by talking to followers from the new account). And other servers all depend on the DNS-ID provider - so if I want to rehome from ctzn.one to ctzn.two, I'd need to make sure that anyone who posted a link to me or any post I made using https://ctznry.com/lietk12@ctzn.one, https://hikers.ctzn/lietk12@ctzn.one, etc., updates their link to use https://ctznry.com/lietk12@ctzn.two, https://hikers.ctzn/lietk12@ctzn.two, etc., and I'd need ctzn.one or ctznry.com, hikers.ctzn, etc., to prevent anyone else from re-registering as lietk12@ctzn.one and impersonating me.

2: strong authority, more authorities per database but only one authority in practice: a database self-declares the "added authorities" who confirm its identity, and followers use that list as their source of truth. But the server can always be made to change the database's list of authorities by a malicious actor. So this scheme has the same consequences if the server is compromised, at the cost of more complexity. (And even if the private key were on the client, there are the UX questions about assigning authorities)

3: very weak authorities, very many authorities shared very locally: every person is their own local identity authority - this is the petnames system. If an identity needs to be changed for whatever reason (a private key is destroyed or leaked, or a pubkey is rotated under normal circumstances), a person needs to notify the people around them who trust them, and then have those people announce this. Identities are always resolved socially, not just when there are competing claims. I think dealing with petnames would feel like a chore if we rely mainly on people to curate databases of petnames and build consensus all the time. This scheme may also have ethical concerns, e.g. if someone changes names, the petnames assigned to them could be used to deadname them, even if they create an entirely new profile.

An In-Between Approach

4: socially-determined strength & number of authorities, shared across servers (probably follows a power law-esque curve): every person chooses a set of subscriptions to indices of names (i.e. directories of people and communities) curated by various communities. This is a different take on the petnames system, with a more dynamic "interplay between [server] authority and individual authority" in naming databases than approaches 1 or 3, because it introduces the possibility of collectively moderating usernames and forking based on username registry:

Each community has one or more indices listing the name(s) of each pubkey registered in the index, and the same pubkey may be on multiple servers. Different indices may have different policies for trusting name registrations or updates, or naming rules (e.g. regular usernames, display names, or heroku-style random names like cloudy-wombat-12853); the community just curates each index, a server may host it, and then other people decide how credible/useful they are by either subscribing to them or not. A community could have multiple indices for different levels of confidence in names (cf. Ubuntu's main, universe, restricted, and multiverse package indices).
When a user joins a community, it could auto-suggest a few indices to subscribe to. Users subscribe to the name indices they trust and care about, and this process could contribute to a feeling of realizing the existence of people around them as they are introduced to more names/communities, at a self-set pace.
If a user only subscribes to a big, centralized community's default name indices and never feels a need to subscribe to other indices, that's fine too; the door's always open. But for people who start their own small or special-interest communities, I think there's value in being able to build a local shared namespace while reusing the same user profile on any server.
When a user/community database is shown to a user, the user will see all registered names for the database's pubkey, from the name indices which the user is subscribed to. Maybe they could also check the pubkey's registered names from other known but untrusted indices, e.g. indices trusted by other users from their communities.
If a user/community database isn't on any of a user's subscribed indices, the database's self-proposed name, such as a DNS-ID and a display name, is shown but marked as untrusted/unknown. The database's "profile page" could also list some indices where it claims to be registered, and users can decide whether to subscribe to those.
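
A minimal sketch of the display rule described in the list above, assuming each name index is just a table from pubkey to a list of registered names. The function name, the tuple layout, and the "self-declared" label are hypothetical.

```python
def names_for(pubkey, subscribed_indices, self_proposed_name):
    """Return (name, source, trusted) tuples to show for a database.

    subscribed_indices maps an index's label to its {pubkey: [names]} table.
    If no subscribed index lists the pubkey, fall back to the database's
    self-proposed name and mark it as untrusted.
    """
    shown = []
    for label, table in subscribed_indices.items():
        for name in table.get(pubkey, []):
            shown.append((name, label, True))
    if not shown:
        shown.append((self_proposed_name, "self-declared", False))
    return shown
```

For example, a pubkey registered on two subscribed indices would be shown with both names, while an unknown pubkey would only show its DNS-ID or display name with the untrusted flag set.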

I think this could be flexible and reasonably interesting and easy for users to get started with, while also formalizing a mechanism to socially resolve identity. It reminds me of Cabal's subjective moderation system, but with less granularity, and for naming rather than ignoring:

If I never interact with anyone from another community (which is probably true when someone first joins ctzn), I don't need to worry about name indices. And I can very easily see who's an outsider.
If someone isn't on many of the indices I trust, I will be shown this so I can make an informed decision. If I hear that an index I subscribe to is acting suspiciously, I can unsubscribe or post about it.
People online will always need to weigh if someone is who they claim to be, so we might as well make that process transparent in the user experience (maybe?). Usually the stakes are very low, and checking/unchecking a box to subscribe to a name index or hovering over a name to see someone's other registered names feels like a proportional amount of opt-in effort. But if e.g. someone's trying to do phishing through impersonation on a DNS-ID abandoned by rehoming, then the name index and in-group vs. out-of-group implications can be very useful ambient warnings.
When I rotate pubkeys, I need to change registration on all indices where it's worth the effort to ensure that other people can verify that the same person is behind two pubkeys. The name index could record this as a pubkey redirect and the server would render other records accordingly.
People would discover me mainly through name indices. Then the DNS-ID provider listed on my database is only ever displayed as a fallback self-declared name and in URLs, and it has no other function (assuming the DNS-ID provider is separate from the server which holds my private keys and writes to my database for me). So rehoming may be less important and only very rarely useful (e.g. if the DNS-ID provider stops running).
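
A minimal sketch of the pubkey redirect mentioned above, assuming a name index stores redirects as a simple old-pubkey to new-pubkey table. The function name and table shape are hypothetical.

```python
def resolve_current_pubkey(pubkey, redirects):
    """Follow old -> new pubkey redirects recorded in one name index.

    Stops at the first pubkey with no redirect; raises on a redirect loop.
    """
    seen = {pubkey}
    while pubkey in redirects:
        pubkey = redirects[pubkey]
        if pubkey in seen:
            raise ValueError("redirect loop in name index")
        seen.add(pubkey)
    return pubkey
```

Because each index keeps its own redirect table, two indices can disagree about whether a rotation happened, and a user would see whichever answer their subscribed indices give.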

More In-Between Approaches

Comparing approaches, I think approach 1 is basically like approach 4 with the following constraints added:

each server shares the same two name indices: the DNS IDs of the databases, and the index of display names. Both are just looked up from databases, rather than being stored on their own. All together, these are globally shared, singleton (distributed) indices.
DNS-ID providers and servers with write access to databases together have unilateral power over the names users see.
every person automatically trusts/subscribes to their server's name indices, and there's no unsubscription. So communities can't make+use their own name indices.
a database can only associate itself with at most one DNS-ID name index and one display name index. No other names or name indices.

Approach 4 introduces usability complexity (e.g. if a user subscribes to 100 name indices, how do names get prioritized and displayed?), but it could better support village-scale interactions, e.g. a group of friends making a community and having their own fun nicknames just for each other. If we want to add even more flexibility/forkability/independence of name indices (e.g. users can make private name indices, or they can also make public name indices), that takes us closer to approach 3. Increasing constraints (e.g. only servers can make name indices) could streamline the user experience more than approach 4 but would centralize responsibility, e.g. onto servers. In any case, it looks like we can tune the dial by adding or relaxing constraints on the administration of name indices.

ethanjli commented 3 years ago

During the CTZN livestream for April 21 around timestamp 05:33:40, @pfrazee shared his thoughts on identity authority in a discussion with @redsolver. Paul's apparently had this conversation many times and people keep bringing up the same things 😅. So, since there'll probably be more people in the future wanting the same discussion, here's my summary of what Paul said:

So far, the proposed solutions Paul's heard for removing hierarchy of authority introduce too many other issues. For example, if a system just uses personal petnames and there are no global names, then there are no global IDs; then linking things by names becomes unusable. So there's a need for a uniform identification scheme which is global to all interactions among a set of users.
Profiles do have globally unique pubkeys, but people need shortnames for user IDs. Relying on pubkeys as the primary user identifiers prevents URLs from being shared easily out-of-band (e.g. saying the URL of your own profile out loud to other people), which is a big problem.
CTZN's approach is to use hierarchy of authority but provide other mechanisms to enable recovery when that fails. This is a hybrid approach using DNS-IDs as a global naming scheme for user IDs, combined with records of name mappings distributed across the social graph where every published record (e.g. comments) lists both the user ID and the database URL. The latter mechanism enables search. So CTZN has a system of two overlapping identifiers, which should allow resolving who someone was at some point if the DNS-ID gets lost, which is needed to transfer an identity.
TBD: how seamless should rehoming be? Is it good enough to make a new account on a new server and point to the old account and just announce a transition? Or is it necessary to be able to magically transfer an account between homeservers?
It's not clear that the benefits of blockchain-based domain IDs (which allow user IDs to persist beyond an individual service) outweigh the issues (requiring users to pay for IDs) compared to DNS-IDs.
Paul's considering letting users register domain names as their user IDs, as an optional alternative to the email address-style user ID format.
One thing which is interesting about Keybase's approach is that it separates identity providers from hosting providers - this approach might make rehoming a bit more straightforward. Paul will think about this more.
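
A minimal sketch of how the two overlapping identifiers could be used to look up who a user ID pointed to in the past, assuming each published record carries the author's user ID, their database URL, and a creation time. The field names `user_id`, `db_url`, and `created_at` are placeholders and not CTZN's actual schema.

```python
def db_urls_for_user_id(records, user_id):
    """Return the database URLs seen with a user ID, oldest first.

    records is an iterable of dicts with user_id, db_url, and created_at keys.
    """
    first_seen = {}
    for record in sorted(records, key=lambda r: r["created_at"]):
        if record["user_id"] == user_id:
            # Keep only the earliest time each database URL appeared.
            first_seen.setdefault(record["db_url"], record["created_at"])
    return list(first_seen)
```

If more than one database URL shows up for the same user ID, that is a signal that the ID was rehomed or re-registered, and the earliest URL identifies who held it first.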

I basically agree with what Paul says, and I propose closing this issue as out of the scope of the core of CTZN, due to the plan for frontend customization. I think what I had missed in my comments above was that I thought of authority over profile names and authority over profile authenticity as the same problem (authority over "identity"), when they're actually different problems.

I'd love to be able to have my own domain name as my user ID, without having to host all my databases. But I agree that domain names as user IDs is a niche use case, because users would have to pay for a domain name.
I really like the idea of separating DNS-ID providers from db hosts, even more than in my previous comment. If I'm unhappy with the uptime of a db host (or if they change their pricing policy, or whatever) and want to change to a different host, I only have to do a pubkey rotation instead of also having to migrate away from the DNS-ID everyone previously knew me as. This is a big problem I have with Mastodon, where if I need to change the server hosting my data then I can't keep my name. To me, this seems like a big win: even if changing names is hard, maybe changing hosts can be seamless.
I agree with the need for shortnames so that profiles are globally shareable & findable independent of the social graph (and without forcing users to run a search, which, based on my experience with searching people by their display name on Facebook, isn't even reliable), and DNS-IDs are a good approach for this. This need exists whether or not other naming systems (e.g. petnames or display names) are available.
Whether or not DNS-IDs or other globally unique names are used, there needs to be a way to look at the social graph and/or do some searching to verify that a profile, after being resolved from some name, actually belongs to the person it claims to be. This acts as a mitigation for hacked servers, catfishing, etc., and it's a separate problem from resolving names into profiles. Right now checking the social graph looks like inspecting a profile's followers list and interactions - if they haven't hidden those things with a custom profile. Is that sufficient? Idk. My above proposal, of community-published lists of users/communities whose profiles they vouch for in some way, would be an opt-in way to structure social information about a profile's authenticity and subject it to curation/moderation. Would it work and be useful? Idk. But it wouldn't replace DNS-IDs, and that's fine because they're trying to solve different problems. Could we get the best of both worlds by combining DNS-IDs for locating profiles (solving the discovery problem) with easy-to-use search or frontend features to contextualize the located profiles (providing tools for the authenticity problem)?
If frontend customization supports annotating profiles with data from custom community-published indices or from data currently in the social graph, then functionality like what I've proposed doesn't have to be baked in to CTZN, which leaves room for flexibility and experimentation. Then the issue of supporting verification of a profile's authenticity (for any number of possible reasons beyond a compromised DNS-ID provider) could be in the scope of frontend customization + new schemas + communities, rather than the core of CTZN. Maybe this is the right approach, since all verification proposals are still unproven. If so, should we close this issue?
