dominictarr / ssb-private-groups

MIT License

group ids #1

Open dominictarr opened 4 years ago

dominictarr commented 4 years ago

Groups need an id and a key. Maybe the group id should be the hash of the key; we haven't decided yet. We can just make the relationship between group id and group key opaque, via some sort of lookup.
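To make the two options concrete, here is a minimal Python sketch. `group_id_from_key` takes the id to be the SHA-256 hash of the key. `GroupKeyStore` instead gives each group a random, opaque id that is related to its key only through a lookup table. Both names are hypothetical and not part of ssb-private-groups.

```python
import hashlib
import secrets
from typing import Dict, Optional


def group_id_from_key(group_key: bytes) -> str:
    # Option 1: derive the id from the key. Anyone holding the key can
    # recompute the id, but the hash cannot be inverted to recover the key.
    return hashlib.sha256(group_key).hexdigest()


class GroupKeyStore:
    """Option 2 (hypothetical): an opaque random id, linked to the key only via a lookup table."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}

    def create_group(self) -> str:
        group_key = secrets.token_bytes(32)
        group_id = secrets.token_hex(16)  # random; no computable link to the key
        self._keys[group_id] = group_key
        return group_id

    def key_for(self, group_id: str) -> Optional[bytes]:
        # "Retrieving the group key" becomes a plain lookup by id.
        return self._keys.get(group_id)
```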

https://github.com/dominictarr/ssb-private-groups/blob/master/index.js#L65

this is where retrieving the group key should go.

todo: update this code to use the format @keks and I came up with

keks commented 4 years ago

Hang on, isn't the group ID the ID of the group create message?

dominictarr commented 4 years ago

I don't want to do it like that, because that reveals who created the group. As a design principle, I'd rather have a completely opaque token, hmac(msg_id, key) for example.
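A minimal sketch of the hmac(msg_id, key) idea, assuming the group key is used as the HMAC key and SHA-256 as the hash (the thread fixes neither). Without the group key, an observer cannot link the resulting id back to the create message id, and therefore not to the creator's feed. `opaque_group_id` is a hypothetical name.

```python
import hashlib
import hmac


def opaque_group_id(msg_id: str, group_key: bytes) -> str:
    # The secret group key is the HMAC key; the create message id is the input.
    # Without group_key the output cannot be linked to msg_id.
    return hmac.new(group_key, msg_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical placeholder id, not a real SSB message.
gid = opaque_group_id("%hypotheticalCreateMsg.sha256", b"\x00" * 32)
```

Every member who holds the group key and the create message id recomputes the same id, so members agree on it without the id ever naming the creator.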

keks commented 4 years ago

Oh, right! I totally forgot about that one!

Hm, but an opaque token would be an id, not a root. And it's not always possible to figure out whether you've really reached the end of a tangle.

How about we leave them as message ids and instead implement encryption of parts of the message? Currently, the key derivation tree is rather simple: we derive the read cap from the message key, and the body and header keys from the read cap. We already plan to derive keys for extensions from the message key, and it seems reasonable to use the same approach here.

So the key derivation tree would look something like this:

msg key
 |
 +--> read cap/key
 |     |
 |     +--> header key
 |     |
 |     +--> body key
 |
 +--> extensions key
       |
       +--> partial encryption key
             |
             +--> encryption key for path "tangles.community"
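The tree above could be realised with HKDF-Expand (RFC 5869) over SHA-256, with each child key derived from its parent under a distinct label. The sketch below assumes that construction. It treats the 32-byte message key as a uniformly random pseudorandom key, so the HKDF-Extract step is skipped, and it uses made-up label strings that the real scheme would replace with its own.

```python
import hashlib
import hmac
from typing import Dict


def hkdf_expand(prk: bytes, info: bytes, length: int = 32) -> bytes:
    # HKDF-Expand (RFC 5869): T(i) = HMAC(PRK, T(i-1) || info || i).
    out, block, counter = b"", b"", 1
    while len(out) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        out += block
        counter += 1
    return out[:length]


def derive_tree(msg_key: bytes) -> Dict[str, bytes]:
    # Labels are hypothetical placeholders.
    read_cap = hkdf_expand(msg_key, b"read_cap")
    ext_key = hkdf_expand(msg_key, b"extensions")
    partial_key = hkdf_expand(ext_key, b"partial_encryption")
    return {
        "read_cap": read_cap,
        "header_key": hkdf_expand(read_cap, b"header"),
        "body_key": hkdf_expand(read_cap, b"body"),
        "extensions_key": ext_key,
        "partial_encryption_key": partial_key,
        "path:tangles.community": hkdf_expand(partial_key, b"path:tangles.community"),
    }
```

The property the proposal relies on follows from the shape of the tree: someone holding only `read_cap` can derive the header and body keys, but the path keys hang off the extensions branch, which requires the message key itself.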

This way, it would be possible to hide parts of a message from someone who only has the read cap. Which raises the question: which parts of the message should be hidden? From a privacy perspective, I think the most important thing is to encrypt cypherlinks that imply group membership.

What do you think about this approach?

tschudin commented 4 years ago

TL;DR - jump to the last paragraph...

I don't want to redirect the discussion you are having, but I wanted to mention that I have been experimenting with an even more opaque format than Dominic's (for a message tag that tells whether the message belongs to a certain group), in fact hiding the group id altogether from an outsider who is scanning the logs: [nonce=n, tag=hmac(n, grp_hkey), ciphertext=aes(msg_payload, n, grp_dkey)]

Which means that a message for a group can only be detected (filtered out) if you know the grp_hkey, i.e. you have to iterate over the hkeys you have collected so far.
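A sketch of the tag check in this format, assuming HMAC-SHA256 keyed with grp_hkey and a random 24-byte nonce (both assumptions). The ciphertext part is omitted because AES is not in the Python standard library. `detect_group` shows the iteration over collected hkeys: one HMAC per known group for every message scanned.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional, Tuple


def make_tag(nonce: bytes, grp_hkey: bytes) -> bytes:
    # tag = hmac(n, grp_hkey), with grp_hkey as the HMAC key.
    return hmac.new(grp_hkey, nonce, hashlib.sha256).digest()


def new_header(grp_hkey: bytes) -> Tuple[bytes, bytes]:
    nonce = secrets.token_bytes(24)
    return nonce, make_tag(nonce, grp_hkey)


def detect_group(nonce: bytes, tag: bytes, known_hkeys: Dict[str, bytes]) -> Optional[str]:
    # Without the right hkey the tag is indistinguishable from random bytes,
    # so a scanner must try every hkey it has collected.
    for name, hkey in known_hkeys.items():
        if hmac.compare_digest(make_tag(nonce, hkey), tag):
            return name
    return None
```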

Currently I use a random grp_hkey as the group's id, but this does not yield a unique and non-forgeable message that could serve as a root, which is what keks wants.

But we can link hkey generation with a single message in the following way. The group creator would create a message [nonce=n0, tag=hmac(n0, derived_hkey), ciphertext=aes("init:random-seed", n0, derived_dkey)] where hkey and dkey are derived from that seed. No future group member can fake this message because they cannot come up with a correct seed that yields the group's hkey. The dkey can later rotate, but the hkey would be the group's stable id.
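A sketch of how a member could check that a candidate root message is genuine once its ciphertext has been decrypted. It assumes hkey and dkey are derived from the seed with HMAC-SHA256 under the hypothetical labels `hkey` and `dkey`; the `init:` prefix comes from the comment above. Forging a root would require finding a seed that reproduces the group's hkey, which HMAC-SHA256 is not known to allow.

```python
import hashlib
import hmac
from typing import Tuple

INIT_PREFIX = b"init:"  # from the "init:random-seed" payload


def derive_group_keys(seed: bytes) -> Tuple[bytes, bytes]:
    # Hypothetical labels; hkey is the stable group id, dkey may rotate later.
    hkey = hmac.new(seed, b"hkey", hashlib.sha256).digest()
    dkey = hmac.new(seed, b"dkey", hashlib.sha256).digest()
    return hkey, dkey


def init_tag(nonce: bytes, hkey: bytes) -> bytes:
    return hmac.new(hkey, nonce, hashlib.sha256).digest()


def is_genuine_root(plaintext: bytes, nonce: bytes, tag: bytes, group_hkey: bytes) -> bool:
    if not plaintext.startswith(INIT_PREFIX):
        return False
    seed = plaintext[len(INIT_PREFIX):]
    hkey, _ = derive_group_keys(seed)
    # The seed must reproduce the group's hkey, and the tag must verify under it.
    return hmac.compare_digest(hkey, group_hkey) and hmac.compare_digest(
        init_tag(nonce, hkey), tag
    )
```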

I think that this works even if you choose a message format that exposes the group id (which is what you want to do, right?): [grp=derived_hkey, ciphertext=nonce,aes("init:random-seed", nonce, derived_dkey)] where only those later admitted to the group can see who wrote that initial message, now serving as a root. Would that satisfy the requirements of both of you?

keks commented 4 years ago

@tschudin There may be a misunderstanding.

This is not about knowing which group the message is for. Instead, we keep track of the groups each user is in and try-decrypt all their messages using the keys of the groups they are in. We discussed using tags to avoid trial decryption, but realized that computing the tag to check which group a message is in is not faster than just try-decrypting for all groups (at least with symmetric encryption).
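Trial decryption can be sketched with any authenticated cipher. Since AES and ChaCha20 are not in the Python standard library, the sketch below uses a toy encrypt-then-MAC construction (HMAC-SHA256 as both keystream and MAC) purely as a stand-in; it is not meant for real use. What it illustrates is that with a wrong key the MAC check fails before any decryption work, so trying one group's key costs about as much as computing one tag.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional, Tuple

NONCE_LEN, TAG_LEN = 16, 32


def _subkey(key: bytes, label: bytes) -> bytes:
    # Separate keys for encryption and authentication.
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Toy keystream: HMAC-SHA256 in counter mode.
    out, counter = b"", 0
    while len(out) < n:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:n]


def toy_seal(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    ct = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def toy_open(key: bytes, box: bytes) -> Optional[bytes]:
    if len(box) < NONCE_LEN + TAG_LEN:
        return None
    nonce, ct, tag = box[:NONCE_LEN], box[NONCE_LEN:-TAG_LEN], box[-TAG_LEN:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return None  # wrong key: rejected before any decryption work
    stream = _keystream(_subkey(key, b"enc"), nonce, len(ct))
    return bytes(c ^ s for c, s in zip(ct, stream))


def try_decrypt(box: bytes, group_keys: Dict[str, bytes]) -> Optional[Tuple[str, bytes]]:
    # Try every group the author is known to be in; the first key that authenticates wins.
    for group_id, key in group_keys.items():
        plaintext = toy_open(key, box)
        if plaintext is not None:
            return group_id, plaintext
    return None
```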

This discussion is about the situation when someone gets entrusted the read cap of a message. Within the message payload, we have a transitive reduction style tangle. Traditionally, we root these at a message (otherwise the word "root" wouldn't make much sense). Dominic suggested to allow opaque identifiers for these tangles.

keks commented 4 years ago

My previous post may not have been too clarifying.

We want to prevent an outsider to whom I entrust a private group message from inferring the identity of other group members, or even learning which other private messages were also part of the group. Since group messages are all inside a tangle, there will be cypherlinks to messages that are known to be in the group, and if we know who authored these messages, we know they are in the group.

keks commented 4 years ago

@dominictarr can I have some of your attention to move this forward? <3

dominictarr commented 4 years ago

@keks I wasn't suggesting using anything other than a message id for a tangle. I was suggesting a group id, which I don't consider to be a tangle: it just appears in the recipients field, and the backend encrypts the message to that group. I guess it could be a tangle, but that wasn't how I was thinking about it.

@tschudin so in your proposal, a reply to the group is unlinkable to the group by someone who sees the plaintext but doesn't have the group key? Because the reference contains a nonce, it is always unique. I agree this is more opaque, and I've considered schemes like this, but I think it's too expensive for database indexing: the indexing cost becomes messages*groups. I think that's going to have too much impact on usability.

So a group id that is unlinkable to the creator, shared by all members, but only used inside encrypted messages feels like the reasonable compromise.

tschudin commented 4 years ago

@keks yes, I got the question wrong, thanks for clarifying. Encrypting the group-internal links could be added to the suggested scheme: simply derive a cypherlink key in parallel to the data key. This would add zero database overhead for group members. But as Dominic points out, the indexing cost of the whole scheme may be unacceptable, so read this as "let's have a cypherlink key". I reckon that is what you suggested with the "partial encryption key"? With data key rekeying (and parallel cypherlink key rekeying), past group members can't learn the feedIDs in new cypherlinks even if they get hold of the cleartext content of a future message. I like this, and I wonder whether such a cypherlink key shouldn't sit in parallel to your header and body keys instead of under the extensions key?
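A sketch of the "cypherlink key in parallel to the data key" idea, assuming both are derived with HMAC-SHA256 from a per-epoch group secret under hypothetical labels. Rekeying here means generating a fresh random epoch secret and sharing it only with current members. Because the new secret is not derived from the old one, a former member who kept the old secret cannot compute the new cypherlink key.

```python
import hashlib
import hmac
import secrets
from typing import Dict


def new_epoch_secret() -> bytes:
    # Rekeying: a fresh random secret, distributed only to current members.
    return secrets.token_bytes(32)


def epoch_keys(epoch_secret: bytes) -> Dict[str, bytes]:
    # Sibling keys derived from the same epoch secret (labels are hypothetical).
    return {
        "data_key": hmac.new(epoch_secret, b"data", hashlib.sha256).digest(),
        "cypherlink_key": hmac.new(epoch_secret, b"cypherlink", hashlib.sha256).digest(),
    }
```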

@dominictarr Re the messages*groups complexity: Shouldn't we take a more "incremental" view on this? Typically one does not reindex the whole log.offset. There is a (time) locality in what a client ingested from the feeds, like a spotlight on the most recent log additions that need to be displayed ASAP. What is outside this spotlight can be put into a background job that catches up between the spotlight work and the previous indexing frontier. (Hm, vague language - could I convey some sense?)

dominictarr commented 4 years ago

oh yeah, I meant to say: a good choice for the nonce is the previous message id, because it's known to be unique (except for forks, but those break the protocol anyway).
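A sketch of using the author's previous message id as the nonce, again assuming HMAC-SHA256 keyed with the group's hkey. The second helper checks that tags within one feed are unique; as the comment notes, a fork (two messages sharing the same previous id) breaks that. The sketch does not cover a feed's first message, which has no previous id, and both helper names are hypothetical.

```python
import hashlib
import hmac
from typing import List


def tag_for(prev_msg_id: str, grp_hkey: bytes) -> bytes:
    # The previous message id doubles as the nonce.
    return hmac.new(grp_hkey, prev_msg_id.encode("utf-8"), hashlib.sha256).digest()


def tags_unique(prev_ids: List[str], grp_hkey: bytes) -> bool:
    # Unique as long as no two messages in the feed share a previous id (no forks).
    tags = [tag_for(p, grp_hkey) for p in prev_ids]
    return len(set(tags)) == len(tags)
```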

@tschudin but to know that you have everything you still need to index everything. indexing in chronological order is simple because you only need to keep track of a single number and you know you'll have everything. If you are indexing in a random order it gets a lot more complicated...

Now you are proposing a system that is more computationally expensive AND more complicated to implement. How would you keep track of what has been indexed? I guess a bitfield, or a compressed bitfield, would be best?

If you want to convince me that this method is worthwhile, you also need to describe how indexing will work for it, because it doesn't fit into the current index model.
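To make the indexing trade-off concrete, here is a sketch of the two kinds of progress tracking Dominic contrasts: a single counter for chronological indexing, and an uncompressed bitfield (a Python int) for out-of-order indexing. The class names are hypothetical; a compressed bitfield would replace the int with something like a run-length encoded or roaring bitmap.

```python
class SequentialProgress:
    """Chronological indexing: one number says everything before it is done."""

    def __init__(self) -> None:
        self.next_seq = 0

    def mark(self, seq: int) -> None:
        if seq != self.next_seq:
            raise ValueError("sequential indexer must process messages in order")
        self.next_seq += 1

    def is_complete(self, log_length: int) -> bool:
        return self.next_seq >= log_length


class BitfieldProgress:
    """Out-of-order indexing: one bit per message, stored in a Python int."""

    def __init__(self) -> None:
        self.bits = 0

    def mark(self, seq: int) -> None:
        self.bits |= 1 << seq

    def is_indexed(self, seq: int) -> bool:
        return (self.bits >> seq) & 1 == 1

    def first_gap(self) -> int:
        # Lowest unset bit: every message below it has been indexed.
        return (~self.bits & (self.bits + 1)).bit_length() - 1

    def is_complete(self, log_length: int) -> bool:
        return self.first_gap() >= log_length
```

With the counter, "is everything indexed?" is a single comparison; with the bitfield, the indexer also has to locate the lowest gap, which is what `first_gap` does.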