freedomofpress / securedrop-protocol

Research and proof of concept to develop the next SecureDrop with end to end encryption.
GNU Affero General Public License v3.0

Decide the deniability/authenticity requirements for the message encryption #30

Open lsd-cat opened 11 months ago

lsd-cat commented 11 months ago

Currently, whether we want the messages to be signed/authenticated or instead repudiable is not well specified or understood.

In the specification, we define the message m to be encrypted using symmetric authenticated encryption, with the key being the output of k = KDF(DH(MESK, JEPK)) = KDF(DH(JESK, MEPK)).

As such, there are no guarantees about the sender of a message. In the libsodium-only branch we use nacl.public.Box https://github.com/freedomofpress/securedrop-poc/blob/2f32d0b5aa3a8bf85f94af57b2970af4c4c7a8e1/commons.py#L108

which, according to its documentation, is repudiable unless explicitly signed [1].
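
For illustration, here is a minimal PyNaCl sketch (independent of the PoC code) of why Box ciphertexts are repudiable: both parties derive the same shared key, so the receiver can produce a ciphertext indistinguishable from one made by the sender.

```python
# A minimal sketch, assuming PyNaCl is installed; not the PoC's exact code.
from nacl.public import PrivateKey, Box

source_key = PrivateKey.generate()      # hypothetical source keypair
journalist_key = PrivateKey.generate()  # hypothetical journalist keypair

# Box(sender_secret, receiver_public) and Box(receiver_secret, sender_public)
# derive the same symmetric key: authentication is mutual, not transferable.
box_as_source = Box(source_key, journalist_key.public_key)
box_as_journalist = Box(journalist_key, source_key.public_key)

ciphertext = box_as_source.encrypt(b"hello")
assert box_as_journalist.decrypt(ciphertext) == b"hello"

# The journalist could equally well have produced such a ciphertext themselves,
# so a Box ciphertext proves nothing to a third party about who authored it.
forged = box_as_journalist.encrypt(b"hello")
```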

We can imagine that we want some asymmetry here as well: journalists should send non-repudiable, authenticated messages, while sources should keep deniability and repudiability as a priority. A good hint came from @mmaker: look into borrowing part of X3DH from Signal and see how adding long-term identity keys to the key derivation could affect these properties. See [2] for research on the actual deniability of X3DH.

[1] https://pynacl.readthedocs.io/en/latest/public/
[2] https://eprint.iacr.org/2021/642.pdf

lsd-cat commented 10 months ago

We have two main issues concerning the integrity and authenticity of ciphertexts and per-message ephemeral public keys.

The attack that @mmaker highlighted in https://github.com/freedomofpress/securedrop-poc/issues/31 is due to the fact that message tuples on the server are not cryptographically bound together, and thus the server can act dishonestly and serve, during the fetching phase, a different message and public key than what was stored in the database. This allows two potential scenarios: (a) message replay and (b) message forging. In both cases, messages must still be encrypted using the proper receiver key.

Case (a) on the journalist side is easy to detect and mitigate. Since journalists can keep state, they can easily store identifiers, timestamps, or whatever else is needed to properly detect message replay. On the source side we cannot do so, but we can argue the risk is acceptable: we must timestamp messages so that a source can assess their freshness. Besides that, sources are by design entities that have short-lived conversations and will eventually disappear. A source, during its "lifetime", will receive very few messages, compared to journalists, who are long-lived and will maintain their identity for possibly years and across multiple conversations. Thus a replay attack against a source, who is expected to receive on average a handful of messages, should be quite easy to spot, in addition to the timestamp information. Of course, an attacker would not know the content of such messages, so the exploitability options are extremely limited.
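
As a rough illustration of the journalist-side mitigation (field names here are hypothetical, not the PoC's API), replay detection only needs a persisted set of seen identifiers plus a freshness check on the attached timestamp:

```python
# A rough sketch with hypothetical field names; not the PoC's API.
import time

seen_ids = set()                 # persisted across sessions in a real client
MAX_AGE_SECONDS = 7 * 24 * 3600  # hypothetical freshness window

def accept_message(message_id: bytes, sent_at: float) -> bool:
    """Accept only fresh, never-seen-before messages."""
    if message_id in seen_ids:
        return False             # replayed
    if time.time() - sent_at > MAX_AGE_SECONDS:
        return False             # stale timestamp
    seen_ids.add(message_id)
    return True
```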

In case (b), forging a message requires knowing the encryption public key of the receiver (the fetching party). Source keys are never advertised anywhere, except that they are attached to the first source-to-journalist contact message (both the encryption and the fetching public keys). If an attacker got to know the encryption public key of a source, they could perform this attack (if journalist-to-source messages do not have some form of authentication, such as a signature). However, if they got to know the encryption public key of a source by leaking the plaintext of a source-to-journalist contact message, they are likely to know both the encryption and fetching public keys and thus could just build a message tuple from scratch, including the fetching "challenge". In the opposite direction, forging messages to journalists is trivial, as their keys are advertised publicly. However, we can argue that forging a first contact message is equivalent to making a new submission, and thus not an attack scenario.

Here is a matrix of the cases:

| | To Journalist | To Source |
| --- | --- | --- |
| Message replay | (a1) Journalists are long-lived, have stored state, and can detect replays. | (a2) Sources are short-lived (few replies) and messages are timestamped, so we can expect a source to be able to detect a message replay. The attacker knows neither the content of the replayed messages nor the messaging volume of any party. |
| Message forging | (b1) Journalist keys are publicly advertised; forging a message is the same procedure as making a submission. | (b2) Message forging requires knowledge of the source public key, which is not advertised and is only attached in the encrypted submission envelope. If a plaintext with such a public key is leaked, we can assume the fetching public key is leaked too, and thus a complete message tuple can be forged and submitted. |

In case (b2), message forging to a source, we know that a source is expected to be messaged back only by journalists (although the protocol technically allows any-to-any communication, https://github.com/freedomofpress/securedrop-poc/issues/14). We can thus look to authenticate those messages, either via signatures or via implicit authentication.

lsd-cat commented 10 months ago

Disclaimer: we know that trials and judges do not work the way the cryptographic literature assumes. There is plenty of research and surveys on that, and screenshots are usually already accepted as reliable evidence. However, we still consider deniability an important piece: we cannot reduce what is considered evidence, but we certainly do not want to add to it. In addition, the literature suggests that a judge is less likely to believe evidence when it is clearly shown that the forging process is trivial and accessible. Our position is therefore that, whenever we implement the architecture described here, we also publish easy tools for the trivial forging of the deniable components.

The matter of deniability, given the asymmetry of the protocol, is quite complicated compared to, for instance, Signal. Here we take deniability to mean something that is cryptographically repudiable, logically repudiable, or whose existence cannot be proven. We understand that deniability, in the context of this cryptographic scheme, only applies after the compromise of certain keys: we already assume that any observer of the server or the network learns nothing about the content of the messages or the communicating parties.

S stands for Source and J stands for Journalist. PT stands for Plaintext. SN stands for Server/Network access at a specific point in time. SNH stands for Server/Network historical access, such as a recording of all the traffic that has been transmitted since the start of the service.

Cases without plaintext access

| Attacker access (cf. abbreviations above) | Case | Deniability requirement |
| --- | --- | --- |
| S PW: attacker knows the source passphrase and can try to use the public interface with it to fetch messages. | (1d) | The attacker must not be able to prove the source's existence unless there are pending messages to be fetched and delivered (anti-user-enumeration). |
| SN + S PW: attacker has a snapshot of the current server state and knows the source passphrase. | (2d) | The attacker must not be able to prove the source's existence unless there are pending messages to be fetched and delivered. |
| SNH + S PW: attacker has the historical data of everything that has gone through the server and knows the source passphrase. | (3d) | The attacker is able to prove that a source exists and decrypt their messages only if a journalist ever replied to that source. Otherwise, if the source only sent messages that were never replied to, it must be impossible to prove the source's existence. |
| SN + SNH + S PW + J keys + J-received PT + S-received PT: attacker has access to everything. | (4d) | The only case of source deniability we can have is if the source submitted a message without attaching their keys or any information proving knowledge of their public or private keys, and consequently never received a reply. |

Case (1d) is probably the most likely first compromise in the real world: the source gets raided, or the passphrase is found during or after the leak.

Case (2d) is still likely given a cloud deployment: seizing a snapshot of a virtual server (including RAM) is easy, cheap, and quick.

Case (3d) is already unlikely: not all providers will have consistent snapshots from the beginning of time, and they might not be covered by a warrant anyway.

Case (4d) is total system failure, meaning all secrets, including ephemeral ones, have been leaked and all the history has been recorded. No scheme can be resistant against this.

Cases with partial or total plaintext access

As mentioned at the beginning, let's take X3DH as an example of a state-of-the-art deniable key-exchange mechanism that is implicitly authenticated. In summary, the implicit authentication is built into the multiple Diffie-Hellman construction, which includes both ephemeral keys for forward secrecy and identity keys for authentication. Deniability is provable because messages are never signed, and the implicit authentication only proves that a message was produced by one of the two parties, without showing which one.

In practice, if we take Alice and Bob, both know that the messages they received were either forged by themselves or sent by the other party. Since we assume parties are honest with themselves (i.e., they know whether they have forged a message), they know any received message was originally sent by the other party. However, a judge does not know whether the party presenting the evidence has been honest, and thus does not know whether Alice forged the messages claimed to have been sent by Bob or whether Bob really sent them. This is possible because Alice's and Bob's public identities and keys are advertised: to forge a message from Bob, Alice can just fetch Bob's keys from the Signal servers. Deniability also does not apply in the case of Signal server compromise: the X3DH network traffic clearly shows that Alice and Bob have performed a key exchange.
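
As a simplified sketch of that construction (only two of X3DH's DH operations, with SHA-256 standing in for the real KDF), both parties can derive the same key from their own secrets and the other's public keys, which is exactly what lets either of them forge:

```python
# A simplified sketch, not the actual X3DH or PoC construction:
# X25519 via PyNaCl's low-level bindings, SHA-256 as a stand-in KDF.
from hashlib import sha256
from nacl.bindings import crypto_scalarmult, crypto_scalarmult_base
from nacl.utils import random

ik_a = random(32); IK_A = crypto_scalarmult_base(ik_a)  # Alice long-term identity
ik_b = random(32); IK_B = crypto_scalarmult_base(ik_b)  # Bob long-term identity
ek_a = random(32); EK_A = crypto_scalarmult_base(ek_a)  # Alice per-message ephemeral

# Alice mixes an identity DH (authentication) and an ephemeral DH (forward secrecy).
k_alice = sha256(crypto_scalarmult(ik_a, IK_B) + crypto_scalarmult(ek_a, IK_B)).digest()

# Bob derives the same key from his secrets and Alice's public keys, so he
# could also have forged any message encrypted under it.
k_bob = sha256(crypto_scalarmult(ik_b, IK_A) + crypto_scalarmult(ik_b, EK_A)).digest()
assert k_alice == k_bob
```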

I do not think we can reach any similar result with our protocol because, except for the case where the source sends messages without attaching their public keys and thus never shares their identity with anyone, the knowledge of that identity is itself logically non-deniable once plaintexts are disclosed, at least as far as first messages are concerned.

| Attacker access (cf. abbreviations above) | Case | Deniability property |
| --- | --- | --- |
| S PW + J-received PT: attacker knows the source passphrase and has access to the plaintext of one or more messages sent by that source to a journalist. | (5d) | If the plaintext message is the first source message and it has the source's public keys attached, it is logically non-deniable: only the source could have sent it, because only the source knows its own public keys, which are never advertised. Furthermore, authentication does not matter: the source is unknown prior to this first message. |

For all messages subsequent to the first, the identity of the source is known to the journalist, and it would be possible to build an X3DH-like scheme. However, when fetching, the source does not know which journalist a reply is coming from (or should come from, as we assume all of them are trusted); thus, when calculating their shared key for X3DH, they would probably need to compute it for all enrolled journalists and see which one decrypts.

Any further reply from a source could carry a notion of authentication, such as authenticating that specific returning source and linking it to their first message. Again, we could do an X3DH-like scheme, but the journalists would not know which source a specific ciphertext is coming from, and thus they would have to attempt the key agreement with all previously known sources (multiplied by all the possible/available ephemeral keys), possibly increasing decryption complexity quadratically, as in the sketch below. The journalist would also not know whether a message is a first-contact message or an X3DH message. Would X3DH here hold a useful deniability property? Yes(ish) for the single message, but the source contact would still be logically undeniable, as the journalists must have learned the source identity (i.e., their public keys) from the source itself.
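
A rough sketch of that decryption-cost concern, with all names being hypothetical placeholders rather than the PoC's API:

```python
# Trial decryption over every (known source, candidate ephemeral key) pair.
from nacl.exceptions import CryptoError

def try_decrypt(ciphertext, known_sources, ephemeral_keys, derive_key, open_box):
    for source in known_sources:       # every previously known source...
        for ek in ephemeral_keys:      # ...times every candidate ephemeral key
            try:
                return open_box(derive_key(source, ek), ciphertext)
            except CryptoError:
                continue               # wrong key, keep trying
    return None                        # possibly a first-contact message instead
```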

My question here: does the complexity of X3DH for replies bring real benefits to the protocol? If the purpose is to identify a returning source with a good degree of certainty, it could be simpler to just add a piece of information that only the original source can know, such as, again, a public key. Is there something I am missing here?

One thing the source could do to make case (5d) deniable is to attach their passphrase (or an intermediate form of it, such as the KDF output prior to key generation, perhaps only the intermediate form used for the encryption key) to the encrypted message, giving the journalist the ability to forge that message and, eventually, subsequent ones. Given that the only intended recipients of any source are the journalists themselves, this should not compromise the protocol scheme. Does this make sense?
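
A minimal sketch of that idea, with a hypothetical passphrase-to-seed derivation (SHA-256 standing in for the actual KDF): anyone holding the intermediate seed can regenerate the same keypair and therefore forge the source's messages.

```python
# A minimal sketch; the derivation and labels here are hypothetical.
from hashlib import sha256
from nacl.public import PrivateKey

passphrase = b"hypothetical source passphrase"
seed = sha256(b"encryption-key" + passphrase).digest()  # the "intermediate form"

# PrivateKey accepts raw 32-byte key material, so anyone holding `seed` can
# regenerate the same keypair and author messages indistinguishable from the
# source's own.
source_enc_key = PrivateKey(seed)
journalist_copy = PrivateKey(seed)
assert bytes(source_enc_key.public_key) == bytes(journalist_copy.public_key)
```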

Finally, what is left out of this writeup is journalist deniability: do we care about it?

lsd-cat commented 10 months ago

I am leaning toward the following conclusions:

Then we would have an extra property:

@rocodes mentioned that this is sometimes a useful case: not all sources want to remain anonymous and there might be instances in which both parties are interested in proving the authenticity of the exchange to the world.

So in the end, I think we cannot have participation repudiation, but we can have message repudiation. Note that if we leak the keys to the journalist, then even if a source claims their identity afterwards, a judge still could not know whether the messages following the first contact were source- or journalist-generated, since both parties hold the cryptographic keys needed to produce them. What a source can claim is the first contact message, which is by definition non-repudiable.

A follow-up question is: do we consider the first message a handshake, send it in the 'background', and then send the actual source-provided message content as a second message, so that any information beyond the handshake is repudiable? It gets complicated because it heavily influences the request pattern of conversations (for example, a source would always send two messages per journalist instead of one, unless we elaborate a scheme where we reuse attachments).

lsd-cat commented 10 months ago

An update on this after multiple conversations:

@roeslpa realized that if we do X3DH with a group of journalists individually (i.e., a pairwise X3DH exchange with each of them) and one of the journalists gets compromised and the plaintexts are leaked, then deniability is not very credible, as forging the same message for multiple parties would have required the cooperation of all of them.

@mmaker suggested thinking about another way to achieve authenticity while keeping message repudiation, maybe via ring signatures. On further researching this:

A big discussion point when talking to @cfm is also the following: do we need strong cryptographic message authentication, or can we be satisfied with a weaker notion of message authentication that we consider sufficient in the real world?

If a source attaches their own public key every time they send a message, then any journalist knows that such knowledge can only originate from the source or from any other journalist who has previously received a message from the same source. Since, when a message is encrypted to a journalist, the various fields are bound together by the encryption scheme (we imply an AEAD scheme), we logically assume that the only subject that could have built an AEAD ciphertext containing both the secret message and the public key is either the source itself or some single journalist. This is probably a weaker notion of authentication, but is there any specific reason or counterexample that requires more?
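
A minimal sketch (not the PoC's exact construction) of binding the attached public key to the ciphertext via AEAD associated data, so that the fields cannot be recombined without decryption failing:

```python
# Bind the attached public key to the message as AEAD associated data.
from nacl import bindings, utils

key = utils.random(bindings.crypto_aead_xchacha20poly1305_ietf_KEYBYTES)    # per-message shared key
nonce = utils.random(bindings.crypto_aead_xchacha20poly1305_ietf_NPUBBYTES)
source_public_key = utils.random(32)  # hypothetical stand-in for the real key

ciphertext = bindings.crypto_aead_xchacha20poly1305_ietf_encrypt(
    b"the secret message", source_public_key, nonce, key
)

# Decryption (and thus the integrity check) succeeds only with the same
# associated data; swapping in a different public key raises CryptoError.
plaintext = bindings.crypto_aead_xchacha20poly1305_ietf_decrypt(
    ciphertext, source_public_key, nonce, key
)
assert plaintext == b"the secret message"
```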

If we go this way, we might consider a different scheme for message repudiation (because participation repudiation is in general not achievable). If a source sends a message without any timestamp, and we do not differentiate between the first message and the subsequent ones, then, assuming journalists have the ability to delete received plaintexts, potentially any message is repudiable because it could have been forged by anyone who received the source public key. The notion is still somewhat weak, because all journalists would need to delete the same messages (or similar) for it to hold, but I do not think we can achieve any more than this.

The ring signature scheme sounds a bit more modern and advanced, but I am curious to hear people's opinions on whether they deem that notion of authenticity necessary, and what kind of problems or insufficient guarantees simply attaching the public key instead could bring.

lsd-cat commented 2 months ago

In the opposite direction, we might want signing for journalists. There have been some internal discussions about the risks associated with doing sign-then-encrypt rather than encrypt-then-sign. Here is a summary:

Why encrypt-then-sign is recommended: